Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Peter Eisentraut
On 4/27/17 21:20, Masahiko Sawada wrote:
> Isn't it better to use  != NIL instead as follows?
> 
>else if (table_state != NIL && last_start_times)

I'm not a fan of that in general, and it doesn't really add any clarity
here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Peter Eisentraut
On 4/27/17 15:33, Petr Jelinek wrote:
> On 27/04/17 21:00, Peter Eisentraut wrote:
>> On 4/27/17 06:47, Petr Jelinek wrote:
>>> One thing I am missing in your patch however is cleanup of entries for
>>> relations that finished sync. I wonder if it would be enough to just
>>> destroy the hash when we get to empty list.
>>
>> I had omitted that because the amount of memory "leaked" is not much,
>> but I guess it wouldn't hurt to clean it up.
>>
>> How about the attached?
> 
> Yes that's more or less what I meant. All good.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Masahiko Sawada
On Fri, Apr 28, 2017 at 5:26 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Kyotaro HORIGUCHI
At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Masahiko Sawada
On Fri, Apr 28, 2017 at 4:00 AM, Peter Eisentraut
 wrote:
> On 4/27/17 06:47, Petr Jelinek wrote:
>> One thing I am missing in your patch however is cleanup of entries for
>> relations that finished sync. I wonder if it would be enough to just
>> destroy the hash when we get to empty list.
>
> I had omitted that because the amount of memory "leaked" is not much,
> but I guess it wouldn't hurt to clean it up.
>
> How about the attached?
>

Thank you for updating patch!

+   /*
+* Clean up the hash table when we're done with all tables (just to
+* release the bit of memory).
+*/
+   else if (!table_states && last_start_times)
+   {

Isn't it better to use  != NIL instead as follows?

   else if (table_state != NIL && last_start_times)


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Petr Jelinek
On 27/04/17 21:00, Peter Eisentraut wrote:
> On 4/27/17 06:47, Petr Jelinek wrote:
>> One thing I am missing in your patch however is cleanup of entries for
>> relations that finished sync. I wonder if it would be enough to just
>> destroy the hash when we get to empty list.
> 
> I had omitted that because the amount of memory "leaked" is not much,
> but I guess it wouldn't hurt to clean it up.
> 
> How about the attached?
> 

Yes that's more or less what I meant. All good.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Peter Eisentraut
On 4/27/17 06:47, Petr Jelinek wrote:
> One thing I am missing in your patch however is cleanup of entries for
> relations that finished sync. I wonder if it would be enough to just
> destroy the hash when we get to empty list.

I had omitted that because the amount of memory "leaked" is not much,
but I guess it wouldn't hurt to clean it up.

How about the attached?
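
The cleanup being discussed here — destroying the per-relation start-time table once the list of tables still being synced becomes empty — can be modeled in isolation. This is only a stand-alone sketch: the names `last_start_times` and `process_syncing_tables` follow the thread, and a malloc'd block stands in for the actual dynahash.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the proposed lifecycle: the start-time table is created
 * lazily while tables still need syncing, and destroyed as soon as the
 * list of syncing tables is empty, so its memory is not retained for
 * the whole life of the apply worker. */
typedef struct SyncState {
    int   ntables;          /* stand-in for the table_states list */
    void *last_start_times; /* stand-in for the hash table */
} SyncState;

static void process_syncing_tables(SyncState *s)
{
    if (s->ntables > 0) {
        if (s->last_start_times == NULL)
            s->last_start_times = malloc(64);   /* lazily created */
    } else if (s->last_start_times != NULL) {
        /* All tables synced: drop the table to release its memory. */
        free(s->last_start_times);
        s->last_start_times = NULL;
    }
}
```

The point is simply that the hash's lifetime is tied to "some table still needs syncing", not to the worker itself.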

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v2-0001-Wait-between-tablesync-worker-restarts.patch



Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Petr Jelinek
On 25/04/17 19:54, Peter Eisentraut wrote:
> I feel it's getting a bit late for reworkings of this extent, also
> considering the marginal nature of the problem we are trying to fix.  My
> patch from April 18 is very localized and gets the job done.
> 
> I think this is still a good direction to investigate, but if we have to
> extend the hash table API to get it done, this might not be the best time.
> 

Yeah the hash API change needed is a bummer at this stage.

One thing I am missing in your patch however is cleanup of entries for
relations that finished sync. I wonder if it would be enough to just
destroy the hash when we get to empty list.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-25 Thread Peter Eisentraut
On 4/20/17 15:36, Peter Eisentraut wrote:
> On 4/19/17 23:02, Noah Misch wrote:
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent 
>> status
>> update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> This is currently on the back burner relative to other issues.  Next
> check-in from me by Monday.

Update: I think there are some light disagreements about whether to go
for a simple localized patch or a more extensive rework.  If the latter
camp doesn't convince me, I'll commit the former solution by Friday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-25 Thread Peter Eisentraut
I feel it's getting a bit late for reworkings of this extent, also
considering the marginal nature of the problem we are trying to fix.  My
patch from April 18 is very localized and gets the job done.

I think this is still a good direction to investigate, but if we have to
extend the hash table API to get it done, this might not be the best time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-25 Thread Peter Eisentraut
On 4/21/17 09:59, Petr Jelinek wrote:
> Rereading the code again, it's actually not bug as we update the rstate
> to what syncworker says, but it's obviously confusing so probably still
> worth to commit that.

We don't have the syncworker->relmutex at that point, so it's probably
better to read the previously-copied value in rstate->state.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-25 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 25 Apr 2017 00:52:09 +0900, Masahiko Sawada  
wrote in 
> + elog(ERROR, "this hash hashtable \"%s\" is not frozen",
> + hashp->tabname);
> 
> Maybe the error message should be "this hashtable \"%s\" is not frozen".

Both "hashtable" and "hash table" appear there, but hash_freeze
uses the former, so I followed that.

> s/leavs/leaves/
> s/freezed/frozen/
> s/rurn/run/

Thanks! But "rurn" was a typo for "turn".


> On Tue, Apr 25, 2017 at 1:42 AM, Petr Jelinek
>  wrote:
> > On 24/04/17 17:52, Masahiko Sawada wrote:
> >> On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI
> >>  wrote:
> >> + /*
> >> + * Remove entries no longer necessary. The flag signals nothing if
> >> + * subrel_local_state is not updated above. We can remove entries in
> >> + * frozen hash safely.
> >> + */
> >> + if (local_state_updated && !wstate->alive)
> >> + {
> >> + hash_search(subrel_local_state, &wstate->rs.relid,
> >> + HASH_REMOVE, NULL);
> >> + continue;
> >> + }
> >>
> >> IIUC since the apply worker can change the status from
> >> SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop
> >> the table sync worker which is changed to SUBREL_STATE_READY by the
> >> apply worker before updating the subrel_local_state could be remained
> >> in the hash table.

Every time pg_subscription_rel is updated, hash entries are marked
alive only when the corresponding not-ready relations are found in
pg_subscription_rel. If any live entries remain, nrels becomes a
positive number and dead entries are removed in the loop just after.
If no entry survives, the hash is immediately destroyed. Some dead
entries can survive under certain conditions, but one of the above
will occur shortly.

If this is hard to understand, perhaps I should add some comments.

> >>I think that we should scan pg_subscription_rel by
> >> using only a condition "subid".
> >>
> >
> > I don't follow this.
> >
> 
> Hmm, I'd misunderstood something. It should work fine. Sorry for the noise.

Anyway, the typo-fixed version is attached.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From c79c2c47a325e7b03e217cfed9cd4ac50ec22423 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 24 Apr 2017 16:06:10 +0900
Subject: [PATCH 1/2] Add an unfreeze feature to dynahash

Scans on an unfrozen hash cannot live beyond a transaction's lifetime.
A hash can be frozen to allow a modify-free sequential scan that can
live beyond transactions, but there is no means to release that status.
This patch adds a new function, hash_unfreeze, which allows us to reset
a frozen hash to an insertable state.
---
 src/backend/utils/hash/dynahash.c | 20 
 src/include/utils/hsearch.h   |  1 +
 2 files changed, 21 insertions(+)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 12b1658..e253ab2 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1334,6 +1334,9 @@ hash_get_num_entries(HTAB *hashp)
  * wherein it is inconvenient to track whether a scan is still open, and
  * there's no possibility of further insertions after readout has begun.
  *
+ * NOTE: it is possible to unfreeze a frozen hash.  It is the scanners'
+ * responsibility to abandon any sequential scans still running.
+ *
  * NOTE: to use this with a partitioned hashtable, caller had better hold
  * at least shared lock on all partitions of the table throughout the scan!
  * We can cope with insertions or deletions by our own backend, but *not*
@@ -1456,6 +1459,23 @@ hash_freeze(HTAB *hashp)
 	hashp->frozen = true;
 }
 
+/*
+ * hash_unfreeze
+ *			Reset the freeze state of a hashtable
+ *
+ * This re-enables insertion into the hash.  Active sequential scans started
+ * during the freeze must not continue hereafter.
+ */
+void
+hash_unfreeze(HTAB *hashp)
+{
+	Assert(!hashp->isshared);
+	if (!hashp->frozen)
+		elog(ERROR, "this hashtable \"%s\" is not frozen",
+			 hashp->tabname);
+	hashp->frozen = false;
+}
+
 
 /************************** UTILITIES *************************/
 
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 7964087..60920d4 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -136,6 +136,7 @@ extern void hash_seq_init(HASH_SEQ_STATUS *status, HTAB *hashp);
 extern void *hash_seq_search(HASH_SEQ_STATUS *status);
 extern void hash_seq_term(HASH_SEQ_STATUS *status);
 extern void hash_freeze(HTAB *hashp);
+extern void hash_unfreeze(HTAB *hashp);
 extern Size hash_estimate_size(long num_entries, Size entrysize);
 extern long hash_select_dirsize(long num_entries);
 extern Size hash_get_shared_size(HASHCTL *info, int flags);
-- 
2.9.2
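
To illustrate the freeze/unfreeze contract the patch describes, here is a self-contained toy model — not the PostgreSQL implementation. Inserts are refused while the table is frozen, and unfreezing is only legal on a frozen table; the real code raises elog(ERROR) where this sketch returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of dynahash's "frozen" flag: hash_freeze() forbids further
 * insertion so a sequential scan can run without entries moving; the
 * proposed hash_unfreeze() re-enables insertion, on the understanding
 * that scans started while frozen are abandoned.  Names mirror the real
 * API but this is only a sketch. */
#define TOY_CAP 8

typedef struct ToyHash {
    bool frozen;
    int  nkeys;
    int  keys[TOY_CAP];
} ToyHash;

static void toy_init(ToyHash *h)   { memset(h, 0, sizeof(*h)); }
static void toy_freeze(ToyHash *h) { h->frozen = true; }

/* Like the patch's hash_unfreeze: only legal on a frozen table. */
static bool toy_unfreeze(ToyHash *h)
{
    if (!h->frozen)
        return false;           /* real code: elog(ERROR, ...) */
    h->frozen = false;
    return true;
}

/* Insertion fails while the table is frozen (real code: elog(ERROR)). */
static bool toy_insert(ToyHash *h, int key)
{
    if (h->frozen || h->nkeys >= TOY_CAP)
        return false;
    h->keys[h->nkeys++] = key;
    return true;
}
```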

From a422d7b936eb50c885e5480bc21c7bae636a559a Mon Sep 17 00:00:00 2001
From: Kyotaro 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-24 Thread Masahiko Sawada
On Tue, Apr 25, 2017 at 1:42 AM, Petr Jelinek
 wrote:
> On 24/04/17 17:52, Masahiko Sawada wrote:
>> On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI
>>  wrote:
>> + /*
>> + * Remove entries no longer necessary. The flag signals nothing if
>> + * subrel_local_state is not updated above. We can remove entries in
>> + * frozen hash safely.
>> + */
>> + if (local_state_updated && !wstate->alive)
>> + {
>> + hash_search(subrel_local_state, &wstate->rs.relid,
>> + HASH_REMOVE, NULL);
>> + continue;
>> + }
>>
>> IIUC since the apply worker can change the status from
>> SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop
>> the table sync worker which is changed to SUBREL_STATE_READY by the
>> apply worker before updating the subrel_local_state could be remained
>> in the hash table. I think that we should scan pg_subscription_rel by
>> using only a condition "subid".
>>
>
> I don't follow this.
>

Hmm, I'd misunderstood something. It should work fine. Sorry for the noise.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-24 Thread Petr Jelinek
On 24/04/17 17:52, Masahiko Sawada wrote:
> On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI
>  wrote:
> + /*
> + * Remove entries no longer necessary. The flag signals nothing if
> + * subrel_local_state is not updated above. We can remove entries in
> + * frozen hash safely.
> + */
> + if (local_state_updated && !wstate->alive)
> + {
> + hash_search(subrel_local_state, &wstate->rs.relid,
> + HASH_REMOVE, NULL);
> + continue;
> + }
> 
> IIUC since the apply worker can change the status from
> SUBREL_STATE_SYNCWAIT to SUBREL_STATE_READY in a hash_seq_search loop
> the table sync worker which is changed to SUBREL_STATE_READY by the
> apply worker before updating the subrel_local_state could be remained
> in the hash table. I think that we should scan pg_subscription_rel by
> using only a condition "subid".
> 

I don't follow this.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-24 Thread Masahiko Sawada
On Mon, Apr 24, 2017 at 4:41 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada  
> wrote in 
>> On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada  
>> wrote:
>> > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
>> >  wrote:
>> >> Hello,
>> >>
>> >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada 
>> >>  wrote in 
>> >> 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-24 Thread Kyotaro HORIGUCHI
Hello,

At Sun, 23 Apr 2017 00:51:52 +0900, Masahiko Sawada  
wrote in 
> On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada  
> wrote:
> > On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
> >  wrote:
> >> Hello,
> >>
> >> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada 
> >>  wrote in 
> >> 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-22 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 11:19 PM, Masahiko Sawada  wrote:
> On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
>  wrote:
>> Hello,
>>
>> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
>> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Petr Jelinek
On 21/04/17 10:33, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Petr Jelinek
On 21/04/17 04:38, Masahiko Sawada wrote:
> On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek
>  wrote:
>> On 20/04/17 06:21, Masahiko Sawada wrote:
>>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
>>>  wrote:
 On 19/04/17 15:57, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>  wrote:
>> On 19/04/17 14:42, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>>  wrote:
 At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
  wrote in 
 
> On 18/04/17 18:14, Peter Eisentraut wrote:
>> On 4/18/17 11:59, Petr Jelinek wrote:
>>> Hmm if we create hashtable for this, I'd say create hashtable for 
>>> the
>>> whole table_states then. The reason why it's list now was that it 
>>> seemed
>>> unnecessary to have hashtable when it will be empty almost always 
>>> but
>>> there is no need to have both hashtable + list IMHO.

 I understand that but I also don't like the frequent palloc/pfree
 in long-lasting context and double loop like Peter.

>> The difference is that we blow away the list of states when the 
>> catalog
>> changes, but we keep the hash table with the start times around.  We
>> need two things with different life times.

 On the other hand, hash seems overdone. In addition to that, the
 hash-version leaks stale entries while subscriptions are
 modified. But vacuuming them costs high.

> Why can't we just update the hashtable based on the catalog? I mean 
> once
> the record is not needed in the list, the table has been synced so 
> there
> is no need for the timestamp either since we'll not try to start the
> worker again.
>>>
>>> I guess the table sync worker for the same table could need to be
> started again. For example, please imagine a case where the table
>>> belonging to the publication is removed from it and the corresponding
>>> subscription is refreshed, and then the table is added to it again. We
>>> have the record of the table with timestamp in the hash table when the
>>> table sync in the first time, but the table sync after refreshed could
>>> have to wait for the interval.
>>>
>>
>> But why do we want to wait in such case where user has explicitly
>> requested refresh?
>>
>
> Yeah, sorry, I meant that we don't want to wait but cannot launch the
> tablesync worker in such case.
>
> But after more thought, the latest patch Peter proposed has the same
> problem. Perhaps we need to scan always whole pg_subscription_rel and
> remove the entry if the corresponding table get synced.
>

 Yes that's what I mean by "Why can't we just update the hashtable based
 on the catalog". And if we do that then I don't understand why do we
 need both hashtable and linked list if we need to update both based on
 catalog reads anyway.
>>>
>>> Thanks, I've now understood correctly. Yes, I think you're right. If
>>> we update the hash table based on the catalog whenever table state is
>>> invalidated, we don't need to have both hash table and list.
>>>
>>> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the
>>> pg_subscription_catalog. So the following condition seems not correct.
>>> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT"
>>> instead?
>>>
>>> /*
>>>  * There is a worker synchronizing the relation and waiting for
>>>  * apply to do something.
>>>  */
>>> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
>>> {
>>> /*
>>>  * There are three possible synchronization situations here.
>>>  *
>>>  * a) Apply is in front of the table sync: We tell the table
>>>  *sync to CATCHUP.
>>>  *
>>>  * b) Apply is behind the table sync: We tell the table sync
>>>  *to mark the table as SYNCDONE and finish.
>>>
>>>  * c) Apply and table sync are at the same position: We tell
>>>  *table sync to mark the table as READY and finish.
>>>  *
>>>  * In any case we'll need to wait for table sync to change
>>>  * the state in catalog and only then continue ourselves.
>>>  */
>>>
>>
>> Good catch. Although it's not the comment that's wrong, it's the if. We
>> should not compare rstate->state but syncworker->relstate.
> 
> I've attached a patch to fix this bug.
> 


Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-20 Thread Masahiko Sawada
On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek
 wrote:
> On 20/04/17 06:21, Masahiko Sawada wrote:
>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
>>  wrote:
>>> On 19/04/17 15:57, Masahiko Sawada wrote:
 On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
  wrote:
> On 19/04/17 14:42, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>>>  wrote in 
>>> 
 On 18/04/17 18:14, Peter Eisentraut wrote:
> On 4/18/17 11:59, Petr Jelinek wrote:
>> Hmm if we create hashtable for this, I'd say create hashtable for the
>> whole table_states then. The reason why it's list now was that it 
>> seemed
>> unnecessary to have hashtable when it will be empty almost always but
>> there is no need to have both hashtable + list IMHO.
>>>
>>> I understand that but I also don't like the frequent palloc/pfree
>>> in long-lasting context and double loop like Peter.
>>>
> The difference is that we blow away the list of states when the 
> catalog
> changes, but we keep the hash table with the start times around.  We
> need two things with different life times.
>>>
>>> On the other hand, hash seems overdone. In addition to that, the
>>> hash-version leaks stale entries while subscriptions are
>>> modified. But vacuuming them costs high.
>>>
 Why can't we just update the hashtable based on the catalog? I mean 
 once
 the record is not needed in the list, the table has been synced so 
 there
 is no need for the timestamp either since we'll not try to start the
 worker again.
>>
>> I guess the table sync worker for the same table could need to be
>> started again. For example, please imagine a case where the table
>> belonging to the publication is removed from it and the corresponding
>> subscription is refreshed, and then the table is added to it again. We
>> have the record of the table with timestamp in the hash table when the
>> table sync in the first time, but the table sync after refreshed could
>> have to wait for the interval.
>>
>
> But why do we want to wait in such case where user has explicitly
> requested refresh?
>

 Yeah, sorry, I meant that we don't want to wait but cannot launch the
 tablesync worker in such case.

 But after more thought, the latest patch Peter proposed has the same
 problem. Perhaps we need to scan always whole pg_subscription_rel and
 remove the entry if the corresponding table get synced.

>>>
>>> Yes that's what I mean by "Why can't we just update the hashtable based
>>> on the catalog". And if we do that then I don't understand why do we
>>> need both hashtable and linked list if we need to update both based on
>>> catalog reads anyway.
>>
>> Thanks, I've now understood correctly. Yes, I think you're right. If
>> we update the hash table based on the catalog whenever table state is
>> invalidated, we don't need to have both hash table and list.
>>
>> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the
>> pg_subscription_catalog. So the following condition seems not correct.
>> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT"
>> instead?
>>
>> /*
>>  * There is a worker synchronizing the relation and waiting for
>>  * apply to do something.
>>  */
>> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
>> {
>> /*
>>  * There are three possible synchronization situations here.
>>  *
>>  * a) Apply is in front of the table sync: We tell the table
>>  *sync to CATCHUP.
>>  *
>>  * b) Apply is behind the table sync: We tell the table sync
>>  *to mark the table as SYNCDONE and finish.
>>
>>  * c) Apply and table sync are at the same position: We tell
>>  *table sync to mark the table as READY and finish.
>>  *
>>  * In any case we'll need to wait for table sync to change
>>  * the state in catalog and only then continue ourselves.
>>  */
>>
>
> Good catch. Although it's not the comment that's wrong, it's the if. We
> should not compare rstate->state but syncworker->relstate.

I've attached a patch to fix this bug.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


bug_fix.patch


Re: [HACKERS] Interval for launching the table sync worker

2017-04-20 Thread Peter Eisentraut
On 4/19/17 23:02, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This is currently on the back burner relative to other issues.  Next
check-in from me by Monday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-20 Thread Petr Jelinek
On 20/04/17 06:21, Masahiko Sawada wrote:
> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
>  wrote:
>> On 19/04/17 15:57, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>>>  wrote:
 On 19/04/17 14:42, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>>  wrote in 
>> 
>>> On 18/04/17 18:14, Peter Eisentraut wrote:
 On 4/18/17 11:59, Petr Jelinek wrote:
> Hmm if we create hashtable for this, I'd say create hashtable for the
> whole table_states then. The reason why it's list now was that it 
> seemed
> unnecessary to have hashtable when it will be empty almost always but
> there is no need to have both hashtable + list IMHO.
>>
>> I understand that but I also don't like the frequent palloc/pfree
>> in long-lasting context and double loop like Peter.
>>
 The difference is that we blow away the list of states when the catalog
 changes, but we keep the hash table with the start times around.  We
 need two things with different life times.
>>
>> On the other hand, hash seems overdone. In addition to that, the
>> hash-version leaks stale entries while subscriptions are
>> modified. But vacuuming them costs high.
>>
>>> Why can't we just update the hashtable based on the catalog? I mean once
>>> the record is not needed in the list, the table has been synced so there
>>> is no need for the timestamp either since we'll not try to start the
>>> worker again.
>
> I guess the table sync worker for the same table could need to be
> started again. For example, please imagine a case where the table
> belonging to the publication is removed from it and the corresponding
> subscription is refreshed, and then the table is added to it again. We
> have the record of the table with timestamp in the hash table when the
> table sync in the first time, but the table sync after refreshed could
> have to wait for the interval.
>

 But why do we want to wait in such case where user has explicitly
 requested refresh?

>>>
>>> Yeah, sorry, I meant that we don't want to wait but cannot launch the
>>> tablesync worker in such case.
>>>
>>> But after more thought, the latest patch Peter proposed has the same
>>> problem. Perhaps we need to scan always whole pg_subscription_rel and
>>> remove the entry if the corresponding table get synced.
>>>
>>
>> Yes that's what I mean by "Why can't we just update the hashtable based
>> on the catalog". And if we do that then I don't understand why do we
>> need both hashtable and linked list if we need to update both based on
>> catalog reads anyway.
> 
> Thanks, I've now understood correctly. Yes, I think you're right. If
> we update the hash table based on the catalog whenever table state is
> invalidated, we don't need to have both hash table and list.
> 
> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the
> pg_subscription_catalog. So the following condition seems not correct.
> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT"
> instead?
> 
> /*
>  * There is a worker synchronizing the relation and waiting for
>  * apply to do something.
>  */
> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
> {
> /*
>  * There are three possible synchronization situations here.
>  *
>  * a) Apply is in front of the table sync: We tell the table
>  *sync to CATCHUP.
>  *
>  * b) Apply is behind the table sync: We tell the table sync
>  *to mark the table as SYNCDONE and finish.
> 
>  * c) Apply and table sync are at the same position: We tell
>  *table sync to mark the table as READY and finish.
>  *
>  * In any case we'll need to wait for table sync to change
>  * the state in catalog and only then continue ourselves.
>  */
> 

Good catch. Although it's not the comment that's wrong, it's the if. We
should not compare rstate->state but syncworker->relstate.

The reason SUBREL_STATE_SYNCWAIT is not stored in the catalog is that
we'd have to commit from the sync worker, which could leave the table in
a partially synchronized state on crash, without the ability to resume
afterwards (since the slot on the other side would be gone).
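
The three situations a/b/c quoted above reduce to comparing the apply worker's position with the table sync worker's. A hedged sketch, with invented names and plain integers standing in for LSNs:

```c
#include <assert.h>

/* Sketch of the a/b/c decision quoted in this thread: compare the apply
 * worker's replay position with the sync worker's.  The enum and
 * function names are invented for illustration; the real code compares
 * LSNs and signals the sync worker accordingly. */
typedef enum {
    SYNC_CATCHUP,   /* a) apply is in front of the table sync */
    SYNC_SYNCDONE,  /* b) apply is behind the table sync */
    SYNC_READY      /* c) both are at the same position */
} SyncAction;

static SyncAction
decide_sync_action(long apply_pos, long sync_pos)
{
    if (apply_pos > sync_pos)
        return SYNC_CATCHUP;    /* tell table sync to catch up */
    if (apply_pos < sync_pos)
        return SYNC_SYNCDONE;   /* tell table sync to mark SYNCDONE */
    return SYNC_READY;          /* tell table sync to mark READY */
}
```

In every branch the apply worker then waits for the sync worker to change the state in the catalog before continuing, as the quoted comment notes.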

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
 wrote:
> On 19/04/17 15:57, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>>  wrote:
>>> On 19/04/17 14:42, Masahiko Sawada wrote:
 On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
  wrote:
> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>  wrote in 
> 
>> On 18/04/17 18:14, Peter Eisentraut wrote:
>>> On 4/18/17 11:59, Petr Jelinek wrote:
 Hmm if we create hashtable for this, I'd say create hashtable for the
 whole table_states then. The reason why it's list now was that it 
 seemed
 unnecessary to have hashtable when it will be empty almost always but
 there is no need to have both hashtable + list IMHO.
>
> I understand that but I also don't like the frequent palloc/pfree
> in long-lasting context and double loop like Peter.
>
>>> The difference is that we blow away the list of states when the catalog
>>> changes, but we keep the hash table with the start times around.  We
>>> need two things with different life times.
>
> On the other hand, hash seems overdone. In addition to that, the
> hash version leaks stale entries while subscriptions are
> modified, and vacuuming them costs a lot.
>
>> Why can't we just update the hashtable based on the catalog? I mean once
>> the record is not needed in the list, the table has been synced so there
>> is no need for the timestamp either since we'll not try to start the
>> worker again.

 I guess the table sync worker for the same table could need to be
 started again. For example, please imagine a case where the table
 belonging to the publication is removed from it and the corresponding
 subscription is refreshed, and then the table is added to it again. We
 have the record of the table with a timestamp in the hash table from
 the first table sync, but the table sync after the refresh could
 have to wait for the interval.

>>>
>>> But why do we want to wait in such case where user has explicitly
>>> requested refresh?
>>>
>>
>> Yeah, sorry, I meant that we don't want to wait but cannot launch the
>> tablesync worker in such case.
>>
>> But after more thought, the latest patch Peter proposed has the same
>> problem. Perhaps we need to scan always whole pg_subscription_rel and
>> remove the entry if the corresponding table get synced.
>>
>
> Yes, that's what I mean by "Why can't we just update the hashtable based
> on the catalog". And if we do that, then I don't understand why we
> need both a hashtable and a linked list if we need to update both based
> on catalog reads anyway.

Thanks, I've now understood correctly. Yes, I think you're right. If
we update the hash table based on the catalog whenever table state is
invalidated, we don't need to have both hash table and list.

BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the
pg_subscription_rel catalog. So the following condition seems not correct.
We should use "syncworker->relstate == SUBREL_STATE_SYNCWAIT"
instead?

/*
 * There is a worker synchronizing the relation and waiting for
 * apply to do something.
 */
if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
{
    /*
     * There are three possible synchronization situations here.
     *
     * a) Apply is in front of the table sync: We tell the table
     *    sync to CATCHUP.
     *
     * b) Apply is behind the table sync: We tell the table sync
     *    to mark the table as SYNCDONE and finish.
     *
     * c) Apply and table sync are at the same position: We tell
     *    table sync to mark the table as READY and finish.
     *
     * In any case we'll need to wait for table sync to change
     * the state in catalog and only then continue ourselves.
     */

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Noah Misch
On Sun, Apr 16, 2017 at 06:08:41AM +, Noah Misch wrote:
> On Thu, Apr 06, 2017 at 11:33:22AM +0900, Masahiko Sawada wrote:
> > While testing the table sync worker for logical replication I noticed
> > that if the table sync worker of logical replication failed to insert
> > the data for whatever reason, the table sync worker process exits with
> > an error. And then the main apply worker launches the table sync worker
> > again soon, without any interval, so this routine is executed at very
> > high frequency.
> > 
> > Should we put an interval (wal_retrieve_interval or make a new GUC
> > parameter?) for launching the table sync worker?
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Petr Jelinek
On 19/04/17 15:57, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>  wrote:
>> On 19/04/17 14:42, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>>  wrote:
 At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
> On 18/04/17 18:14, Peter Eisentraut wrote:
>> On 4/18/17 11:59, Petr Jelinek wrote:
>>> Hmm if we create hashtable for this, I'd say create hashtable for the
>>> whole table_states then. The reason why it's list now was that it seemed
>>> unnecessary to have hashtable when it will be empty almost always but
>>> there is no need to have both hashtable + list IMHO.

 I understand that, but I also don't like the frequent palloc/pfree
 in a long-lasting context and the double loop, like Peter.

>> The difference is that we blow away the list of states when the catalog
>> changes, but we keep the hash table with the start times around.  We
>> need two things with different life times.

 On the other hand, hash seems overdone. In addition to that, the
 hash version leaks stale entries while subscriptions are
 modified, and vacuuming them costs a lot.

> Why can't we just update the hashtable based on the catalog? I mean once
> the record is not needed in the list, the table has been synced so there
> is no need for the timestamp either since we'll not try to start the
> worker again.
>>>
>>> I guess the table sync worker for the same table could need to be
>>> started again. For example, please imagine a case where the table
>>> belonging to the publication is removed from it and the corresponding
>>> subscription is refreshed, and then the table is added to it again. We
>>> have the record of the table with a timestamp in the hash table from
>>> the first table sync, but the table sync after the refresh could
>>> have to wait for the interval.
>>>
>>
>> But why do we want to wait in such case where user has explicitly
>> requested refresh?
>>
> 
> Yeah, sorry, I meant that we don't want to wait but cannot launch the
> tablesync worker in such case.
> 
> But after more thought, the latest patch Peter proposed has the same
> problem. Perhaps we need to scan always whole pg_subscription_rel and
> remove the entry if the corresponding table get synced.
> 

Yes, that's what I mean by "Why can't we just update the hashtable based
on the catalog". And if we do that, then I don't understand why we
need both a hashtable and a linked list if we need to update both based
on catalog reads anyway.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
 wrote:
> On 19/04/17 14:42, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
 On 18/04/17 18:14, Peter Eisentraut wrote:
> On 4/18/17 11:59, Petr Jelinek wrote:
>> Hmm if we create hashtable for this, I'd say create hashtable for the
>> whole table_states then. The reason why it's list now was that it seemed
>> unnecessary to have hashtable when it will be empty almost always but
>> there is no need to have both hashtable + list IMHO.
>>>
>>> I understand that, but I also don't like the frequent palloc/pfree
>>> in a long-lasting context and the double loop, like Peter.
>>>
> The difference is that we blow away the list of states when the catalog
> changes, but we keep the hash table with the start times around.  We
> need two things with different life times.
>>>
>>> On the other hand, hash seems overdone. In addition to that, the
>>> hash version leaks stale entries while subscriptions are
>>> modified, and vacuuming them costs a lot.
>>>
 Why can't we just update the hashtable based on the catalog? I mean once
 the record is not needed in the list, the table has been synced so there
 is no need for the timestamp either since we'll not try to start the
 worker again.
>>
>> I guess the table sync worker for the same table could need to be
>> started again. For example, please imagine a case where the table
>> belonging to the publication is removed from it and the corresponding
>> subscription is refreshed, and then the table is added to it again. We
>> have the record of the table with a timestamp in the hash table from
>> the first table sync, but the table sync after the refresh could
>> have to wait for the interval.
>>
>
> But why do we want to wait in such case where user has explicitly
> requested refresh?
>

Yeah, sorry, I meant that we don't want to wait but cannot launch the
tablesync worker in such case.

But after more thought, the latest patch Peter proposed has the same
problem. Perhaps we need to scan always whole pg_subscription_rel and
remove the entry if the corresponding table get synced.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Petr Jelinek
On 19/04/17 14:42, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
>>> On 18/04/17 18:14, Peter Eisentraut wrote:
 On 4/18/17 11:59, Petr Jelinek wrote:
> Hmm if we create hashtable for this, I'd say create hashtable for the
> whole table_states then. The reason why it's list now was that it seemed
> unnecessary to have hashtable when it will be empty almost always but
> there is no need to have both hashtable + list IMHO.
>>
>> I understand that, but I also don't like the frequent palloc/pfree
>> in a long-lasting context and the double loop, like Peter.
>>
 The difference is that we blow away the list of states when the catalog
 changes, but we keep the hash table with the start times around.  We
 need two things with different life times.
>>
>> On the other hand, hash seems overdone. In addition to that, the
>> hash version leaks stale entries while subscriptions are
>> modified, and vacuuming them costs a lot.
>>
>>> Why can't we just update the hashtable based on the catalog? I mean once
>>> the record is not needed in the list, the table has been synced so there
>>> is no need for the timestamp either since we'll not try to start the
>>> worker again.
> 
> I guess the table sync worker for the same table could need to be
> started again. For example, please imagine a case where the table
> belonging to the publication is removed from it and the corresponding
> subscription is refreshed, and then the table is added to it again. We
> have the record of the table with a timestamp in the hash table from
> the first table sync, but the table sync after the refresh could
> have to wait for the interval.
> 

But why do we want to wait in such case where user has explicitly
requested refresh?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
>> On 18/04/17 18:14, Peter Eisentraut wrote:
>> > On 4/18/17 11:59, Petr Jelinek wrote:
>> >> Hmm if we create hashtable for this, I'd say create hashtable for the
>> >> whole table_states then. The reason why it's list now was that it seemed
>> >> unnecessary to have hashtable when it will be empty almost always but
>> >> there is no need to have both hashtable + list IMHO.
>
> I understand that, but I also don't like the frequent palloc/pfree
> in a long-lasting context and the double loop, like Peter.
>
>> > The difference is that we blow away the list of states when the catalog
>> > changes, but we keep the hash table with the start times around.  We
>> > need two things with different life times.
>
> On the other hand, hash seems overdone. In addition to that, the
> hash version leaks stale entries while subscriptions are
> modified, and vacuuming them costs a lot.
>
>> Why can't we just update the hashtable based on the catalog? I mean once
>> the record is not needed in the list, the table has been synced so there
>> is no need for the timestamp either since we'll not try to start the
>> worker again.

I guess the table sync worker for the same table could need to be
started again. For example, please imagine a case where the table
belonging to the publication is removed from it and the corresponding
subscription is refreshed, and then the table is added to it again. We
have the record of the table with a timestamp in the hash table from
the first table sync, but the table sync after the refresh could
have to wait for the interval.

>
> Considering the anticipated complexity when many subscriptions
> exist in the list, and the complexity of removing stale entries from
> the hash, I'm inclined to propose just adding 'worker_start'
> to pg_subscription_rel. We already have similar columns in
> pg_stat_activity and pg_stat_replication.
>

I was thinking the same. But I'm concerned that the last start time of
the table sync worker is not really useful information for the user, and
we already have a similar value in pg_stat_activity
(pg_stat_replication.backend_start is actually taken from
pg_stat_activity.backend_start). I'm not sure whether it's a good idea
to show slightly shifted timestamps with the same meaning in two
system views.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Kyotaro HORIGUCHI
At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
> On 18/04/17 18:14, Peter Eisentraut wrote:
> > On 4/18/17 11:59, Petr Jelinek wrote:
> >> Hmm if we create hashtable for this, I'd say create hashtable for the
> >> whole table_states then. The reason why it's list now was that it seemed
> >> unnecessary to have hashtable when it will be empty almost always but
> >> there is no need to have both hashtable + list IMHO.

I understand that, but I also don't like the frequent palloc/pfree
in a long-lasting context and the double loop, like Peter.

> > The difference is that we blow away the list of states when the catalog
> > changes, but we keep the hash table with the start times around.  We
> > need two things with different life times.

On the other hand, hash seems overdone. In addition to that, the
hash version leaks stale entries while subscriptions are
modified, and vacuuming them costs a lot.

> Why can't we just update the hashtable based on the catalog? I mean once
> the record is not needed in the list, the table has been synced so there
> is no need for the timestamp either since we'll not try to start the
> worker again.

Considering the anticipated complexity when many subscriptions
exist in the list, and the complexity of removing stale entries from
the hash, I'm inclined to propose just adding 'worker_start'
to pg_subscription_rel. We already have similar columns in
pg_stat_activity and pg_stat_replication.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-18 Thread Petr Jelinek
On 18/04/17 18:14, Peter Eisentraut wrote:
> On 4/18/17 11:59, Petr Jelinek wrote:
>> Hmm if we create hashtable for this, I'd say create hashtable for the
>> whole table_states then. The reason why it's list now was that it seemed
>> unnecessary to have hashtable when it will be empty almost always but
>> there is no need to have both hashtable + list IMHO.
> 
> The difference is that we blow away the list of states when the catalog
> changes, but we keep the hash table with the start times around.  We
> need two things with different life times.
> 

Why can't we just update the hashtable based on the catalog? I mean once
the record is not needed in the list, the table has been synced so there
is no need for the timestamp either since we'll not try to start the
worker again.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-18 Thread Peter Eisentraut
On 4/18/17 11:59, Petr Jelinek wrote:
> Hmm if we create hashtable for this, I'd say create hashtable for the
> whole table_states then. The reason why it's list now was that it seemed
> unnecessary to have hashtable when it will be empty almost always but
> there is no need to have both hashtable + list IMHO.

The difference is that we blow away the list of states when the catalog
changes, but we keep the hash table with the start times around.  We
need two things with different life times.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-18 Thread Masahiko Sawada
On Tue, Apr 18, 2017 at 11:22 PM, Peter Eisentraut
 wrote:
> On 4/13/17 06:23, Masahiko Sawada wrote:
>> Attached the latest patch. It wasn't actually necessary to change
>> GetSubscriptionNotReadyRelations; I just changed the logic refreshing
>> the sync table state list.
>
> I think this was the right direction, but then I got worried about
> having a loop within a loop to copy over the last start times.  If you
> have very many tables, that could be a big nested loop.
>
> Here is an alternative proposal to store the last start times in a hash
> table.
>

If we use wal_retrieve_retry_interval for the table sync worker, I
think we need to update the documentation as well. Currently the
documentation mentions it briefly, but since
wal_retrieve_retry_interval will be used in three different places for
different reasons, it could confuse the user.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-18 Thread Petr Jelinek
On 18/04/17 16:22, Peter Eisentraut wrote:
> On 4/13/17 06:23, Masahiko Sawada wrote:
>> Attached the latest patch. It wasn't actually necessary to change
>> GetSubscriptionNotReadyRelations; I just changed the logic refreshing
>> the sync table state list.
> 
> I think this was the right direction, but then I got worried about
> having a loop within a loop to copy over the last start times.  If you
> have very many tables, that could be a big nested loop.
> 
> Here is an alternative proposal to store the last start times in a hash
> table.
> 

Hmm if we create hashtable for this, I'd say create hashtable for the
whole table_states then. The reason why it's list now was that it seemed
unnecessary to have hashtable when it will be empty almost always but
there is no need to have both hashtable + list IMHO.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-18 Thread Peter Eisentraut
On 4/13/17 06:23, Masahiko Sawada wrote:
> Attached the latest patch. It wasn't actually necessary to change
> GetSubscriptionNotReadyRelations; I just changed the logic refreshing
> the sync table state list.

I think this was the right direction, but then I got worried about
having a loop within a loop to copy over the last start times.  If you
have very many tables, that could be a big nested loop.

Here is an alternative proposal to store the last start times in a hash
table.

(We also might want to lower the interval for the test suite, because
there are intentional failures in there, and this patch or one like it
will cause the tests to run a few seconds longer.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Wait-between-tablesync-worker-restarts.patch



Re: [HACKERS] Interval for launching the table sync worker

2017-04-16 Thread Noah Misch
On Thu, Apr 06, 2017 at 11:33:22AM +0900, Masahiko Sawada wrote:
> While testing the table sync worker for logical replication I noticed
> that if the table sync worker of logical replication failed to insert
> the data for whatever reason, the table sync worker process exits with
> an error. And then the main apply worker launches the table sync worker
> again soon, without any interval, so this routine is executed at very
> high frequency.
> 
> Should we put an interval (wal_retrieve_interval or make a new GUC
> parameter?) for launching the table sync worker?

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Peter Eisentraut
On 4/13/17 18:09, Petr Jelinek wrote:
> It would have the
> disadvantage that if some tables were consistently failing, no other
> tables could get synchronized as the number of workers is limited. OTOH
> the approach chosen in the patch will first try all tables and only then
> start rate limiting; I'm not quite sure which is better.

I think the latter is better.

One of the scenarios cited originally somewhere was one table failing
because of an encoding issue.  So it's useful to allow other tables to
continue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Peter Eisentraut
On 4/13/17 06:23, Masahiko Sawada wrote:
> Attached the latest patch. It wasn't actually necessary to change
> GetSubscriptionNotReadyRelations; I just changed the logic refreshing
> the sync table state list.
> Please give me feedback.

I think this is a reasonable direction, but do we really need
table_sync_retry_interval separate from wal_retrieve_retry_interval?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Masahiko Sawada
On Fri, Apr 14, 2017 at 9:14 PM, Petr Jelinek
 wrote:
> On 14/04/17 12:18, Masahiko Sawada wrote:
>> On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
>>  wrote:
>>> On 13/04/17 12:23, Masahiko Sawada wrote:
 On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada  
 wrote:
> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>  wrote:
>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
 Perhaps instead of a global last_start_time, we store a per relation
 last_start_time in SubscriptionRelState?
>>>
>>> I was thinking the same. But a problem is that the list of
>>> SubscriptionRelState is refreshed whenever the syncing table state
>>> becomes invalid (table_state_valid = false). I guess we need to
>> improve this logic, including GetSubscriptionNotReadyRelations().
>>
>> The table states are invalidated on a syscache callback from
>> pg_subscription_rel, which happens roughly speaking when a table
>> finishes the initial sync.  So if we're worried about failing tablesync
>> workers relaunching too quickly, this would only be a problem if a
>> tablesync of another table finishes right in that restart window.  That
>> doesn't seem a terrible issue to me.
>>
>
> I think the table states are invalidated whenever the table sync
> worker starts, because the table sync worker updates its status in
> pg_subscription_rel and commits it before starting the actual copy. So
> we cannot rely on that. I thought we could store last_start_time in
> pg_subscription_rel, but it might be overkill. I'm now thinking of
> changing GetSubscriptionNotReadyRelations so that the last_start_time
> in SubscriptionRelState is carried over to the new list.
>

 Attached the latest patch. It wasn't actually necessary to change
 GetSubscriptionNotReadyRelations; I just changed the logic refreshing
 the sync table state list.
 Please give me feedback.

>>>
>>> Hmm this might work. Although I was actually wondering if we could store
>>> the last start timestamp in the worker shared memory and do some magic
>>> with that (ie, not clearing subid and relid and try to then do rate
>>> limiting based on subid+relid+timestamp stored in shmem). That would
>>> then work same way for the main apply workers as well. It would have the
>>> disadvantage that if some tables were consistently failing, no other
>>> tables could get synchronized as the number of workers is limited.
>>
>> Hmm, I guess that it's not a good design that a table sync worker and
>> an apply worker for a relation take sole possession of a worker slot
>> until they succeed. They would bother each other.
>>
>
> Not sure what you mean by apply worker for relation, I meant the main
> subscription apply worker, the "per relation apply worker" is same as
> tablesync worker.

Oops, I made a mistake. I meant just an apply worker.

>
> Thinking about the possible failure scenarios more, I guess you are
> right. Because if there is "global" event for the publisher/subscriber
> connection (ie network goes down or something) it will affect the main
> worker as well so it won't launch anything. If there is table specific
> issue it will affect only that table.
>
> The corner-cases with doing it per table where we launch tablesync
> needlessly are pretty much just a) connection limit on publisher
> reached, b) slot limit  on publisher reached, that's probably okay. We
> might be able to catch these two specifically and do some global
> limiting based on those (something like your original patch except the
> start-time global variable would only be set on specific error and
> stored in shmem), but that does not need to be solved immediately.
>
> --
>   Petr Jelinek  http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Petr Jelinek
On 14/04/17 12:18, Masahiko Sawada wrote:
> On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
>  wrote:
>> On 13/04/17 12:23, Masahiko Sawada wrote:
>>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada  
>>> wrote:
 On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
  wrote:
> On 4/12/17 00:48, Masahiko Sawada wrote:
>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>> Perhaps instead of a global last_start_time, we store a per relation
>>> last_start_time in SubscriptionRelState?
>>
>> I was thinking the same. But a problem is that the list of
>> SubscriptionRelState is refreshed whenever the syncing table state
>> becomes invalid (table_state_valid = false). I guess we need to
>> improve this logic, including GetSubscriptionNotReadyRelations().
>
> The table states are invalidated on a syscache callback from
> pg_subscription_rel, which happens roughly speaking when a table
> finishes the initial sync.  So if we're worried about failing tablesync
> workers relaunching too quickly, this would only be a problem if a
> tablesync of another table finishes right in that restart window.  That
> doesn't seem a terrible issue to me.
>

 I think the table states are invalidated whenever the table sync
 worker starts, because the table sync worker updates its status in
 pg_subscription_rel and commits it before starting the actual copy. So
 we cannot rely on that. I thought we could store last_start_time in
 pg_subscription_rel, but it might be overkill. I'm now thinking of
 changing GetSubscriptionNotReadyRelations so that the last_start_time
 in SubscriptionRelState is carried over to the new list.

>>>
>>> Attached the latest patch. It wasn't actually necessary to change
>>> GetSubscriptionNotReadyRelations; I just changed the logic refreshing
>>> the sync table state list.
>>> Please give me feedback.
>>>
>>
>> Hmm, this might work. Although I was actually wondering if we could store
>> the last start timestamp in the worker shared memory and do some magic
>> with that (i.e., not clearing subid and relid, and then doing rate
>> limiting based on subid+relid+timestamp stored in shmem). That would
>> then work the same way for the main apply workers as well. It would have
>> the disadvantage that if some tables were consistently failing, no other
>> tables could get synchronized as the number of workers is limited.
> 
> Hmm I guess it's not a good design that a table sync worker and an
> apply worker for a relation take sole possession of a worker slot
> until they succeed. They would get in each other's way.
> 

Not sure what you mean by apply worker for a relation. I meant the main
subscription apply worker; the "per relation apply worker" is the same
as the tablesync worker.

Thinking about the possible failure scenarios more, I guess you are
right, because if there is a "global" event for the publisher/subscriber
connection (i.e. the network goes down or something) it will affect the
main worker as well, so it won't launch anything. If there is a
table-specific issue it will affect only that table.

The corner-cases with doing it per table where we launch tablesync
needlessly are pretty much just a) connection limit on publisher
reached, b) slot limit on publisher reached; that's probably okay. We
might be able to catch these two specifically and do some global
limiting based on those (something like your original patch except the
start-time global variable would only be set on specific error and
stored in shmem), but that does not need to be solved immediately.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Masahiko Sawada
On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
 wrote:
> On 13/04/17 12:23, Masahiko Sawada wrote:
>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada  
>> wrote:
>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>>  wrote:
 On 4/12/17 00:48, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>> Perhaps instead of a global last_start_time, we store a per relation
>> last_start_time in SubscriptionRelState?
>
> I was thinking the same. But a problem is that the list of
> SubscriptionRelState is refreshed whenever the syncing table state
> becomes invalid (table_state_valid = false). I guess we need to
> improve this logic, including GetSubscriptionNotReadyRelations().

 The table states are invalidated on a syscache callback from
 pg_subscription_rel, which happens roughly speaking when a table
 finishes the initial sync.  So if we're worried about failing tablesync
 workers relaunching too quickly, this would only be a problem if a
 tablesync of another table finishes right in that restart window.  That
 doesn't seem a terrible issue to me.

>>>
>>> I think the table states are invalidated whenever the table sync
>>> worker starts, because the table sync worker updates its status of
>>> pg_subscription_rel and commits it before starting actual copy. So we
>>> cannot rely on that. I thought we can store last_start_time into
>>> pg_subscription_rel but it might be overkill. I'm now thinking to
>>> change GetSubscriptionNotReadyRelations so that last_start_time in
>>> SubscriptionRelState is carried over to the new list.
>>>
>>
>> Attached the latest patch. It actually wasn't necessary to change
>> GetSubscriptionNotReadyRelations; I just changed the logic for
>> refreshing the sync table state list.
>> Please give me feedback.
>>
>
> Hmm, this might work. Although I was actually wondering if we could store
> the last start timestamp in the worker shared memory and do some magic
> with that (i.e., not clearing subid and relid, and then doing rate
> limiting based on subid+relid+timestamp stored in shmem). That would
> then work the same way for the main apply workers as well. It would have
> the disadvantage that if some tables were consistently failing, no other
> tables could get synchronized as the number of workers is limited.

Hmm I guess it's not a good design that a table sync worker and an
apply worker for a relation take sole possession of a worker slot
until they succeed. They would get in each other's way.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-14 Thread Kyotaro HORIGUCHI
At Thu, 13 Apr 2017 20:02:33 +0200, Petr Jelinek  
wrote in 
> >> Although this is not a problem with this patch, it is a problem
> >> in general.
> 
> Huh? We explicitly switch to CacheMemoryContext before pallocing
> anything that should survive long term.

Ah.. yes, sorry for the noise.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-13 Thread Petr Jelinek
On 13/04/17 12:23, Masahiko Sawada wrote:
> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada  
> wrote:
>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>  wrote:
>>> On 4/12/17 00:48, Masahiko Sawada wrote:
 On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
> Perhaps instead of a global last_start_time, we store a per relation
> last_start_time in SubscriptionRelState?

 I was thinking the same. But a problem is that the list of
 SubscriptionRelState is refreshed whenever the syncing table state
 becomes invalid (table_state_valid = false). I guess we need to
 improve this logic, including GetSubscriptionNotReadyRelations().
>>>
>>> The table states are invalidated on a syscache callback from
>>> pg_subscription_rel, which happens roughly speaking when a table
>>> finishes the initial sync.  So if we're worried about failing tablesync
>>> workers relaunching too quickly, this would only be a problem if a
>>> tablesync of another table finishes right in that restart window.  That
>>> doesn't seem a terrible issue to me.
>>>
>>
>> I think the table states are invalidated whenever the table sync
>> worker starts, because the table sync worker updates its status of
>> pg_subscription_rel and commits it before starting actual copy. So we
>> cannot rely on that. I thought we can store last_start_time into
>> pg_subscription_rel but it might be overkill. I'm now thinking to
>> change GetSubscriptionNotReadyRelations so that last_start_time in
>> SubscriptionRelState is carried over to the new list.
>>
> 
> Attached the latest patch. It actually wasn't necessary to change
> GetSubscriptionNotReadyRelations; I just changed the logic for
> refreshing the sync table state list.
> Please give me feedback.
> 

Hmm, this might work. Although I was actually wondering if we could store
the last start timestamp in the worker shared memory and do some magic
with that (i.e., not clearing subid and relid, and then doing rate
limiting based on subid+relid+timestamp stored in shmem). That would
then work the same way for the main apply workers as well. It would have
the disadvantage that if some tables were consistently failing, no other
tables could get synchronized as the number of workers is limited. OTOH
the approach chosen in the patch will first try all tables and only then
start rate limiting; not quite sure which is better.
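The shmem idea above can be sketched in stand-alone C. This is only an illustration of the rate-limiting logic, not PostgreSQL code; `WorkerSlot`, `may_launch`, `record_launch`, and `worker_exited` are invented names here:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 4

/* A worker slot whose identity and start time outlive the worker,
 * so the launcher can rate-limit restarts of the same pair. */
typedef struct WorkerSlot
{
    bool      in_use;
    unsigned  subid;
    unsigned  relid;
    long long last_start_ms;    /* kept after the worker exits */
} WorkerSlot;

static WorkerSlot slots[MAX_SLOTS];

/* Refuse a launch if the same subid+relid pair was started too
 * recently, whether or not that worker is still running. */
static bool
may_launch(unsigned subid, unsigned relid, long long now_ms,
           long long interval_ms)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (slots[i].subid == subid && slots[i].relid == relid &&
            now_ms - slots[i].last_start_ms < interval_ms)
            return false;
    return true;
}

static void
record_launch(unsigned subid, unsigned relid, long long now_ms)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (!slots[i].in_use)
        {
            slots[i] = (WorkerSlot){ true, subid, relid, now_ms };
            return;
        }
}

static void
worker_exited(unsigned subid, unsigned relid)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (slots[i].in_use && slots[i].subid == subid &&
            slots[i].relid == relid)
            slots[i].in_use = false;    /* subid/relid/timestamp survive */
}
```

The point of the design is that a slot's identity and timestamp outlive the worker; in the real launcher the limited number of slots is what could let consistently failing tables keep others waiting.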

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-13 Thread Petr Jelinek
On 13/04/17 13:01, Kyotaro HORIGUCHI wrote:
> Ouch! I replied to wrong mail.
> 
> At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20170413.195504.89348773.horiguchi.kyot...@lab.ntt.co.jp>
>> I confused sync and apply workers.
>> A sync worker failure at start causes immediate retries.
>>
>> At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada  
>> wrote in 
>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>>  wrote:
 On 4/12/17 00:48, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>> Perhaps instead of a global last_start_time, we store a per relation
>> last_start_time in SubscriptionRelState?
>
> I was thinking the same. But a problem is that the list of
> SubscriptionRelState is refreshed whenever the syncing table state
> becomes invalid (table_state_valid = false). I guess we need to
> improve this logic, including GetSubscriptionNotReadyRelations().

 The table states are invalidated on a syscache callback from
 pg_subscription_rel, which happens roughly speaking when a table
 finishes the initial sync.  So if we're worried about failing tablesync
 workers relaunching too quickly, this would only be a problem if a
 tablesync of another table finishes right in that restart window.  That
 doesn't seem a terrible issue to me.

>>>
>>> I think the table states are invalidated whenever the table sync
>>> worker starts, because the table sync worker updates its status of
>>> pg_subscription_rel and commits it before starting actual copy. So we
>>> cannot rely on that. I thought we can store last_start_time into
>>> pg_subscription_rel but it might be overkill. I'm now thinking to
>>> change GetSubscriptionNotReadyRelations so that last_start_time in
>>> SubscriptionRelState is carried over to the new list.
> 
> The right target of "This" below is found at the following URL.
> 
> https://www.postgresql.org/message-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG%2Bekd7SkzLmg%40mail.gmail.com
> 
>> This resolves the problem but, if I understand correctly, the
>> many pallocs in process_syncing_tables_for_apply() allocate in
>> ApplyContext, and that context is reset before the next visit here
>> (in LogicalRepApplyLoop).
>>
>> Although this is not a problem with this patch, it is a problem
>> in general.

Huh? We explicitly switch to CacheMemoryContext before pallocing
anything that should survive long term.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-13 Thread Kyotaro HORIGUCHI
Ouch! I replied to wrong mail.

At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170413.195504.89348773.horiguchi.kyot...@lab.ntt.co.jp>
> I confused sync and apply workers.
> A sync worker failure at start causes immediate retries.
> 
> At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada  
> wrote in 
> > On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
> >  wrote:
> > > On 4/12/17 00:48, Masahiko Sawada wrote:
> > >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
> > >>> Perhaps instead of a global last_start_time, we store a per relation
> > >>> last_start_time in SubscriptionRelState?
> > >>
> > >> I was thinking the same. But a problem is that the list of
> > >> SubscriptionRelState is refreshed whenever the syncing table state
> > >> becomes invalid (table_state_valid = false). I guess we need to
> > >> improve this logic, including GetSubscriptionNotReadyRelations().
> > >
> > > The table states are invalidated on a syscache callback from
> > > pg_subscription_rel, which happens roughly speaking when a table
> > > finishes the initial sync.  So if we're worried about failing tablesync
> > > workers relaunching too quickly, this would only be a problem if a
> > > tablesync of another table finishes right in that restart window.  That
> > > doesn't seem a terrible issue to me.
> > >
> > 
> > I think the table states are invalidated whenever the table sync
> > worker starts, because the table sync worker updates its status of
> > pg_subscription_rel and commits it before starting actual copy. So we
> > cannot rely on that. I thought we can store last_start_time into
> > pg_subscription_rel but it might be overkill. I'm now thinking to
> > change GetSubscriptionNotReadyRelations so that last_start_time in
> > SubscriptionRelState is carried over to the new list.

The right target of "This" below is found at the following URL.

https://www.postgresql.org/message-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG%2Bekd7SkzLmg%40mail.gmail.com

> This resolves the problem but, if I understand correctly, the
> many pallocs in process_syncing_tables_for_apply() allocate in
> ApplyContext, and that context is reset before the next visit here
> (in LogicalRepApplyLoop).
> 
> Although this is not a problem with this patch, it is a problem
> in general.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-13 Thread Kyotaro HORIGUCHI
I confused sync and apply workers.
A sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada  
wrote in 
> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>  wrote:
> > On 4/12/17 00:48, Masahiko Sawada wrote:
> >> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
> >>> Perhaps instead of a global last_start_time, we store a per relation
> >>> last_start_time in SubscriptionRelState?
> >>
> >> I was thinking the same. But a problem is that the list of
> >> SubscriptionRelState is refreshed whenever the syncing table state
> >> becomes invalid (table_state_valid = false). I guess we need to
> >> improve this logic, including GetSubscriptionNotReadyRelations().
> >
> > The table states are invalidated on a syscache callback from
> > pg_subscription_rel, which happens roughly speaking when a table
> > finishes the initial sync.  So if we're worried about failing tablesync
> > workers relaunching too quickly, this would only be a problem if a
> > tablesync of another table finishes right in that restart window.  That
> > doesn't seem a terrible issue to me.
> >
> 
> I think the table states are invalidated whenever the table sync
> worker starts, because the table sync worker updates its status of
> pg_subscription_rel and commits it before starting actual copy. So we
> cannot rely on that. I thought we can store last_start_time into
> pg_subscription_rel but it might be overkill. I'm now thinking to
> change GetSubscriptionNotReadyRelations so that last_start_time in
> SubscriptionRelState is carried over to the new list.

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() allocate in
ApplyContext, and that context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem with this patch, it is a problem
in general.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-13 Thread Masahiko Sawada
On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada  wrote:
> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>  wrote:
>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
 Perhaps instead of a global last_start_time, we store a per relation
 last_start_time in SubscriptionRelState?
>>>
>>> I was thinking the same. But a problem is that the list of
>>> SubscriptionRelState is refreshed whenever the syncing table state
>>> becomes invalid (table_state_valid = false). I guess we need to
>>> improve this logic, including GetSubscriptionNotReadyRelations().
>>
>> The table states are invalidated on a syscache callback from
>> pg_subscription_rel, which happens roughly speaking when a table
>> finishes the initial sync.  So if we're worried about failing tablesync
>> workers relaunching too quickly, this would only be a problem if a
>> tablesync of another table finishes right in that restart window.  That
>> doesn't seem a terrible issue to me.
>>
>
> I think the table states are invalidated whenever the table sync
> worker starts, because the table sync worker updates its status of
> pg_subscription_rel and commits it before starting actual copy. So we
> cannot rely on that. I thought we can store last_start_time into
> pg_subscription_rel but it might be overkill. I'm now thinking to
> change GetSubscriptionNotReadyRelations so that last_start_time in
> SubscriptionRelState is carried over to the new list.
>

Attached the latest patch. It actually wasn't necessary to change
GetSubscriptionNotReadyRelations; I just changed the logic for
refreshing the sync table state list.
Please give me feedback.
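The refresh logic described above can be illustrated with a stand-alone sketch (plain C, not the actual patch; `RelState` and `carry_over_start_times` are invented names): when the not-ready table list is rebuilt, each surviving relation's last_start_time is copied from the old list, so per-table rate limiting is preserved across the refresh while entries for tables that finished syncing simply drop out.

```c
#include <assert.h>

/* Minimal stand-in for a per-relation sync state entry. */
typedef struct RelState
{
    unsigned int relid;
    long long    last_start_time;   /* 0 = never started */
} RelState;

/* Copy last_start_time from the old list into the freshly built
 * list for every relation that appears in both. */
static void
carry_over_start_times(RelState *newlist, int nnew,
                       const RelState *oldlist, int nold)
{
    for (int i = 0; i < nnew; i++)
        for (int j = 0; j < nold; j++)
            if (newlist[i].relid == oldlist[j].relid)
                newlist[i].last_start_time = oldlist[j].last_start_time;
}
```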

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


table_sync_retry_interval_v2.patch
Description: Binary data



Re: [HACKERS] Interval for launching the table sync worker

2017-04-12 Thread Masahiko Sawada
On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
 wrote:
> On 4/12/17 00:48, Masahiko Sawada wrote:
>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>> Perhaps instead of a global last_start_time, we store a per relation
>>> last_start_time in SubscriptionRelState?
>>
>> I was thinking the same. But a problem is that the list of
>> SubscriptionRelState is refreshed whenever the syncing table state
>> becomes invalid (table_state_valid = false). I guess we need to
>> improve this logic, including GetSubscriptionNotReadyRelations().
>
> The table states are invalidated on a syscache callback from
> pg_subscription_rel, which happens roughly speaking when a table
> finishes the initial sync.  So if we're worried about failing tablesync
> workers relaunching too quickly, this would only be a problem if a
> tablesync of another table finishes right in that restart window.  That
> doesn't seem a terrible issue to me.
>

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRelations so that last_start_time in
SubscriptionRelState is carried over to the new list.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-12 Thread Peter Eisentraut
On 4/12/17 00:48, Masahiko Sawada wrote:
> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>> Perhaps instead of a global last_start_time, we store a per relation
>> last_start_time in SubscriptionRelState?
> 
> I was thinking the same. But a problem is that the list of
> SubscriptionRelState is refreshed whenever the syncing table state
> becomes invalid (table_state_valid = false). I guess we need to
> improve this logic, including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync.  So if we're worried about failing tablesync
workers relaunching too quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window.  That
doesn't seem a terrible issue to me.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-11 Thread Masahiko Sawada
On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
 wrote:
> On 4/10/17 08:10, Petr Jelinek wrote:
>> I don't think the solution is quite this simple. This will cause all
>> table sync workers to be delayed, which means concurrency will suffer
>> and the initial sync of all tables will take much longer, especially
>> if there is little data. We need a way to either detect if we are
>> launching the same worker that was already launched before, or
>> alternatively detect that we are launching a crashed worker, and only
>> then apply the delay.
>
> Perhaps instead of a global last_start_time, we store a per relation
> last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve this logic, including GetSubscriptionNotReadyRelations().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-11 Thread Peter Eisentraut
On 4/10/17 08:10, Petr Jelinek wrote:
> I don't think the solution is quite this simple. This will cause all
> table sync workers to be delayed, which means concurrency will suffer
> and the initial sync of all tables will take much longer, especially
> if there is little data. We need a way to either detect if we are
> launching the same worker that was already launched before, or
> alternatively detect that we are launching a crashed worker, and only
> then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?
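A stand-alone illustration of the per-relation idea (not actual PostgreSQL code; the field and function names here are invented): with a last_start_time kept per relation rather than globally, only the failing table's relaunch is throttled, and other tables' sync workers start without delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for a SubscriptionRelState carrying a
 * per-relation last start time. */
typedef struct RelSyncState
{
    unsigned int relid;             /* relation to synchronize */
    long long    last_start_time;   /* last launch time, ms; 0 = never */
} RelSyncState;

/* Launch only if this relation was never started, or its own retry
 * interval has elapsed; other relations are unaffected. */
static bool
sync_worker_can_start(const RelSyncState *rel, long long now_ms,
                      long long retry_interval_ms)
{
    if (rel->last_start_time == 0)
        return true;
    return now_ms - rel->last_start_time >= retry_interval_ms;
}
```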

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-10 Thread Petr Jelinek
On 10/04/17 11:02, Masahiko Sawada wrote:
> On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada  wrote:
>> On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
>>  wrote:
>>> On 4/7/17 01:10, Masahiko Sawada wrote:
 It's not critical but it could be a problem. So I thought we should fix
 it before the PostgreSQL 10 release. If it's not appropriate as an
 open item I'll remove it.
>>>
>>> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>>>
>>> I think a nonintrusive patch for this could be considered.
>>
>> Oops, I made a mistake. I'll send a patch tomorrow.
>>
> 
> I've attached the patch. It introduces a GUC parameter,
> table_sync_retry_interval, which controls the interval between
> launches of the table sync worker process.
> 

I don't think the solution is quite this simple. This will cause all
table sync workers to be delayed, which means concurrency will suffer
and the initial sync of all tables will take much longer, especially
if there is little data. We need a way to either detect if we are
launching the same worker that was already launched before, or
alternatively detect that we are launching a crashed worker, and only
then apply the delay.
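The alternative described above can be sketched in stand-alone C (illustrative names only, not PostgreSQL code): the delay applies only when a worker for the relation was launched before, i.e. we are restarting a crashed or failed worker, so first-time table syncs keep their full concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-relation launch bookkeeping: was a worker ever started for
 * this relation, and if so, when. */
typedef struct LaunchState
{
    bool      launched_before;  /* a worker was started for this rel */
    long long last_start_ms;    /* time of that start */
} LaunchState;

/* First-time syncs launch immediately; relaunches (crashed or
 * failed workers) wait out the retry delay. */
static bool
ready_to_launch(const LaunchState *st, long long now_ms,
                long long delay_ms)
{
    if (!st->launched_before)
        return true;                            /* fresh table: no delay */
    return now_ms - st->last_start_ms >= delay_ms;  /* restart: delayed */
}
```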

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-10 Thread Masahiko Sawada
On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada  wrote:
> On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
>  wrote:
>> On 4/7/17 01:10, Masahiko Sawada wrote:
>>> It's not critical but it could be a problem. So I thought we should fix
>>> it before the PostgreSQL 10 release. If it's not appropriate as an
>>> open item I'll remove it.
>>
>> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>>
>> I think a nonintrusive patch for this could be considered.
>
> Oops, I made a mistake. I'll send a patch tomorrow.
>

I've attached the patch. It introduces a GUC parameter,
table_sync_retry_interval, which controls the interval between
launches of the table sync worker process.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


table_sync_retry_interval.patch
Description: Binary data



Re: [HACKERS] Interval for launching the table sync worker

2017-04-09 Thread Masahiko Sawada
On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
 wrote:
> On 4/7/17 01:10, Masahiko Sawada wrote:
>> It's not critical but it could be a problem. So I thought we should fix
>> it before the PostgreSQL 10 release. If it's not appropriate as an
>> open item I'll remove it.
>
> You wrote that you "sent" a patch, but I don't see a patch anywhere.
>
> I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-07 Thread Peter Eisentraut
On 4/7/17 01:10, Masahiko Sawada wrote:
> It's not critical but it could be a problem. So I thought we should fix
> it before the PostgreSQL 10 release. If it's not appropriate as an
> open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Masahiko Sawada
On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek 
>  wrote in 
> <8c7afb37-be73-c6bd-80bc-e87522f04...@2ndquadrant.com>
>> On 06/04/17 16:44, Masahiko Sawada wrote:
>> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
>> >  wrote:
>> >>> I prefer a subscription option to a GUC. Something like the following.
>> >>>
>> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>> >>>PUBLICATION p1 WITH (noreconnect = true);
>> >>>
>> >>> Stored in pg_subscription?
>>
>> I don't think that's a very good solution, you'd lose replication on
>> every network glitch, upstream server restart, etc.
>
> Yes, you're right. This would work if the apply worker distinguished
> permanent errors. But that is overkill for now.
>
>> > I've added this as an open item, and sent a patch for this.
>> >
>>
>> I am not exactly sure what's the open item from this thread. To use the
>> wal_retrieve_interval to limit table sync restarts?
>
> It's not me. I also don't think this is critical.
>

Thank you for the comment.

It's not critical but it could be a problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek  
wrote in <8c7afb37-be73-c6bd-80bc-e87522f04...@2ndquadrant.com>
> On 06/04/17 16:44, Masahiko Sawada wrote:
> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
> >  wrote:
> >>> I prefer a subscription option to a GUC. Something like the following.
> >>>
> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
> >>>PUBLICATION p1 WITH (noreconnect = true);
> >>>
> >>> Stored in pg_subscription?
> 
> I don't think that's a very good solution, you'd lose replication on
> every network glitch, upstream server restart, etc.

Yes, you're right. This would work if the apply worker distinguished
permanent errors. But that is overkill for now.

> > I've added this as an open item, and sent a patch for this.
> > 
> 
> I am not exactly sure what's the open item from this thread. To use the
> wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this is critical.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Petr Jelinek
On 06/04/17 16:44, Masahiko Sawada wrote:
> On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>>  wrote in 
>> <20170406.170214.263553093.horiguchi.kyot...@lab.ntt.co.jp>
>>> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada  
>>> wrote in 
>>> 
 On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
  wrote:
> I was thinking the same.
>
> At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada 
>  wrote in 
> 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Masahiko Sawada
On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20170406.170214.263553093.horiguchi.kyot...@lab.ntt.co.jp>
>> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada  
>> wrote in 
>> > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
>> >  wrote:
>> > > I was thinking the same.
>> > >
>> > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada 
>> > >  wrote in 
>> > > 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170406.170214.263553093.horiguchi.kyot...@lab.ntt.co.jp>
> At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada  
> wrote in 
> > On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
> >  wrote:
> > > I was thinking the same.
> > >
> > > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada 
> > >  wrote in 
> > > 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada  
wrote in 
> On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
>  wrote:
> > I was thinking the same.
> >
> > At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada  
> > wrote in 
> > 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Masahiko Sawada
On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
 wrote:
> I was thinking the same.
>
> At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Kyotaro HORIGUCHI
I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada  
wrote in