[HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted

2017-11-13 Thread Masahiko Sawada
Hi,

Commit e2c79e14 prevented multiple processes from cleaning up the
pending list of a GIN index concurrently. But I think there is still a
possibility that vacuum could miss tuples that should be deleted if
someone else is cleaning up the pending list.

In ginInsertCleanup(), we lock the GIN meta page with LockPage() and
could wait for a concurrent cleanup process if stats == NULL. The
source code comment says this happens when ginInsertCleanup() is
called by [auto]vacuum/analyze or gin_clean_pending_list(). I agree
with this behavior. However, looking at the callers, stats is NULL
only if either the pending list exceeds the threshold during
insertion, or only analyzing is performed by an autovacuum worker or
the ANALYZE command. So I think it should be inVacuum = (stats != NULL)
instead. Also, we might want autoanalyze and the ANALYZE command to
wait for a concurrent process as well. The attached patch fixes these
two issues. If this is a bug we should back-patch it to 9.6.

void
ginInsertCleanup(GinState *ginstate, bool full_clean,
 bool fill_fsm, IndexBulkDeleteResult *stats)
{

(snip)

bool        inVacuum = (stats == NULL);

/*
 * We would like to prevent concurrent cleanup process. For that we will
 * lock metapage in exclusive mode using LockPage() call. Nobody other
 * will use that lock for metapage, so we keep possibility of concurrent
 * insertion into pending list
 */

if (inVacuum)
{
/*
 * We are called from [auto]vacuum/analyze or gin_clean_pending_list()
 * and we would like to wait concurrent cleanup to finish.
 */
LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
workMemory =
(IsAutoVacuumWorkerProcess() && autovacuum_work_mem != -1) ?
autovacuum_work_mem : maintenance_work_mem;
}
else
{
/*
 * We are called from regular insert and if we see concurrent cleanup
 * just exit in hope that concurrent process will clean up pending
 * list.
 */
if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
return;
workMemory = work_mem;
}
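
To make the proposal concrete, here is a rough sketch of the intended
behavior (a hypothetical rendering for illustration only; the attached
patch is authoritative): the flag is derived from a non-NULL stats
pointer, so the [auto]vacuum callers take the blocking LockPage() path
and wait for a concurrent cleanup, while regular insertions keep using
ConditionalLockPage() and bail out.

/* Hypothetical sketch of the proposed fix, not the attached patch verbatim. */
bool        inVacuum = (stats != NULL);    /* [auto]vacuum callers pass a stats pointer */

if (inVacuum)
{
    /* vacuum: wait for any concurrent cleanup of the pending list */
    LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}
else
{
    /* regular insertion: assume the concurrent process will clean up */
    if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
        return;
}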

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_ginInsertCleanup.patch
Description: Binary data



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-11-08 Thread Masahiko Sawada
On Wed, Nov 8, 2017 at 5:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>>>> I suggest that a good thing to do more or less immediately, regardless
>>>> of when this patch ends up being ready, would be to insert an
>>>> assertion that LockAcquire() is never called while holding a lock of
>>>> one of these types.  If that assertion ever fails, then the whole
>>>> theory that these lock types don't need deadlock detection is wrong,
>>>> and we'd like to find out about that sooner or later.
>>>
>>> I understood. I'll check that first.
>>
>> I've checked whether LockAcquire is called while holding a lock of one
>> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
>> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that
>> we cannot move these four lock types together out of heavy-weight
>> lock, but can move only the relation extension lock with tricks.
>>
>> Here is detail of the survey.
>
> Thanks for these details, but I'm not sure I fully understand.
>
>> * LOCKTAG_RELATION_EXTENSION
>> There is a path where LockRelationForExtension() could be called while
>> holding another relation extension lock. In brin_getinsertbuffer(), we
>> acquire a relation extension lock for an index relation and could
>> initialize a new buffer (brin_initialize_empty_new_buffer()). While
>> initializing a new buffer, we call RecordPageWithFreeSpace(), which
>> eventually can call fsm_readbuf(rel, addr, true) where the third
>> argument is "extend". We can handle this problem by keeping a list
>> (or local hash) of acquired locks and skipping the acquisition if the
>> lock is already held. For the other call paths calling
>> LockRelationForExtension(), I don't see any problem.
>
> Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock?

No, I meant that fsm_readbuf(rel, addr, true) can acquire a relation
extension lock. So it's not a problem.

> Basically, what matters here in the end is whether we can articulate a
> deadlock-proof rule around the order in which these locks are
> acquired.

You're right, my survey was not enough to make a decision.

As far as acquiring these four lock types goes, there are two call
paths that acquire one of these locks while holding another. The first
is acquiring a relation extension lock and then acquiring the relation
extension lock for the same relation again. As explained before, this
can be resolved by remembering the lock we already hold (perhaps
remembering only the last one is enough); a rough sketch appears after
this paragraph. The second is acquiring either a tuple lock, a page
lock, or a speculative insertion lock and then acquiring a relation
extension lock. In that case we always take the two locks in the same
order: one of those three lock types first, then the extension lock.
So it's not a problem if we apply the rule that acquiring any of these
three lock types is disallowed while holding a relation extension
lock. Also, as far as I surveyed, there is no path that acquires a
relation lock while holding any of the other three lock types.
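
Here is a minimal sketch of the "remember the lock we already hold"
idea, using made-up names (this is not code from the attached patch).
It keeps one backend-local slot plus a nesting count, which is enough
if remembering only the last relation extension lock suffices:

/* Hypothetical backend-local state, for illustration only. */
static Oid  heldRelExtLockRelid = InvalidOid;
static int  heldRelExtLockCount = 0;

static void
RelExtLockAcquireSketch(Oid relid)
{
    if (heldRelExtLockRelid == relid)
    {
        /* e.g. fsm_readbuf() called underneath brin_getinsertbuffer() */
        heldRelExtLockCount++;
        return;
    }

    /* ... acquire the relation extension lock in the new lock manager ... */
    heldRelExtLockRelid = relid;
    heldRelExtLockCount = 1;
}

static void
RelExtLockReleaseSketch(Oid relid)
{
    Assert(heldRelExtLockRelid == relid);
    if (--heldRelExtLockCount > 0)
        return;                 /* still held by an outer caller */

    /* ... release the relation extension lock ... */
    heldRelExtLockRelid = InvalidOid;
}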

>  The simplest such rule would be "you can only acquire one
> lock of any of these types at a time, and you can't subsequently
> acquire a heavyweight lock".  But a more complicated rule would be OK
> too, e.g. "you can acquire as many heavyweight locks as you want, and
> after that you can optionally acquire one page, tuple, or speculative
> token lock, and after that you can acquire a relation extension lock".
> The latter rule, although more complex, is still deadlock-proof,
> because the heavyweight locks still use the deadlock detector, and the
> rest has a consistent order of lock acquisition that precludes one
> backend taking A then B while another backend takes B then A.  I'm not
> entirely clear whether your survey leads us to a place where we can
> articulate such a deadlock-proof rule.

Regarding these four lock types versus other heavyweight locks, there
obviously are call paths that acquire one of the four lock types while
holding a heavyweight lock. In reverse, there is also a call path that
acquires a heavyweight lock while holding one of the four lock types.
The call path I found is in heap_delete(): we acquire a tuple lock and
call XactLockTableWait() or MultiXactIdWait(), which eventually could
acquire LOCKTAG_TRANSACTION in order to wait for the concurrent
transactions to finish. But IIUC, since these functions acquire the
lock on the concurrent transaction's transaction id, deadlocks don't
happen.
However, there might be other similar call paths that I'm missing. For
example, we do some operations that might acquire heavyweight locks
other than LOCKTAG_TRANSACTION, while hol

Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Masahiko Sawada
On Wed, Nov 8, 2017 at 11:20 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Nov 8, 2017 at 1:58 AM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>> On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada <sawada.m...@gmail.com>
>> wrote:
>>> I understood the necessity of this patch and reviewed two patches.
>>
>> Good, thank you.
>
> That's clearly a bug fix.
>
>>> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
>>> index 13bd397..c1566d4 100644
>>> --- a/contrib/bloom/Makefile
>>> +++ b/contrib/bloom/Makefile
>>> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>>>  include $(top_srcdir)/contrib/contrib-global.mk
>>>  endif
>>>
>>> +check: wal-check
>>> +
>>>  wal-check: temp-install
>>> $(prove_check)
>>
>>
>> I've tried this version Makefile.  And I've seen the only difference: when
>> tap tests are enabled, this version of Makefile runs tap tests before
>> regression tests.  While my version of Makefile runs tap tests after
>> regression tests.  That seems like more desirable behavior to me.  But it
>> would be still nice to simplify the Makefile.
>
> Why do you care about the order of actions? There is no dependency
> between each test and it seems to me that it should remain so to keep
> a maximum flexibility. This does not sacrifice coverage as well. In
> short, Sawada-san's suggestion looks like a thing to me. One piece
> that it is missing though is that installcheck triggers only the
> SQL-based tests, and it should also trigger the TAP tests.

+1

> So I think
> that you should instead do something like that:
>
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +installcheck: wal-installcheck
> +
> +check: wal-check
> +
> +wal-installcheck:
> +   $(prove_installcheck)
> +
>  wal-check: temp-install
> $(prove_check)
>
> This works for me as I would expect it should.

Looks good to me too.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Fix a typo in dsm_impl.c

2017-11-07 Thread Masahiko Sawada
On Wed, Nov 8, 2017 at 6:36 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> Attached the patch for $subject.
>
> Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Remove duplicate setting in test/recovery/Makefile

2017-11-07 Thread Masahiko Sawada
Hi,

I found that EXTRA_INSTALL is set twice, at both the top and the
bottom of src/test/recovery/Makefile. Is that necessary?

Attached patch fixes this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


remove_duplicate_setting.patch
Description: Binary data



Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Masahiko Sawada
On Fri, Sep 29, 2017 at 10:32 PM, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> On Wed, Sep 6, 2017 at 5:06 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>>
>> On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov
>> <a.korot...@postgrespro.ru> wrote:
>>>
>>> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
>>> check that queries give same results on master and standby.  They just check
>>> that *return codes* of psql are equal.
>>>
>>>> # Run test queries and compare their result
>>>> my $master_result = $node_master->psql("postgres", $queries);
>>>> my $standby_result = $node_standby->psql("postgres", $queries);
>>>> is($master_result, $standby_result, "$test_name: query result matches");
>>>
>>>
>>> Attached patch fixes this problem by using safe_psql() which returns psql
>>> output instead of return code.  For safety, this patch replaces psql() with
>>> safe_psql() in other places too.
>>>
>>> I think this should be backpatched to 9.6 as bugfix.
>>
>>
>> Also, it would be nice to call wal-check on bloom check (now wal-check
>> isn't called even on check-world).
>> See attached patch.  My changes to Makefile could be cumbersome.  Sorry
>> for that, I don't have much experience with them...
>
>
> This topic didn't receive any attention yet.  Apparently, it's because of
> in-progress commitfest and upcoming release.
> I've registered it on upcoming commitfest as bug fix.
> https://commitfest.postgresql.org/15/1313/
>

I understand the necessity of this patch and have reviewed the two patches.

fix-bloom-wal-check.patch looks good to me; I found no problem. But
for wal-check-on-bloom-check.patch, if you want to run wal-check even
when running 'make check' or 'make check-world', couldn't we just add
the wal-check dependency as follows?

diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
index 13bd397..c1566d4 100644
--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif

+check: wal-check
+
 wal-check: temp-install
$(prove_check)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Fix a typo in dsm_impl.c

2017-11-06 Thread Masahiko Sawada
Hi,

Attached the patch for $subject.

s/reamin/remain/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_dsm_impl_c.patch
Description: Binary data



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-11-06 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 3:17 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> Since the previous patch conflicts with current HEAD, I attached the
>>> updated patch for next CF.
>>
>> I think we should back up here and ask ourselves a couple of questions:
>
> Thank you for summarizing of the purpose and discussion of this patch.
>
>> 1. What are we trying to accomplish here?
>>
>> 2. Is this the best way to accomplish it?
>>
>> To the first question, the problem as I understand it as follows:
>> Heavyweight locks don't conflict between members of a parallel group.
>> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
>> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
>> don't arise, because parallel operations are strictly read-only
>> (except for inserts by the leader into a just-created table, when only
>> one member of the group can be taking the lock anyway).  However, once
>> we allow writes, they become possible, so some solution is needed.
>>
>> To the second question, there are a couple of ways we could fix this.
>> First, we could continue to allow these locks to be taken in the
>> heavyweight lock manager, but make them conflict even between members
>> of the same lock group.  This is, however, complicated.  A significant
>> problem (or so I think) is that the deadlock detector logic, which is
>> already quite hard to test, will become even more complicated, since
>> wait edges between members of a lock group need to exist at some times
>> and not other times.  Moreover, to the best of my knowledge, the
>> increased complexity would have no benefit, because it doesn't look to
>> me like we ever take any other heavyweight lock while holding one of
>> these four kinds of locks.  Therefore, no deadlock can occur: if we're
>> waiting for one of these locks, the process that holds it is not
>> waiting for any other heavyweight lock.  This gives rise to a second
>> idea: move these locks out of the heavyweight lock manager and handle
>> them with separate code that does not have deadlock detection and
>> doesn't need as many lock modes.  I think that idea is basically
>> sound, although it's possibly not the only sound idea.
>
> I'm on the same page.
>
>>
>> However, that makes me wonder whether we shouldn't be a bit more
>> aggressive with this patch: why JUST relation extension locks?  Why
>> not all four types of locks listed above?  Actually, tuple locks are a
>> bit sticky, because they have four lock modes.  The other three kinds
>> are very similar -- all you can do is "take it" (implicitly, in
>> exclusive mode), "try to take it" (again, implicitly, in exclusive
>> mode), or "wait for it to be released" (i.e. share lock and then
>> release).  Another idea is to try to handle those three types and
>> leave the tuple locking problem for another day.
>>
>> I suggest that a good thing to do more or less immediately, regardless
>> of when this patch ends up being ready, would be to insert an
> assertion that LockAcquire() is never called while holding a lock of
>> one of these types.  If that assertion ever fails, then the whole
>> theory that these lock types don't need deadlock detection is wrong,
>> and we'd like to find out about that sooner or later.
>
> I understood. I'll check that first.

I've checked whether LockAcquire is called while holding a lock of one
of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think
that we cannot move all four lock types out of the heavyweight lock
manager together, but we can move just the relation extension lock
with some tricks.

Here is detail of the survey.

* LOCKTAG_RELATION_EXTENSION
There is a path where LockRelationForExtension() could be called while
holding another relation extension lock. In brin_getinsertbuffer(), we
acquire a relation extension lock for an index relation and could
initialize a new buffer (brin_initialize_empty_new_buffer()). While
initializing a new buffer, we call RecordPageWithFreeSpace(), which
eventually can call fsm_readbuf(rel, addr, true) where the third
argument is "extend". We can handle this problem by keeping a list
(or local hash) of acquired locks and skipping the acquisition if the
lock is already held. For the other call paths calling
LockRelationForExtension(), I don't see any problem.

* LOCKTAG_PAGE, LOCKTAG_TUPLE, LOCKTAG_SPECULATIVE_TOKEN
There is a path that w

Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-05 Thread Masahiko Sawada
On Sat, Nov 4, 2017 at 7:04 AM, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
>>
>> On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
>> <a.korot...@postgrespro.ru> wrote:
>> > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada.m...@gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com>
>> >> wrote:
>> >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> >> > <i.kartys...@postgrespro.ru> wrote:
>> >> >> Hello. I made some bugfixes and rewrite the patch.
>> >> >
>> >> > I don't think it's a good idea to deliberately leave the state of the
>> >> > standby different from the state of the  master on the theory that it
>> >> > won't matter.  I feel like that's something that's likely to come
>> >> > back
>> >> > to bite us.
>> >>
>> >> I agree with Robert. What happen if we intentionally don't apply the
>> >> truncation WAL and switched over? If we insert a tuple on the new
>> >> master server to a block that has been truncated on the old master,
>> >> the WAL apply on the new standby will fail? I guess there are such
>> >> corner cases causing failures of WAL replay after switch-over.
>> >
>> >
>> > Yes, that looks dangerous.  One approach to cope that could be teaching
>> > heap
>> > redo function to handle such these situations.  But I expect this
>> > approach
>> > to be criticized for bad design.  And I understand fairness of this
>> > criticism.
>> >
>> > However, from user prospective of view, current behavior of
>> > hot_standby_feedback is just broken, because it both increases bloat and
>> > doesn't guarantee that read-only query on standby wouldn't be cancelled
>> > because of vacuum.  Therefore, we should be looking for solution: if one
>> > approach isn't good enough, then we should look for another approach.
>> >
>> > I can propose following alternative approach: teach read-only queries on
>> > hot
>> > standby to tolerate concurrent relation truncation.  Therefore, when
>> > non-existent heap page is accessed on hot standby, we can know that it
>> > was
>> > deleted by concurrent truncation and should be assumed to be empty.  Any
>> > thoughts?
>> >
>>
>> You also meant that the applying WAL for AccessExclusiveLock is always
>> skipped on standby servers to allow scans to access the relation?
>
>
> Definitely not every AccessExclusiveLock WAL records should be skipped, but
> only whose were emitted during heap truncation.  There are other cases when
> AccessExclusiveLock WAL records are emitted, for instance, during DDL
> operations.  But, I'd like to focus on AccessExclusiveLock WAL records
> caused by VACUUM for now.  It's kind of understandable for users that DDL
> might cancel read-only query on standby.  So, if you're running long report
> query then you should wait with your DDL.  But VACUUM is a different story.
> It runs automatically when you do normal DML queries.
>
> AccessExclusiveLock WAL records by VACUUM could be either not emitted, or
> somehow distinguished and skipped on standby.  I haven't thought out that
> level of detail for now.
>

I understand. My concern is that, so far, we cannot distinguish
whether an AccessExclusiveLock WAL record came from the vacuum
truncation or from some other operation that required
AccessExclusiveLock. So I agree that we need to invent a way to do
that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-11-01 Thread Masahiko Sawada
On Mon, Oct 2, 2017 at 4:33 PM, Daniel Gustafsson <dan...@yesql.se> wrote:
>> On 29 Aug 2017, at 17:21, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Tue, Aug 22, 2017 at 2:23 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> Yes, we can. I'm not sure why you would do this only for VACUUM
>>> though? I see many messages in various places that need same treatment
>>
>> I'm skeptical about the idea of doing this too generally.
>>
>> rhaas=> select * from foo;
>> ERROR:  permission denied for relation foo
>>
>> Do we really want to start saying public.foo in all such error
>> messages?  To me, that's occasionally helpful but more often just
>> useless chatter.
>>
>> One problem with making this behavior optional is that we'd then need
>> two separate translatable strings in every case, one saying "table %s
>> has problems" and the other saying "table %s.%s has problems".  Maybe
>> we could avoid that via some clever trick, but that's how we're doing
>> it in some existing cases.
>>
>> I have a feeling we're making a small patch with a narrow goal into a
>> giant patch for which everyone has a different goal.
>
> Based on the concerns raised in this thread which are left to address or
> resolve, and the fact that the patch has been without update during the CF, 
> I’m
> marking this Returned with Feedback.  Please re-submit a new version of the
> patch to a future commitfest.
>

After more thought, I'd like to focus this work on only the log
messages related to VACUUM. Making all log messages show explicit
(schema-qualified) names might not be a good solution, so we can leave
the more generic solution for other log messages. But especially for
the VACUUM VERBOSE log, since the messages can become very long when
the table has multiple indexes, this patch would be useful for users.
Since the previous patch conflicts with current HEAD, I've attached an
updated patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


vacuum_more_explicit_relname_v2.patch
Description: Binary data



Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Masahiko Sawada
On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
>>
>> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> > <i.kartys...@postgrespro.ru> wrote:
>> >> Hello. I made some bugfixes and rewrite the patch.
>> >
>> > I don't think it's a good idea to deliberately leave the state of the
>> > standby different from the state of the  master on the theory that it
>> > won't matter.  I feel like that's something that's likely to come back
>> > to bite us.
>>
>> I agree with Robert. What happen if we intentionally don't apply the
>> truncation WAL and switched over? If we insert a tuple on the new
>> master server to a block that has been truncated on the old master,
>> the WAL apply on the new standby will fail? I guess there are such
>> corner cases causing failures of WAL replay after switch-over.
>
>
> Yes, that looks dangerous.  One approach to cope that could be teaching heap
> redo function to handle such these situations.  But I expect this approach
> to be criticized for bad design.  And I understand fairness of this
> criticism.
>
> However, from user prospective of view, current behavior of
> hot_standby_feedback is just broken, because it both increases bloat and
> doesn't guarantee that read-only query on standby wouldn't be cancelled
> because of vacuum.  Therefore, we should be looking for solution: if one
> approach isn't good enough, then we should look for another approach.
>
> I can propose following alternative approach: teach read-only queries on hot
> standby to tolerate concurrent relation truncation.  Therefore, when
> non-existent heap page is accessed on hot standby, we can know that it was
> deleted by concurrent truncation and should be assumed to be empty.  Any
> thoughts?
>

Do you also mean that applying the WAL for AccessExclusiveLock is
always skipped on standby servers, to allow scans to access the
relation?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION

2017-10-31 Thread Masahiko Sawada
sUtilitySlow (pstate=0x2479dd0,
> pstmt=0x24dd998, queryString=0x24dc920 "SET SESSION synchronous_commit TO
> local; ALTER SUBSCRIPTION sub_3_1 REFRESH PUBLICATION",
> context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x24ddcb8,
> completionTag=0x73a43c90 "") at utility.c:1629
> #12 0x008a90d0 in standard_ProcessUtility (pstmt=0x24dd998,
> queryString=0x24dc920 "SET SESSION synchronous_commit TO local; ALTER
> SUBSCRIPTION sub_3_1 REFRESH PUBLICATION",
> context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x24ddcb8,
> completionTag=0x73a43c90 "") at utility.c:928
> #13 0x7f7b98528884 in pathman_process_utility_hook (first_arg=0x24dd998,
> queryString=0x24dc920 "SET SESSION synchronous_commit TO local; ALTER
> SUBSCRIPTION sub_3_1 REFRESH PUBLICATION",
> context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x24ddcb8,
> completionTag=0x73a43c90 "") at src/hooks.c:913
> #14 0x008a80a2 in ProcessUtility (pstmt=0x24dd998,
> queryString=0x24dc920 "SET SESSION synchronous_commit TO local; ALTER
> SUBSCRIPTION sub_3_1 REFRESH PUBLICATION",
> context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x24ddcb8,
> completionTag=0x73a43c90 "") at utility.c:353
> #15 0x008a7075 in PortalRunUtility (portal=0x246be60,
> pstmt=0x24dd998, isTopLevel=0 '\000', setHoldSnapshot=0 '\000',
> dest=0x24ddcb8, completionTag=0x73a43c90 "") at pquery.c:1178
> #16 0x008a728d in PortalRunMulti (portal=0x246be60, isTopLevel=0
> '\000', setHoldSnapshot=0 '\000', dest=0x24ddcb8, altdest=0x24ddcb8,
> completionTag=0x73a43c90 "") at pquery.c:1324
> #17 0x008a6757 in PortalRun (portal=0x246be60,
> count=9223372036854775807, isTopLevel=0 '\000', run_once=1 '\001',
> dest=0x24ddcb8, altdest=0x24ddcb8, completionTag=0x73a43c90 "")
> at pquery.c:799
> #18 0x008a0288 in exec_simple_query (query_string=0x24dc920 "SET
> SESSION synchronous_commit TO local; ALTER SUBSCRIPTION sub_3_1 REFRESH
> PUBLICATION") at postgres.c:1099
> #19 0x008a4823 in PostgresMain (argc=1, argv=0x247c2d0,
> dbname=0x247c2a8 "postgres", username=0x244f870 "knizhnik") at
> postgres.c:4090
> #20 0x00801753 in BackendRun (port=0x240) at postmaster.c:4357
> #21 0x00800e5f in BackendStartup (port=0x240) at
> postmaster.c:4029
> #22 0x007fd398 in ServerLoop () at postmaster.c:1753
> #23 0x007fc92f in PostmasterMain (argc=3, argv=0x244d6e0) at
> postmaster.c:1361
> #24 0x00734f08 in main (argc=3, argv=0x244d6e0) at main.c:228
>
>
>
> The reason of this deadlock seems to be clear: ALTER SUBSCRIPTION starts
> transaction at one node and tries to create slot at other node, which
> waiting for completion of all active transaction while building scnapshpot.
> Is there any way to avoid this deadlock?

I think that there is no way to avoid this deadlock except by
refreshing subscriptions sequentially. One solution I came up with is
to change ALTER SUBSCRIPTION ... REFRESH so that we can specify the
replication slot names used for table sync, or to tolerate the
ERRCODE_DUPLICATE_OBJECT error when creating a replication slot. These
are obviously poor solutions, though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-30 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> <i.kartys...@postgrespro.ru> wrote:
>> Hello. I made some bugfixes and rewrite the patch.
>
> I don't think it's a good idea to deliberately leave the state of the
> standby different from the state of the  master on the theory that it
> won't matter.  I feel like that's something that's likely to come back
> to bite us.

I agree with Robert. What happens if we intentionally don't apply the
truncation WAL and then switch over? If we insert a tuple on the new
master into a block that had been truncated on the old master, will
the WAL apply on the new standby fail? I guess there are such corner
cases that cause failures of WAL replay after switch-over.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-30 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 5:48 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Thu, Oct 26, 2017 at 7:41 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>>
>> Because I don't want to break the current user semantics. that is,
>> currently it's guaranteed that the subsequent reads can see the
>> committed result of previous writes even if the previous transactions
>> were distributed transactions. And it's ensured by writer side. If we
>> can make the reader side ensure it, the backend process don't need to
>> wait for the resolver process.
>>
>> The waiting backend process are released by resolver process after the
>> resolver process tried to resolve foreign transactions. Even if
>> resolver process failed to either connect to foreign server or to
>> resolve foreign transaction the backend process will be released and
>> the foreign transactions are leaved as dangling transaction in that
>> case, which are processed later. Also if resolver process takes a long
>> time to resolve foreign transactions for whatever reason the user can
>> cancel it by Ctl-c anytime.
>>
>
> So, there's no guarantee that the next command issued from the
> connection *will* see the committed data, since the foreign
> transaction might not have committed because of a network glitch
> (say). If we go this route of making backends wait for resolver to
> resolve the foreign transaction, we will have add complexity to make
> sure that the waiting backends are woken up in problematic events like
> crash of the resolver process OR if the resolver process hangs in a
> connection to a foreign server etc. I am not sure that the complexity
> is worth the half-guarantee.
>

Hmm, maybe I was wrong. I now think that the waiting backends can be
woken up only in the following cases:
- The resolver process succeeded in resolving all foreign transactions.
- The user cancelled the wait (e.g. Ctrl-C).
- The resolver process failed to resolve a foreign transaction because
there is no such prepared transaction on the foreign server.

In other cases the resolver process should not release the waiters.
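
As a rough sketch of that rule, with hypothetical names (nothing here
is taken from the actual patch), the resolver would release a waiter
only for the first and third cases; cancellation is handled by the
waiting backend itself:

/* Hypothetical result codes for one resolution attempt. */
typedef enum FdwXactResolveResult
{
    FDWXACT_RESOLVE_OK,             /* all foreign transactions resolved */
    FDWXACT_RESOLVE_NOT_FOUND,      /* no such prepared transaction on the foreign server */
    FDWXACT_RESOLVE_FAILED          /* connection failure, timeout, etc. */
} FdwXactResolveResult;

static bool
ShouldReleaseWaiter(FdwXactResolveResult result)
{
    /* connection failures etc. keep the backend waiting for a retry */
    return result == FDWXACT_RESOLVE_OK ||
           result == FDWXACT_RESOLVE_NOT_FOUND;
}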

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-10-30 Thread Masahiko Sawada
On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> Since the previous patch conflicts with current HEAD, I attached the
>> updated patch for next CF.
>
> I think we should back up here and ask ourselves a couple of questions:

Thank you for summarizing of the purpose and discussion of this patch.

> 1. What are we trying to accomplish here?
>
> 2. Is this the best way to accomplish it?
>
> To the first question, the problem as I understand it as follows:
> Heavyweight locks don't conflict between members of a parallel group.
> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
> don't arise, because parallel operations are strictly read-only
> (except for inserts by the leader into a just-created table, when only
> one member of the group can be taking the lock anyway).  However, once
> we allow writes, they become possible, so some solution is needed.
>
> To the second question, there are a couple of ways we could fix this.
> First, we could continue to allow these locks to be taken in the
> heavyweight lock manager, but make them conflict even between members
> of the same lock group.  This is, however, complicated.  A significant
> problem (or so I think) is that the deadlock detector logic, which is
> already quite hard to test, will become even more complicated, since
> wait edges between members of a lock group need to exist at some times
> and not other times.  Moreover, to the best of my knowledge, the
> increased complexity would have no benefit, because it doesn't look to
> me like we ever take any other heavyweight lock while holding one of
> these four kinds of locks.  Therefore, no deadlock can occur: if we're
> waiting for one of these locks, the process that holds it is not
> waiting for any other heavyweight lock.  This gives rise to a second
> idea: move these locks out of the heavyweight lock manager and handle
> them with separate code that does not have deadlock detection and
> doesn't need as many lock modes.  I think that idea is basically
> sound, although it's possibly not the only sound idea.

I'm on the same page.

>
> However, that makes me wonder whether we shouldn't be a bit more
> aggressive with this patch: why JUST relation extension locks?  Why
> not all four types of locks listed above?  Actually, tuple locks are a
> bit sticky, because they have four lock modes.  The other three kinds
> are very similar -- all you can do is "take it" (implicitly, in
> exclusive mode), "try to take it" (again, implicitly, in exclusive
> mode), or "wait for it to be released" (i.e. share lock and then
> release).  Another idea is to try to handle those three types and
> leave the tuple locking problem for another day.
>
> I suggest that a good thing to do more or less immediately, regardless
> of when this patch ends up being ready, would be to insert an
> assertion that LockAcquire() is never called while holding a lock of
> one of these types.  If that assertion ever fails, then the whole
> theory that these lock types don't need deadlock detection is wrong,
> and we'd like to find out about that sooner or later.

Understood. I'll check that first. If this direction has no problem
and we change these three lock types to use the new lock mechanism, we
will not be able to use these locks at the same time. Since that also
imposes a limitation on the future, we should think carefully about
it. We could implement a deadlock detection mechanism for them again
later, but that wouldn't make much sense.
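
For reference, the suggested assertion could look roughly like the
following, assuming a hypothetical helper that reports whether this
backend currently holds any relation extension, page, tuple, or
speculative-token lock in the new lock mechanism (no such helper
exists today):

/* Hypothetical sketch, placed near the top of LockAcquire() in lock.c:
 * fires as soon as any heavyweight lock is requested while one of the
 * moved-out lock types is held by this backend. */
Assert(!IsAnyRelExtPageTupleSpecTokenLockHeld());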

>
> On the details of the patch, it appears that RelExtLockAcquire()
> executes the wait-for-lock code with the partition lock held, and then
> continues to hold the partition lock for the entire time that the
> relation extension lock is held.  That not only makes all code that
> runs while holding the lock non-interruptible but makes a lot of the
> rest of this code pointless.  How is any of this atomics code going to
> be reached by more than one process at the same time if the entire
> bucket is exclusive-locked?  I would guess that the concurrency is not
> very good here for the same reason.  Of course, just releasing the
> bucket lock wouldn't be right either, because then ext_lock might go
> away while we've got a pointer to it, which wouldn't be good.  I think
> you could make this work if each lock had both a locker count and a
> pin count, and the object can only be removed when the pin_count is 0.
> So the lock algorithm would look like this:
>
> - Acquire the partition LWLock.
> - Find the item of interest, cr

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-26 Thread Masahiko Sawada
On Thu, Oct 26, 2017 at 2:36 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Oct 25, 2017 at 3:15 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>>
>> Foreign Transaction Resolver
>> ==
>> I introduced a new background worker called "foreign transaction
>> resolver" which is responsible for resolving the transaction prepared
>> on foreign servers. The foreign transaction resolver process is
>> launched by backend processes when commit/rollback transaction. And it
>> periodically resolves the queued transactions on a database as long as
>> the queue is not empty. If the queue has been empty for the certain
>> time specified by foreign_transaction_resolver_time GUC parameter, it
>> exits. It means that the backend doesn't launch a new resolver process
>> if the resolver process is already working. In this case, the backend
>> process just adds the entry to the queue on shared memory and wake it
>> up. The maximum number of resolver process we can launch is controlled
>> by max_foreign_transaction_resolvers. So we recommends to set larger
>> max_foreign_transaction_resolvers value than the number of databases.
>> The resolver process also tries to resolve dangling transaction as
>> well in a cycle.
>>
>> Processing Sequence
>> =
>> I've changed the processing sequence of resolving foreign transaction
>> so that the second phase of two-phase commit protocol (COMMIT/ROLLBACK
>> prepared) is executed by a resolver process, not by backend process.
>> The basic processing sequence is following;
>>
>> * Backend process
>> 1. In pre-commit phase, the backend process saves fdwxact entries, and
>> then prepares transaction on all foreign servers that can execute
>> two-phase commit protocol.
>> 2. Local commit.
>> 3. Enqueue itself to the shmem queue and change its status to WAITING
>> 4. launch or wakeup a resolver process and wait
>>
>> * Resolver process
>> 1. Dequeue the waiting process from shmem qeue
>> 2. Collect the fdwxact entries that are associated with the waiting 
>> process.
>> 3. Resolve foreign transactoins
>> 4. Release the waiting process
>
> Why do we want the the backend to linger behind, once it has added its
> foreign transaction entries in the shared memory and informed resolver
> about it? The foreign connections may take their own time and even
> after that there is no guarantee that the foreign transactions will be
> resolved in case the foreign server is not available. So, why to make
> the backend wait?

Because I don't want to break the current user semantics. that is,
currently it's guaranteed that the subsequent reads can see the
committed result of previous writes even if the previous transactions
were distributed transactions. And it's ensured by writer side. If we
can make the reader side ensure it, the backend process don't need to
wait for the resolver process.

The waiting backend process is released by the resolver process after
the resolver has tried to resolve the foreign transactions. Even if
the resolver fails to connect to a foreign server or to resolve a
foreign transaction, the backend process will be released; in that
case the foreign transactions are left as dangling transactions, which
are processed later. Also, if the resolver takes a long time to
resolve the foreign transactions for whatever reason, the user can
cancel the wait with Ctrl-C at any time.

>>
>> 5. Wake up and restart
>>
>> This is still under the design phase and I'm sure that there is room
>> for improvement and consider more sensitive behaviour but I'd like to
>> share the current status of the patch. The patch includes regression
>> tests but not includes fully documentation.
>
> Any background worker, backend should be child of the postmaster, so
> we should not let a backend start a resolver process. It should be the
> job of the postmaster.
>

Of course I won't. I used the phrase "the backend process launches
the resolver process" to make the explanation easier; sorry for the
confusion. The backend process calls the
RegisterDynamicBackgroundWorker() function to request a resolver
process, so resolver processes are actually launched by the
postmaster.
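
For reference, a minimal sketch of that registration (the worker
entry-point name is hypothetical; the rest is the existing background
worker API):

BackgroundWorker        worker;
BackgroundWorkerHandle *handle;

memset(&worker, 0, sizeof(worker));
worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
worker.bgw_restart_time = BGW_NEVER_RESTART;
snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
snprintf(worker.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain"); /* hypothetical entry point */
snprintf(worker.bgw_name, BGW_MAXLEN, "foreign transaction resolver");
worker.bgw_notify_pid = MyProcPid;  /* notify this backend when it has started */

if (!RegisterDynamicBackgroundWorker(&worker, &handle))
    ereport(WARNING,
            (errmsg("out of background worker slots")));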

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-10-26 Thread Masahiko Sawada
On Fri, Sep 8, 2017 at 4:32 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> The previous patch conflicts with current HEAD, I rebased the patch to
>>> current HEAD.
>>
>> Hi Masahiko-san,
>>
>> FYI this doesn't build anymore.  I think it's just because the wait
>> event enumerators were re-alphabetised in pgstat.h:
>>
>> ../../../../src/include/pgstat.h:820:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:806:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:821:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>> ../../../../src/include/pgstat.h:807:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>>
>
> Thank you for the information! Attached rebased patch.
>

Since the previous patch conflicts with current HEAD, I attached the
updated patch for next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b928c1a 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -609,8 +609,8 @@ brin_page_cleanup(Relation idxrel, Buffer buf)
 	 */
 	if (PageIsNew(page))
 	{
-		LockRelationForExtension(idxrel, ShareLock);
-		UnlockRelationForExtension(idxrel, ShareLock);
+		LockRelationForExtension(idxrel, RELEXT_SHARED);
+		UnlockRelationForExtension(idxrel, RELEXT_SHARED);
 
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 		if (PageIsNew(page))
@@ -702,7 +702,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 			 */
 			if (!RELATION_IS_LOCAL(irel))
 			{
-LockRelationForExtension(irel, ExclusiveLock);
+LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 extensionLockHeld = true;
 			}
 			buf = ReadBuffer(irel, P_NEW);
@@ -754,7 +754,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 }
 
 if (extensionLockHeld)
-	UnlockRelationForExtension(irel, ExclusiveLock);
+	UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 ReleaseBuffer(buf);
 return InvalidBuffer;
@@ -764,7 +764,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		if (extensionLockHeld)
-			UnlockRelationForExtension(irel, ExclusiveLock);
+			UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 		page = BufferGetPage(buf);
 
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4c15b45 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -570,7 +570,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	else
 	{
 		if (needLock)
-			LockRelationForExtension(irel, ExclusiveLock);
+			LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 		buf = ReadBuffer(irel, P_NEW);
 		if (BufferGetBlockNumber(buf) != mapBlk)
@@ -582,7 +582,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 			 * page from under whoever is using it.
 			 */
 			if (needLock)
-UnlockRelationForExtension(irel, ExclusiveLock);
+UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 			LockBuffer(revmap->rm_metaBuf, BUFFER_LOCK_UNLOCK);
 			ReleaseBuffer(buf);
 			return;
@@ -591,7 +591,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 		page = BufferGetPage(buf);
 
 		if (needLock)
-			UnlockRelationForExtension(irel, ExclusiveLock);
+			UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 	}
 
 	/* Check that it's a regular block (or an empty page) */
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27..1690d21 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -325,13 +325,13 @@ GinNewBuffer(Relation index)
 	/* Must extend the file */
 	needLock = !RELATION_IS_LOCAL(index);
 	if (needLock)
-		LockRelationForExtension(index, ExclusiveLock);
+		LockRelationForExtension(index, RELEXT_EXCLUSIVE);
 
 	buffer = ReadBuffer(index, P_NEW);
 	LockBuffer(buffer, GIN_EXCLUSIVE);
 
 	if (needLock)
-		UnlockRelationForExtension(index, ExclusiveLock);
+		UnlockRelationForExtension(index, RELEXT_EXCLUSIVE);
 
 	return buffer;
 }
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 31425e9..e9f84bc 100644
--- a/s

Re: [HACKERS] Block level parallel vacuum WIP

2017-10-23 Thread Masahiko Sawada
On Mon, Oct 23, 2017 at 10:43 AM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/10/22 5:25, Thomas Munro wrote:
>> On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>>> wrote:
>>>>> Down at the bottom of the build log in the regression diffs file you can 
>>>>> see:
>>>>>
>>>>> ! ERROR: cache lookup failed for relation 32893
>>>>>
>>>>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907
>>>>
>>>> Thank you for letting me know.
>>>>
>>>> Hmm, it's an interesting failure. I'll investigate it and post the new 
>>>> patch.
>>>
>>> Did you ever find out what the cause of this problem was?
>>
>> I wonder if it might have been the same issue that commit
>> 19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.
>
> Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation
> XXX" error in a different code path though (the code path handling manual
> vacuum).  Not sure if the commit could have prevented that error being
> emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in
> this case.  Maybe I'm missing something though.
>

Yeah, I was thinking the commit was relevant to this issue, but as
Amit mentioned, this error is emitted by DROP SCHEMA ... CASCADE.
I haven't found the cause of this issue yet. With the previous version
of the patch, autovacuum workers were working with one parallel worker
but they never drop relations. So it's possible that the error was not
related to the patch, but anyway I'll continue to work on it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] More stats about skipped vacuums

2017-10-20 Thread Masahiko Sawada
On Tue, Oct 10, 2017 at 7:26 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hello.
> Once in a while I am asked about table bloat. In most cases the
> cause is long lasting transactions and vacuum canceling in some
> cases. Whatever the case users don't have enough clues to why
> they have bloated tables.
>
> At the top of the annoyances list for users would be that they
> cannot know whether autovacuum decided that a table needs vacuum
> or not. I suppose that it could be shown in pg_stat_*_tables.
>
>   n_mod_since_analyze | 2
> + vacuum_requred  | true
>   last_vacuum | 2017-10-10 17:21:54.380805+09
>
> If vacuum_required remains true for a certain time, it means that
> vacuuming stopped halfway or someone is killing it repeatedly.
> That status could be shown in the same view.

Because the table statistics are updated at the end of vacuum, I
think autovacuum will process the table in the next cycle if the
vacuum stopped halfway or was killed. So do you mean that
vacuum_required is for users who want to reclaim garbage without
waiting for the autovacuum retry?

>   n_mod_since_analyze | 2
> + vacuum_requred  | true
>   last_vacuum | 2017-10-10 17:21:54.380805+09
>   last_autovacuum | 2017-10-10 17:21:54.380805+09
> + last_autovacuum_status  | Killed by lock conflict
>
> Where the "Killed by lock conflict" would be one of the followings.
>
>   - Completed (oldest xmin = 8023)
>   - May not be fully truncated (yielded at 1324 of 6447 expected)
>   - Truncation skipped
>   - Skipped by lock failure
>   - Killed by lock conflict
>
>
> If we want more formal expression, we can show the values in the
> following shape. And adding some more values could be useful.
>
>   n_mod_since_analyze  | 2
> + vacuum_requred   | true
> + last_vacuum_oldest_xid   | 8023
> + last_vacuum_left_to_truncate | 5123
> + last_vacuum_truncated| 387
>   last_vacuum  | 2017-10-10 17:21:54.380805+09
>   last_autovacuum  | 2017-10-10 17:21:54.380805+09
> + last_autovacuum_status   | Killed by lock conflict
> ...
>   autovacuum_count | 128
> + incomplete_autovacuum_count  | 53
>
> # The last one might be needless..

I'm not sure the above information will help users or DBAs, but
personally I sometimes want to have the number of index scans of the
last autovacuum in the pg_stat_user_tables view. That value indicates
how efficiently vacuums performed and would be a signal for users to
increase the autovacuum_work_mem setting.

> Where the "Killed by lock conflict" is one of the followings.
>
>- Completed
>- Truncation skipped
>- Partially truncated
>- Skipped
>- Killed by lock conflict
>
> This seems enough to find the cause of a table bloat. The same
> discussion could be applied to analyze but it might be the
> another issue.
>
> There may be a better way to indicate the vacuum soundness. Any
> opinions and suggestions are welcome.
>
> I'm going to make a patch to do the 'formal' one for the time
> being.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Fix a typo in libpq/auth.c

2017-10-19 Thread Masahiko Sawada
Hi,

Attached a patch for $subject.

s/RAIDUS/RADIUS/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_auth_c.patch
Description: Binary data



Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-10-19 Thread Masahiko Sawada
On Tue, Sep 5, 2017 at 3:41 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Thank you for the opinions.
>
> At Tue, 29 Aug 2017 15:00:57 +0900, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote in 

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-10-18 Thread Masahiko Sawada
On Wed, Oct 18, 2017 at 5:32 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>> Attached the updated version patch.
>>
>>
>> Applies, compiles, make check & tap test ok, doc is fine.
>>
>> All is well for me: the feature is useful, code is simple and clean, it is
>> easy to invoke, easy to extend as well, which I'm planning to do once it
>> gets in.
>>
>> I switched the patch to "Ready for Committers". No doubt they will have
>> their own opinions about it. Let's wait and see.
>
>
> The patch needs a rebase in the documentation because of the xml-ilization
> of the sgml doc.
>

Thank you for the notification! Attached rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v16.patch
Description: Binary data



Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-17 Thread Masahiko Sawada
On Thu, Sep 14, 2017 at 11:00 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 14 September 2017 at 19:57, Ashutosh Sharma <ashu.coe...@gmail.com>
> wrote:
>
>>
>>
>> Are you planning to work on the review comments from Robert, Moon
>> Insung and supply the new patch. I just had a quick glance into this
>> mail thread (after a long time) and could understand Robert's concern
>> till some extent. I think, he is trying to say that if a tuple is
>> frozen (committed|invalid) then it shouldn't be shown as COMMITTED and
>> INVALID together in fact it should just be displayed as FROZEN tuple.
>
>
> Yes, I'd like to, and should have time for it in this CF.
>
> My plan is to emit raw flags by default, so FROZEN would't be shown at all,
> only COMMITTED|INVALID. If the bool to decode combined flags is set, then
> it'll show things like FROZEN, and hide COMMITTED|INVALID. Similar for other
> combos.
>

FWIW, I agree with this direction. ISTM that showing the raw flags by
default and having an option to show combined flags is the right way.
I have sometimes wanted the same mechanism for lp_flags, but maybe
that should be discussed on a separate thread.
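
For what it's worth, here is a small sketch of what the combined-flag
decoding could look like for the xmin bits. HEAP_XMIN_FROZEN is
defined in htup_details.h as HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID;
the surrounding function and the decode_combined flag are hypothetical:

if (decode_combined && (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
    appendStringInfoString(&buf, "HEAP_XMIN_FROZEN ");
else
{
    /* raw flags: a frozen tuple shows up as COMMITTED|INVALID */
    if (infomask & HEAP_XMIN_COMMITTED)
        appendStringInfoString(&buf, "HEAP_XMIN_COMMITTED ");
    if (infomask & HEAP_XMIN_INVALID)
        appendStringInfoString(&buf, "HEAP_XMIN_INVALID ");
}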

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-16 Thread Masahiko Sawada
On Fri, Oct 13, 2017 at 1:14 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Oct 10, 2017 at 5:55 AM, Pavel Golub <pa...@microolap.com> wrote:
>> DP> The new status of this patch is: Ready for Committer
>>
>> Seems  like,  we  may  also  going to hit it and it would be cool this
>> vacuum issue solved for next PG version.
>
> Exactly which patch on this thread is someone proposing for commit?
>

I guess that is the patch I proposed. However, I think there is still
room for discussion because the patch cannot skip the cleanup vacuum
during an aggressive vacuum, which is one of the situations I really
wanted to skip.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in execReplication.c

2017-10-13 Thread Masahiko Sawada
On Thu, Oct 12, 2017 at 11:30 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Oct 12, 2017 at 6:55 AM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> Thanks for the patch, looks correct to me.
>
> Committed and back-patched to v10.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in execReplication.c

2017-10-11 Thread Masahiko Sawada
On Thu, Oct 12, 2017 at 5:02 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 10:59 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> Attached a patch for $subject. ISTM these are mistakes of copy-and-paste.
>
> Committed, but isn't the code itself wrong as well in the DELETE case?
>
> /* BEFORE ROW DELETE Triggers */
> if (resultRelInfo->ri_TrigDesc &&
> resultRelInfo->ri_TrigDesc->trig_update_before_row)
> {
> skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
> &searchslot->tts_tuple->t_self,
> NULL);
> }
>
> Why not trig_delete_before_row?
>

Thank you!
I think you're right. It should be trig_delete_before_row and this
should be back-patched to v10. The attached patch fixes it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index c26420a..fb538c0 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -511,7 +511,7 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 
 	/* BEFORE ROW DELETE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
-		resultRelInfo->ri_TrigDesc->trig_update_before_row)
+		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
 	{
 		skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
 		   &searchslot->tts_tuple->t_self,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Masahiko Sawada
On Mon, Oct 9, 2017 at 4:37 PM, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>>
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
>
> It will be not so easy to provide some reproducing scenario, because
> actually it involves many components (postgres_fdw, pg_pasthman,
> pg_shardman, LR,...)
> and requires multinode installation.
> But let me try to explain what going on:
> So we have implement sharding - splitting data between several remote tables
> using pg_pathman and postgres_fdw.
> It means that insert or update of parent table  cause insert or update of
> some derived partitions which is forwarded by postgres_fdw to the
> correspondent node.
> Number of shards is significantly larger than number of nodes, i.e. for 5
> nodes we have 50 shards. Which means that at each onde we have 10 shards.
> To provide fault tolerance each shard is replicated using logical
> replication to one or more nodes. Right now we considered only redundancy
> level 1 - each shard has only one replica.
> So from each node we establish 10 logical replication channels.
>
> We want commit to wait until data is actually stored at all replicas, so we
> are using synchronous replication:
> So we set synchronous_commit option to "on" and include all ten 10
> subscriptions in synchronous_standby_names list.
>
> In this setup commit latency is very large (about 100msec and most of the
> time is actually spent in commit) and performance is very bad - pgbench
> shows about 300 TPS for optimal number of clients (about 10, for larger
> number performance is almost the same). Without logical replication at the
> same setup we get about 6000 TPS.
>
> I have checked syncrepl.c file, particularly SyncRepGetSyncRecPtr function.
> Each wal sender independently calculates minimal LSN among all synchronous
> replicas and wakeup backends waiting for this LSN. It means that transaction
> performing update of data in one shard will actually wait confirmation from
> replication channels for all shards.
> If some shard is updated rarely than other or is not updated at all (for
> example because communication channels between this node is broken), then
> all backens will stuck.
> Also all backends are competing for the single SyncRepLock, which also can
> be a contention point.
>

IIUC, you mean that in the current synchronous logical replication a
transaction has to wait for the updated table data to be replicated
even by servers that don't subscribe to that table. If we changed it so
that a transaction waits only for the servers that actually subscribe
to the table, it would be more efficient, at least for your use case.
We send at least the begin and commit data to all subscriptions and
then wait for replies from all of them, but could we skip waiting for a
subscription when, for example, its walsender didn't actually send any
data modified by the transaction?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger

2017-10-10 Thread Masahiko Sawada
On Tue, Oct 10, 2017 at 11:29 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev
> <a.aleks...@postgrespro.ru> wrote:
>> Hi hackers,
>>
>> I've found something that looks like a bug.
>>
>> Steps to reproduce
>> --
>>
>> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There
>> is a table `test` on every instance:
>>
>> ```
>> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT);
>> ```
>>
>> Both inst1 and inst2 have `allpub` publication:
>>
>> ```
>> CREATE PUBLICATION allpub FOR ALL TABLES;
>> ```
>>
>> ... and inst3 is subscribed for both publications:
>>
>> ```
>> CREATE SUBSCRIPTION allsub1
>>   CONNECTION 'host=10.128.0.16 user=eax dbname=eax'
>>   PUBLICATION allpub;
>>
>> CREATE SUBSCRIPTION allsub2
>>   CONNECTION 'host=10.128.0.26 user=eax dbname=eax'
>>   PUBLICATION allpub;
>> ```
>>
>> So basically it's two masters, one replica configuration. To resolve
>> insert/update conflicts I've created the following triggers on inst3:
>>
>> ```
>> CREATE OR REPLACE FUNCTION test_before_insert()
>> RETURNS trigger AS $$
>> BEGIN
>>
>> RAISE NOTICE 'test_before_insert trigger executed';
>>
>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN
>>RAISE NOTICE 'test_before_insert trigger - merging data';
>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k;
>>RETURN NULL;
>> END IF;
>>
>> RETURN NEW;
>>
>> END
>> $$ LANGUAGE plpgsql;
>>
>>
>> CREATE OR REPLACE FUNCTION test_before_update()
>> RETURNS trigger AS $$
>> BEGIN
>>
>> RAISE NOTICE 'test_before_update trigger executed';
>>
>> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN
>>RAISE NOTICE 'test_before_update trigger - merging data';
>>UPDATE test SET v = v || ';' || new.v WHERE k = new.k;
>>DELETE FROM test where k = old.k;
>>RETURN NULL;
>> END IF;
>>
>> RETURN NEW;
>>
>> END
>> $$ LANGUAGE plpgsql;
>>
>> create trigger test_before_insert_trigger
>> before insert on test
>> for each row execute procedure test_before_insert();
>>
>> create trigger test_before_update_trigger
>> before update of k on test
>> for each row execute procedure test_before_update();
>>
>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger;
>> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger;
>> ```
>>
>> The INSERT trigger works just as expected, however the UPDATE trigger
>> doesn't. On inst1:
>>
>> ```
>> insert into test values ('k1', 'v1');
>> ```
>>
>> In inst2:
>>
>> ```
>> insert into test values ('k4', 'v4');
>> update test set k = 'k1' where k = 'k4';
>> ```
>>
>> Now on inst3:
>>
>> ```
>> select * from test;
>> ```
>>
>> Expected result
>> ---
>>
>> Rows are merged to:
>>
>> ```
>>  k  |   v
>> +---
>>  k1 | v1;v4
>> ```
>>
>> This is what would happen if all insert/update queries would have been
>> executed on one instance.
>>
>> Actual result
>> -
>>
>> Replication fails, log contains:
>>
>> ```
>> [3227] ERROR:  duplicate key value violates unique constraint "test_pkey"
>> [3227] DETAIL:  Key (k)=(k1) already exists.
>> [3176] LOG:  worker process: logical replication worker for subscription 
>> 16402 (PID 3227) exited with exit code 1
>> ```
>>
>> What do you think?
>>
>
> I think the cause of this issue is that the apply worker doesn't set
> updatedCols of the RangeTblEntry when applying updates, so TriggerEnabled
> always ends up being false. I'll make a patch and submit it.
>

The attached patch stores the updated-columns bitmap set in the
RangeTblEntry. In my environment this bug appears to be fixed by the patch.
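
For reference, the change is roughly along the following lines (a
sketch of the idea rather than the exact attached patch), in
apply_handle_update():

/*
 * Sketch: record which columns the publisher reported as changed, so
 * that BEFORE UPDATE OF <column> triggers can match on the subscriber.
 */
RangeTblEntry *target_rte = linitial(estate->es_range_table);
int            i;

for (i = 0; i < remoteslot->tts_tupleDescriptor->natts; i++)
{
    if (newtup.changed[i])
        target_rte->updatedCols =
            bms_add_member(target_rte->updatedCols,
                           i + 1 - FirstLowInvalidHeapAttributeNumber);
}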

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


set_updated_columns.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix a typo in execReplication.c

2017-10-09 Thread Masahiko Sawada
Hi,

Attached a patch for $subject. ISTM these are mistakes of copy-and-paste.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_comment_in_execReplication_c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger

2017-10-09 Thread Masahiko Sawada
On Mon, Oct 9, 2017 at 11:13 PM, Aleksander Alekseev
<a.aleks...@postgrespro.ru> wrote:
> Hi hackers,
>
> I've found something that looks like a bug.
>
> Steps to reproduce
> --
>
> There are 3 instances of PostgreSQL 10.0 - inst1, inst2 and inst3. There
> is a table `test` on every instance:
>
> ```
> CREATE TABLE test(k TEXT PRIMARY KEY, v TEXT);
> ```
>
> Both inst1 and inst2 have `allpub` publication:
>
> ```
> CREATE PUBLICATION allpub FOR ALL TABLES;
> ```
>
> ... and inst3 is subscribed for both publications:
>
> ```
> CREATE SUBSCRIPTION allsub1
>   CONNECTION 'host=10.128.0.16 user=eax dbname=eax'
>   PUBLICATION allpub;
>
> CREATE SUBSCRIPTION allsub2
>   CONNECTION 'host=10.128.0.26 user=eax dbname=eax'
>   PUBLICATION allpub;
> ```
>
> So basically it's two masters, one replica configuration. To resolve
> insert/update conflicts I've created the following triggers on inst3:
>
> ```
> CREATE OR REPLACE FUNCTION test_before_insert()
> RETURNS trigger AS $$
> BEGIN
>
> RAISE NOTICE 'test_before_insert trigger executed';
>
> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN
>RAISE NOTICE 'test_before_insert trigger - merging data';
>UPDATE test SET v = v || ';' || new.v WHERE k = new.k;
>RETURN NULL;
> END IF;
>
> RETURN NEW;
>
> END
> $$ LANGUAGE plpgsql;
>
>
> CREATE OR REPLACE FUNCTION test_before_update()
> RETURNS trigger AS $$
> BEGIN
>
> RAISE NOTICE 'test_before_update trigger executed';
>
> IF EXISTS (SELECT 1 FROM test where k = new.k) THEN
>RAISE NOTICE 'test_before_update trigger - merging data';
>UPDATE test SET v = v || ';' || new.v WHERE k = new.k;
>DELETE FROM test where k = old.k;
>RETURN NULL;
> END IF;
>
> RETURN NEW;
>
> END
> $$ LANGUAGE plpgsql;
>
> create trigger test_before_insert_trigger
> before insert on test
> for each row execute procedure test_before_insert();
>
> create trigger test_before_update_trigger
> before update of k on test
> for each row execute procedure test_before_update();
>
> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_insert_trigger;
> ALTER TABLE test ENABLE REPLICA TRIGGER test_before_update_trigger;
> ```
>
> The INSERT trigger works just as expected, however the UPDATE trigger
> doesn't. On inst1:
>
> ```
> insert into test values ('k1', 'v1');
> ```
>
> In inst2:
>
> ```
> insert into test values ('k4', 'v4');
> update test set k = 'k1' where k = 'k4';
> ```
>
> Now on inst3:
>
> ```
> select * from test;
> ```
>
> Expected result
> ---
>
> Rows are merged to:
>
> ```
>  k  |   v
> +---
>  k1 | v1;v4
> ```
>
> This is what would happen if all insert/update queries would have been
> executed on one instance.
>
> Actual result
> -
>
> Replication fails, log contains:
>
> ```
> [3227] ERROR:  duplicate key value violates unique constraint "test_pkey"
> [3227] DETAIL:  Key (k)=(k1) already exists.
> [3176] LOG:  worker process: logical replication worker for subscription 
> 16402 (PID 3227) exited with exit code 1
> ```
>
> What do you think?
>

I think the cause of this issue is that the apply worker doesn't set
updatedCols of the RangeTblEntry when applying updates, so TriggerEnabled
always ends up being false. I'll make a patch and submit it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-10-04 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 4:31 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> Since v4 patch conflicts with current HEAD I attached the latest version 
>>> patch.
>>
>> Hi Sawada-san,
>>
>> Here is an interesting failure with this patch:
>>
>> test rowsecurity  ... FAILED
>> test rules... FAILED
>>
>> Down at the bottom of the build log in the regression diffs file you can see:
>>
>> ! ERROR: cache lookup failed for relation 32893
>>
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907
>>
>
> Thank you for letting me know.
>
> Hmm, it's an interesting failure. I'll investigate it and post the new patch.
>

Since the patch conflicts with current HEAD, I've rebased the patch
and fixed a bug. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


parallel_vacuum_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with logical replication

2017-10-03 Thread Masahiko Sawada
On Wed, Oct 4, 2017 at 8:35 AM, Petr Jelinek
<petr.jeli...@2ndquadrant.com> wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>>   RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at master two updates inside one transaction:
>>>
>>> postgres=# begin;
>>> BEGIN
>>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>>> UPDATE 1
>>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>>> UPDATE 1
>>> postgres=# commit;
>>> 
>>>
>>> and in replica log we see:
>>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>>> worker for subscription "sub" has started
>>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>>> tuple
>>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>>> replication worker for subscription 16403 (PID 2954) exited with exit
>>> code 1
>>>
>>> Error happens in trigger.c:
>>>
>>> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>> lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>>> trigger.c:3103
>>> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>> fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>>> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>> at execReplication.c:461
>>> #6  0x00820894 in apply_handle_update (s=0x7ffc442163b0) at
>>> worker.c:736
>>
>> We have locked the same tuple in RelationFindReplTupleByIndex() just
>> before this gets called and didn't get the same error. I guess we do
>> something wrong with snapshots. Will need to investigate more.
>>
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

Thank you for the patch.
This bug can happen even without the trigger, and I confirmed that the
bug is fixed by the patch. I think the patch fixes it properly.
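
For the archives, the one-liner is presumably along these lines in
create_estate_for_relation() in worker.c (paraphrasing, not quoting the
patch):

/*
 * Give the apply worker's executor state a valid command id so that
 * GetTupleForTrigger()/heap_lock_tuple() can see the worker's own changes.
 */
estate->es_output_cid = GetCurrentCommandId(true);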

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cache lookup errors for missing replication origins

2017-10-03 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 3:51 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 12:59 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> ERROR:  42704: replication slot "%s" does not exist
>
> s/slot/origin/
>
>> As far as I can see, replorigin_by_oid makes no use of its missing_ok
>> = false in the backend code, so letting it untouched would have no
>> impact. replorigin_by_name with missing_ok = false is only used with
>> SQL-callable functions, so it could be changed without any impact
>> elsewhere (without considering externally-maintained replication
>> modules).
>
> As long as I don't forget, attached is a patch added as well to the
> next CF. replorigin_by_oid should have the same switch from elog to
> ereport in my opinion. Additional regression tests are included.

The patch passes the regression tests and I found no problems with it.
I've marked it as Ready for Committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-02 Thread Masahiko Sawada
On Sat, Sep 30, 2017 at 12:42 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> I think that making a resolver process have connection caches to each
>> foreign server for a while can reduce the overhead of connection to
>> foreign servers. These connections will be invalidated by DDLs. Also,
>> most of the time we spend to commit a distributed transaction is the
>> interaction between the coordinator and foreign servers using
>> two-phase commit protocal. So I guess the time in signalling to a
>> resolver process would not be a big overhead.
>
> I agree.  Also, in the future, we might try to allow connections to be
> shared across backends.  I did some research on this a number of years
> ago and found that every operating system I investigated had some way
> of passing a file descriptor from one process to another -- so a
> shared connection cache might be possible.

That sounds like a good idea.

> Also, we might port the whole backend to use threads, and then this
> problem goes way.  But I don't have time to write that patch this
> week.  :-)
>
> It's possible that we might find that neither of the above approaches
> are practical and that the performance benefits of resolving the
> transaction from the original connection are large enough that we want
> to try to make it work anyhow.  However, I think we can postpone that
> work to a future time.  Any general solution to this problem at least
> needs to be ABLE to resolve transactions at a later time from a
> different session, so let's get that working first, and then see what
> else we want to do.
>

Understood and agreed. I'll post the first version of the patch based
on the new design to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-09-27 Thread Masahiko Sawada
On Wed, Sep 27, 2017 at 4:05 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Sep 27, 2017 at 12:11 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>>> wrote:
>>>> Based on the review comment from Robert, I'm planning to do the big
>>>> change to the architecture of this patch so that a backend process
>>>> work together with a dedicated background worker that is responsible
>>>> for resolving the foreign transactions. For the usage of this feature,
>>>> it will be almost the same as what this patch has been doing except
>>>> for adding a new GUC paramter that controls the number of resovler
>>>> process launch. That is, we can have multiple resolver process to keep
>>>> latency down.
>>>
>>> Multiple resolver processes is useful but gets a bit complicated.  For
>>> example, if process 1 has a connection open to foreign server A and
>>> process 2 does not, and a request arrives that needs to be handled on
>>> foreign server A, what happens?  If process 1 is already busy doing
>>> something else, probably we want process 2 to try to open a new
>>> connection to foreign server A and handle the request.  But if process
>>> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
>>> than 2.  That seems a bit difficult to get working though.  Maybe we
>>> should just ignore such considerations in the first version.
>>
>> I understood. I keep it simple in the first version.
>
> While a resolver process is useful for resolving transaction later, it
> seems performance effective to try to resolve the prepared foreign
> transaction, in post-commit phase, in the same backend which prepared
> those for two reasons 1. the backend already has a connection to that
> foreign server 2. it has just run some commands to completion on that
> foreign server, so it's highly likely that a COMMIT PREPARED would
> succeed too. If we let a resolver process do that, we will spend time
> in 1. signalling resolver process 2. setting up a connection to the
> foreign server and 3. by the time resolver process tries to resolve
> the prepared transaction the foreign server may become unavailable,
> thus delaying the resolution.

I think that having a resolver process keep connection caches to the
foreign servers for a while can reduce the overhead of connecting to
them. These connections would be invalidated by DDL. Also, most of the
time spent committing a distributed transaction is the interaction
between the coordinator and the foreign servers using the two-phase
commit protocol, so I guess the time spent signalling a resolver
process would not be a big overhead.
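
What I have in mind is something like the following sketch (names are
placeholders, not patch code; it mirrors what postgres_fdw's
connection.c already does): a per-resolver hash of connections keyed by
foreign server OID, with entries marked invalid by a syscache
invalidation callback so DDL on the server or user mapping forces a
reconnect on next use.

typedef struct ResolverConnEntry
{
    Oid         serverid;       /* hash key: foreign server OID */
    PGconn     *conn;           /* open libpq connection, or NULL */
    bool        invalidated;    /* reconnect before next use */
} ResolverConnEntry;

static HTAB *ResolverConnHash = NULL;

/* Invalidation callback registered with CacheRegisterSyscacheCallback() */
static void
resolver_conn_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
{
    HASH_SEQ_STATUS scan;
    ResolverConnEntry *entry;

    hash_seq_init(&scan, ResolverConnHash);
    while ((entry = hash_seq_search(&scan)) != NULL)
        entry->invalidated = true;      /* lazily re-established later */
}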

> Said that, I agree that post-commit phase doesn't have a transaction
> of itself, and thus any catalog lookup, error reporting is not
> possible. We will need some different approach here, which may not be
> straight forward. So, we may need to delay this optimization for v2. I
> think we have discussed this before, but I don't find a mail off-hand.
>
>>
>>>> * Resovler processes
>>>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>>>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>>>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>>>> transactions.
>>>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>>>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>>>
>>> Comments:
>>>
>>> - Note that any error we raise here won't reach the user; this is a
>>> background process.  We don't want to get into a loop where we just
>>> error out repeatedly forever -- at least not if there's any other
>>> reasonable choice.
>>
>> Thank you for the comments.
>>
>> Agreed.
>
> We should probably log an error message in the server log, so that
> DBAs are aware of such a failure. Is that something you are
> considering to do?

Yes, a resolver process logs an error message in that case.

>
>>
>>> - I suggest that we ought to track the status for each XID separately
>>> on each server rather than just track the XID status overall.  That
>>> way, if transaction resolution fails on one server, we don't keep
>>> trying to reconnect to the others.
>>
>> Agreed. In the current patch we manage fdw_xact entries that track the
>> status for each X

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-09-27 Thread Masahiko Sawada
On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> Based on the review comment from Robert, I'm planning to do the big
>> change to the architecture of this patch so that a backend process
>> work together with a dedicated background worker that is responsible
>> for resolving the foreign transactions. For the usage of this feature,
>> it will be almost the same as what this patch has been doing except
>> for adding a new GUC paramter that controls the number of resovler
>> process launch. That is, we can have multiple resolver process to keep
>> latency down.
>
> Multiple resolver processes is useful but gets a bit complicated.  For
> example, if process 1 has a connection open to foreign server A and
> process 2 does not, and a request arrives that needs to be handled on
> foreign server A, what happens?  If process 1 is already busy doing
> something else, probably we want process 2 to try to open a new
> connection to foreign server A and handle the request.  But if process
> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
> than 2.  That seems a bit difficult to get working though.  Maybe we
> should just ignore such considerations in the first version.

Understood. I'll keep it simple in the first version.

>> * Resovler processes
>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>> transactions.
>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>
> Comments:
>
> - Note that any error we raise here won't reach the user; this is a
> background process.  We don't want to get into a loop where we just
> error out repeatedly forever -- at least not if there's any other
> reasonable choice.

Thank you for the comments.

Agreed.

> - I suggest that we ought to track the status for each XID separately
> on each server rather than just track the XID status overall.  That
> way, if transaction resolution fails on one server, we don't keep
> trying to reconnect to the others.

Agreed. In the current patch we manage fdw_xact entries that track the
status for each XID separately on each server, and I'm going to use the
same mechanism. The resolver process gets a target XID from the shmem
queue and fetches all fdw_xact entries associated with that XID from
the fdw_xact array in shmem. But scanning the whole fdw_xact array
could be slow because the number of entries could be large (e.g.,
max_connections * number of foreign servers), so I'm considering
keeping a linked list of all fdw_xact entries associated with the same
XID, plus a shmem hash pointing to the first fdw_xact entry of each
list. That way we can find the target fdw_xact entries in O(1).
Something like the following:
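
(Rough sketch only; field names are placeholders, not current patch code.)

typedef struct FdwXactEntry
{
    TransactionId local_xid;    /* local transaction this entry belongs to */
    Oid           serverid;     /* foreign server to resolve */
    int           status;       /* prepared / committing / aborting ... */
    int           next;         /* array index of the next entry with the
                                 * same XID, or -1 at the end of the list */
} FdwXactEntry;

typedef struct FdwXactHashEntry
{
    TransactionId local_xid;    /* hash key */
    int           first;        /* array index of the first FdwXactEntry */
} FdwXactHashEntry;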

> - If we go to resolve a remote transaction and find that no such
> remote transaction exists, what should we do?  I'm inclined to think
> that we should regard that as if we had succeeded in resolving the
> transaction.  Certainly, if we've retried the server repeatedly, it
> might be that we previously succeeded in resolving the transaction but
> then the network connection was broken before we got the success
> message back from the remote server.  But even if that's not the
> scenario, I think we should assume that the DBA or some other system
> resolved it and therefore we don't need to do anything further.  If we
> assume anything else, then we just go into an infinite error loop,
> which isn't useful behavior.  We could log a message, though (for
> example, LOG: unable to resolve foreign transaction ... because it
> does not exist).

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication and statistics

2017-09-26 Thread Masahiko Sawada
On Tue, Sep 26, 2017 at 6:57 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
>
> 2017-09-26 11:56 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>>
>>
>>
>> 2017-09-26 11:51 GMT+02:00 Masahiko Sawada <sawada.m...@gmail.com>:
>>>
>>> On Tue, Sep 26, 2017 at 2:50 AM, Pavel Stehule <pavel.steh...@gmail.com>
>>> wrote:
>>> >
>>> >
>>> > 2017-09-25 19:23 GMT+02:00 Petr Jelinek <petr.jeli...@2ndquadrant.com>:
>>> >>
>>> >> On 25/09/17 19:19, Tom Lane wrote:
>>> >> > Pavel Stehule <pavel.steh...@gmail.com> writes:
>>> >> >> I had two instances on one server with different port. I am sure,
>>> >> >> so
>>> >> >> replication was functional. Only one issue is statistics
>>> >> >
>>> >> >> Master:
>>> >> >
>>> >> >> CREATE TABLE foo(id int primary key, a int);
>>> >> >> CREATE PUBLICATION test_pub FOR TABLE foo;
>>> >> >> INSERT INTO foo VALUES(1, 200);
>>> >> >
>>> >> >> slave
>>> >> >
>>> >> >> CREATE TABLE foo(id int primary key, a int);
>>> >> >> CREATE SUBSCRIPTION test_sub CONNECTION 'port=5432' PUBLICATION
>>> >> >> test_pub;
>>> >> >
>>> >> >> That was all
>>> >> >
>>> >> > In this example, nothing's been done yet by the actual replication
>>> >> > apply process, only by the initial table sync.  Maybe that accounts
>>> >> > for your not seeing stats?
>>> >> >
>>> >>
>>> >> The main replication worker should still be running though. The output
>>> >> of pg_stat_replication should only be empty if there is nothing
>>> >> running.
>>> >>
>>> >
>>> > I did some inserts, updates, ..
>>> >
>>> > I can recheck it - it was done on 10 RC
>>>
>>> I guess CREATE SUBSCRIPTION failed for whatever reason (e.g, wal_level
>>> < logical on the master). Didn't you get errors from CREATE
>>> SUBSCRIPTION?
>>
>>
>> sorry I had wal_level = logical
>
>
> but if I remember - maybe I had this level only on master
>>
>>

Hmm, has a logical replication slot been created on the master?
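
For example, on the master something like this should show both the
slot and the walsender if the subscription is really working:

-- on the master
SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;
SELECT pid, application_name, state FROM pg_stat_replication;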

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication and statistics

2017-09-26 Thread Masahiko Sawada
On Tue, Sep 26, 2017 at 2:50 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
>
> 2017-09-25 19:23 GMT+02:00 Petr Jelinek <petr.jeli...@2ndquadrant.com>:
>>
>> On 25/09/17 19:19, Tom Lane wrote:
>> > Pavel Stehule <pavel.steh...@gmail.com> writes:
>> >> I had two instances on one server with different port. I am sure, so
>> >> replication was functional. Only one issue is statistics
>> >
>> >> Master:
>> >
>> >> CREATE TABLE foo(id int primary key, a int);
>> >> CREATE PUBLICATION test_pub FOR TABLE foo;
>> >> INSERT INTO foo VALUES(1, 200);
>> >
>> >> slave
>> >
>> >> CREATE TABLE foo(id int primary key, a int);
>> >> CREATE SUBSCRIPTION test_sub CONNECTION 'port=5432' PUBLICATION
>> >> test_pub;
>> >
>> >> That was all
>> >
>> > In this example, nothing's been done yet by the actual replication
>> > apply process, only by the initial table sync.  Maybe that accounts
>> > for your not seeing stats?
>> >
>>
>> The main replication worker should still be running though. The output
>> of pg_stat_replication should only be empty if there is nothing running.
>>
>
> I did some inserts, updates, ..
>
> I can recheck it - it was done on 10 RC

I guess CREATE SUBSCRIPTION failed for some reason (e.g., wal_level <
logical on the master). Didn't you get errors from CREATE
SUBSCRIPTION?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-09-26 Thread Masahiko Sawada
 the shmem queue and get its XID (say, XID-a).
2. Get the fdw_xact_state entry from shmem hash by XID-a.
3. Iterate fdw_xact entries using the index, and resolve the foreign
transactions.
3-a. If even one foreign transaction failed to resolve, raise an error.
4. Change the waiting backend state to FDWXACT_COMPLETED and release it.

Also, the resolver process periodically scans the array of fdw_xact
entries and tries to resolve in-doubt transactions.
This patch still has design concerns and I'm planning to update it for
the next commit fest, so I'll mark this as "Waiting on Author".

Feedback and suggestion are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication status in logical replication

2017-09-25 Thread Masahiko Sawada
On Tue, Sep 26, 2017 at 10:36 AM, Vaishnavi Prabakaran
<vaishnaviprabaka...@gmail.com> wrote:
> Hi,
>
> On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson <dan...@yesql.se> wrote:
>>
>>
>> I’m not entirely sure why this was flagged as "Waiting for Author” by the
>> automatic run, the patch applies for me and builds so resetting back to
>> “Needs
>> review”.
>>
>
> This patch applies and build cleanly and I did a testing with one publisher
> and one subscriber, and confirm that the replication state after restarting
> the server now is "streaming" and not "Catchup".
>
> And, I don't find any issues with code and patch to me is ready for
> committer, marked the same in cf entry.
>

Thank you for reviewing the patch!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-25 Thread Masahiko Sawada
On Mon, Sep 25, 2017 at 8:22 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Mon, Sep 25, 2017 at 7:57 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> FWIW, the same thing can happen when specifying an invalid replication
>> origin name to pg_replication_origin_advance() and
>> pg_replication_origin_progress(). These probably should fixed as well.
>
> I have spawned a thread about that stuff three weeks ago:
> https://www.postgresql.org/message-id/CAB7nPqQtPg%2BLKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg%40mail.gmail.com
> With a patch registered where it should be:
> https://commitfest.postgresql.org/15/1290/
> So I would suggest to keep the discussions about this problem in its own 
> thread.

Thank you for the information! I'll look at the thread and review the patch.

> (Bonus: ALTER TABLE queries in parallel with multiple sessions and
> enjoy the result)

Will try it :-)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication and statistics

2017-09-25 Thread Masahiko Sawada
On Mon, Sep 25, 2017 at 12:58 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Hi
>
> I did trivial example of logical replication (one table, one publication,
> one subscription)
>
> I am little bit surprised so after some work - the replication is working,
> the statistics are empty
>
> #master
> postgres=# select * from pg_stat_replication ;
> (0 rows)
>
> #slave
> postgres=# select * from pg_stat_subscription ;
> -[ RECORD 1 ]-+-
> subid | 16472
> subname   | test_sub
> pid   |
> relid |
> received_lsn  |
> last_msg_send_time|
> last_msg_receipt_time |
> latest_end_lsn|
> latest_end_time   |
>
> Should be some enabled?
>

If the subscription is disabled, the subscription statistics are empty
and no walsender processes are launched. You can start replication for
test_sub with ALTER SUBSCRIPTION test_sub ENABLE.
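
For example, on the subscriber:

-- a disabled subscription has no worker and empty statistics
SELECT subname, subenabled FROM pg_subscription;

ALTER SUBSCRIPTION test_sub ENABLE;

-- pg_stat_subscription should now show a pid and received_lsn
SELECT subname, pid, received_lsn FROM pg_stat_subscription;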

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] comments improvements

2017-09-25 Thread Masahiko Sawada
On Mon, Sep 25, 2017 at 5:36 AM, Erik Rijkers <e...@xs4all.nl> wrote:
> comments improvements
>

For the triggers.sql.20170924.diff file, you need to update the
expected/triggers.out file as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-25 Thread Masahiko Sawada
On Mon, Sep 25, 2017 at 12:10 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Somebody inserted this into vacuum.c's get_rel_oids():
>>
>> tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>> if (!HeapTupleIsValid(tuple))
>> elog(ERROR, "cache lookup failed for relation %u", relid);
>>
>> apparently without having read the very verbose comment two lines above,
>> which points out that we're not taking any lock on the target relation.
>> So, if that relation is concurrently being dropped, you're likely to
>> get "cache lookup failed for relation " rather than anything more
>> user-friendly.
>
> This has been overlooked during the lookups of 3c3bb99, and by
> multiple people including me. elog() should never be things users can
> face, as well as cache lookups.

FWIW, the same thing can happen when specifying an invalid replication
origin name to pg_replication_origin_advance() and
pg_replication_origin_progress(). These should probably be fixed as well.
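
For example (with a made-up origin name), the following currently
reports an internal cache-lookup style error rather than a user-facing
"replication origin does not exist" error:

SELECT pg_replication_origin_advance('no_such_origin', '0/1'::pg_lsn);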

>
>> A minimum-change fix would be to replace the elog() with an ereport
>> that produces the same "relation does not exist" error you'd have
>> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
>> a few microseconds earlier.  But that feels like its's band-aiding
>> around the problem.
>
> Yeah, that's not right. This is a cache lookup error at the end.
>
>> What I'm wondering about is changing the RangeVarGetRelid call to take
>> ShareUpdateExclusiveLock rather than no lock.  That would protect the
>> syscache lookup, and it would also make the find_all_inheritors call
>> a lot more meaningful.
>>
>> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
>> as soon as we close the caller's transaction, and then we'd acquire
>> the same or stronger lock inside vacuum_rel().  So that seems fine.
>> If we're doing an ANALYZE, then the lock would continue to be held
>> and analyze_rel would merely be acquiring it an extra time, so we'd
>> actually be removing a race-condition failure scenario for ANALYZE.
>> This would mean a few more cycles in lock management, but since this
>> only applies to a manual VACUUM or ANALYZE that specifies a table
>> name, I'm not too concerned about that.
>
> I think that I am +1 on that. Testing such a thing I am not seeing
> anything wrong either. The call to find_all_inheritors should also use
> ShareUpdateExclusiveLock, and the lock on top of RangeVarGetRelid()
> needs to be reworked.
>
> Attached is a proposal of patch.

FWIW, I agree with the approach of this patch and found no problems in it.
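
To summarize the approach as I understand it (a sketch only, not the
actual patch; variable names are approximate): resolve the relation
name while taking ShareUpdateExclusiveLock and keep the same lock level
for the partition lookup, so a concurrent DROP TABLE produces
"relation ... does not exist" instead of a cache lookup failure.

/* in get_rel_oids(), sketch only */
relid = RangeVarGetRelid(vacrel, ShareUpdateExclusiveLock, false);

if (include_parts)
    oid_list = list_concat(oid_list,
                           find_all_inheritors(relid,
                                               ShareUpdateExclusiveLock,
                                               NULL));
else
    oid_list = lappend_oid(oid_list, relid);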

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-25 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 5:31 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> At Fri, 22 Sep 2017 17:21:04 +0900, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote in <cad21aobn9ucgmduinx2ptu8upetohnr-a35abcqyznlfvwd...@mail.gmail.com>
>> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> > At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
>> > <sawada.m...@gmail.com> wrote in 
>> > <cad21aod6zgb1w6ps1axj0ccab_chdyiitntedpmhkefgg13...@mail.gmail.com>
>> >> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>> >> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> >> Could you elaborate about this? For example in btree index, the index
>> >> cleanup skips to scan on the index scan if index_bulk_delete has been
>> >> called during vacuuming because stats != NULL. So I think we don't
>> >> need such a flag.
>> >
>> > The flag works so that successive two index full scans don't
>> > happen in a vacuum round. If any rows are fully deleted, just
>> > following btvacuumcleanup does nothing.
>> >
>> > I think what you wanted to solve here was the problem that
>> > index_vacuum_cleanup runs a full scan even if it ends with no
>> > actual work, when manual or anti-wraparound vacuums.  (I'm
>> > getting a bit confused on this..) It is caused by using the
>> > pointer "stats" as the flag to instruct to do that. If the
>> > stats-as-a-flag worked as expected, the GUC doesn't seem to be
>> > required.
>>
>> Hmm, my proposal is like that if a table doesn't changed since the
>> previous vacuum much we skip the cleaning up index.
>>
>> If the table has at least one garbage we do the lazy_vacuum_index and
>> then IndexBulkDeleteResutl is stored, which causes to skip doing the
>> btvacuumcleanup. On the other hand, if the table doesn't have any
>> garbage but some new tuples inserted since the previous vacuum, we
>> don't do the lazy_vacuum_index but do the lazy_cleanup_index. In this
>> case, we always do the lazy_cleanup_index (i.g, we do the full scan)
>> even if only one tuple is inserted. That's why I proposed a new GUC
>> parameter which allows us to skip the lazy_cleanup_index in the case.
>
> I think the problem raised in this thread is that the last index
> scan may leave dangling pages.
>
>> > Addition to that, as Simon and Peter pointed out
>> > index_bulk_delete can leave not-fully-removed pages (so-called
>> > half-dead pages and pages that are recyclable but not registered
>> > in FSM, AFAICS) in some cases mainly by RecentGlobalXmin
>> > interlock. In this case, just inhibiting cleanup scan by a
>> > threshold lets such dangling pages persist in the index. (I
>> > conldn't make such a many dangling pages, though..)
>> >
>> > The first patch in the mail (*1) does that. It seems having some
>> > bugs, though..
>> >
>> >
>> > Since the dangling pages persist until autovacuum decided to scan
>> > the belonging table again, we should run a vacuum round (or
>> > index_vacuum_cleanup itself) even having no dead rows if we want
>> > to clean up such pages within a certain period. The second patch
>> > doesn that.
>> >
>>
>> IIUC half-dead pages are not relevant to this proposal. The proposal
>> has two problems;
>>
>> * By skipping index cleanup we could leave recyclable pages that are
>> not marked as a recyclable.
>
> Yes.
>
>> * we stash an XID when a btree page is deleted, which is used to
>> determine when it's finally safe to recycle the page
>
> Is it a "problem" of this proposal?
>

As Peter explained before[1], the problem is that there is an XID
stored in dead btree pages that is used in the subsequent
RecentGlobalXmin interlock that determines if recycling is safe.

[1] 
https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
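
Paraphrasing the relevant check in nbtpage.c (not verbatim): a deleted
page is only considered recyclable once the XID stashed in it is older
than RecentGlobalXmin.

bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /* an all-zeroes page can be reclaimed immediately */
    if (PageIsNew(page))
        return true;

    /* otherwise, only if deleted and nobody can still be interested in it */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    return P_ISDELETED(opaque) &&
           TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin);
}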

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-24 Thread Masahiko Sawada
On Sat, Sep 23, 2017 at 4:06 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/21/17 04:43, Masahiko Sawada wrote:
>>> Once we got this patch, DROP SUBSCRIPTION is not transactional either
>>> if dropping a replication slot or if the subscription got disabled in
>>> a transaction block. But we disallow to do DROP SUBSCRIPTION in a
>>> transaction block only in the former case. In the latter case, we
>>> adopted such non-transactional behaviour. Since these behaviours would
>>> be complex for users I attached the documentation patch explaining it.
>>>
>> Hmm, isn't there necessary to care and mention about this kind of
>> inconsistent behavior in docs?
>
> I have added documentation that certain forms of CREATE/DROP
> SUBSCRIPTION cannot be run inside a transaction block, which we have
> done for other such commands.

Thank you!

> I don't think we need to go into further detail.  We don't guarantee
> continuous connections.  If a worker is stopped and restarted in the
> background as an implementation detail, then the user is not impacted.

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote in <cad21aod6zgb1w6ps1axj0ccab_chdyiitntedpmhkefgg13...@mail.gmail.com>
>> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> > I was just looking the thread since it is found left alone for a
>> > long time in the CF app.
>> >
>> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan <p...@bowt.ie> wrote 
>> > in <CAH2-WzkhJhAXD+6DdBp7D8WYLfJ3D0m=AZbGsiw=usujtmu...@mail.gmail.com>
>> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <and...@anarazel.de> wrote:
>> >> > Hi,
>> >> >
>> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmh...@gmail.com> 
>> >> >> wrote:
>> >> >> [ lots of valuable discussion ]
>> >> >
>> >> > I think this patch clearly still is in the design stage, and has
>> >> > received plenty feedback this CF.  I'll therefore move this to the next
>> >> > commitfest.
>> >>
>> >> Does anyone have ideas on a way forward here? I don't, but then I
>> >> haven't thought about it in detail in several months.
>> >
>> > Is the additional storage in metapage to store the current status
>> > of vaccum is still unacceptable even if it can avoid useless
>> > full-page scan on indexes especially for stable tables?
>> >
>> > Or, how about additional 1 bit in pg_stat_*_index to indicate
>> > that the index *don't* require vacuum cleanup stage. (default
>> > value causes cleanup)
>>
>> You meant that "the next cycle" is the lazy_cleanup_index() function
>> called by lazy_scan_heap()?
>
> Both finally call btvacuumscan under a certain condition, but
> what I meant by "the next cycle" is the lazy_cleanup_index call
> in the next round of vacuum since abstract layer (index am) isn't
> conscious of the detail of btree.
>
>> > index_bulk_delete (or ambulkdelete) returns the flag in
>> > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
>> > stats and in the next cycle it is looked up to decide the
>> > necessity of index cleanup.
>> >
>>
>> Could you elaborate about this? For example in btree index, the index
>> cleanup skips to scan on the index scan if index_bulk_delete has been
>> called during vacuuming because stats != NULL. So I think we don't
>> need such a flag.
>
> The flag works so that successive two index full scans don't
> happen in a vacuum round. If any rows are fully deleted, just
> following btvacuumcleanup does nothing.
>
> I think what you wanted to solve here was the problem that
> index_vacuum_cleanup runs a full scan even if it ends with no
> actual work, when manual or anti-wraparound vacuums.  (I'm
> getting a bit confused on this..) It is caused by using the
> pointer "stats" as the flag to instruct to do that. If the
> stats-as-a-flag worked as expected, the GUC doesn't seem to be
> required.

Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip cleaning up the index.

If the table has at least some garbage we do lazy_vacuum_index and the
IndexBulkDeleteResult is stored, which causes the full scan in
btvacuumcleanup to be skipped. On the other hand, if the table doesn't
have any garbage but some new tuples were inserted since the previous
vacuum, we don't do lazy_vacuum_index but do lazy_cleanup_index. In
this case we always do lazy_cleanup_index (i.e., the full scan) even if
only one tuple was inserted. That's why I proposed a new GUC parameter
which allows us to skip lazy_cleanup_index in that case. A rough sketch
of what I mean is below.
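
(Variable and GUC names below are placeholders, not the attached patch.)

/* at the end of lazy_scan_heap(), sketch only */
for (i = 0; i < nindexes; i++)
{
    if (indstats[i] == NULL &&          /* no bulk delete ran: no garbage */
        tuples_inserted_since_last_vacuum < vacuum_cleanup_index_threshold)
        continue;                       /* skip the full-index cleanup scan */

    lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
}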

>
> Addition to that, as Simon and Peter pointed out
> index_bulk_delete can leave not-fully-removed pages (so-called
> half-dead pages and pages that are recyclable but not registered
> in FSM, AFAICS) in some cases mainly by RecentGlobalXmin
> interlock. In this case, just inhibiting cleanup scan by a
> threshold lets such dangling pages persist in the index. (I
> conldn't make such a many dangling pages, though..)
>
> The first patch in the mail (*1) does that. It seems having some
> bugs, though..
>
>
> Since the dangling pages persist until autovacuum decided to scan
> the belonging table again, we should run a vacuum round (or
> index_vacuum_cleanup itself) even having no dead rows if we want
> to clean up such pages within a certain period. The second patch
> doesn that.
>

IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* We stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 3:46 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> I have a question. Since WALInsertLockRelease seems not to call
>> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
>> instead, is that right?
>
> This looks like a copy-pasto from my side. Thanks for spotting it.

Okay, attached is the latest version of the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_do_pg_abort_backup_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 3:00 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 11:34 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> You're right. I updated the patch so that it exits do_pg_abort_backup
>> if the state is NONE and setting the state to NONE in
>> do_pg_stop_backup before releasing the WAL insert lock.
>
> -/* Clean up session-level lock */
> +/*
> + * Clean up session-level lock. To avoid interrupting before changing
> + * the backup state by LWLockWaitForVar we change it while holding the
> + * WAL insert lock.
> + */
>  sessionBackupState = SESSION_BACKUP_NONE;
> You could just mention directly CHECK_FOR_INTERRUPTS here.

Thank you for the review.
I have a question: since WALInsertLockRelease doesn't seem to call
LWLockWaitForVar, did you mean LWLockReleaseClearVar instead?

> +/* Quick exit if we have done the backup */
> +if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
> +return;
>
> The patch contents look fine to me. I have also tested things in
> depths but could not find any problems. I also looked again at the
> backup start code, trying to see any flows between the moment the
> session backup lock is changed and the moment the callback to do
> backup cleanup is registered but I have not found any issues. I am
> marking this as ready for committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> I was just looking the thread since it is found left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan <p...@bowt.ie> wrote in 
> <CAH2-WzkhJhAXD+6DdBp7D8WYLfJ3D0m=AZbGsiw=usujtmu...@mail.gmail.com>
>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <and...@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmh...@gmail.com> 
>> >> wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in metapage to store the current status
> of vaccum is still unacceptable even if it can avoid useless
> full-page scan on indexes especially for stable tables?
>
> Or, how about additional 1 bit in pg_stat_*_index to indicate
> that the index *don't* require vacuum cleanup stage. (default
> value causes cleanup)

Did you mean that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?

>
> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, for a btree index the
cleanup phase skips scanning the index if index_bulk_delete has
already been called during vacuuming, because stats != NULL. So I
think we don't need such a flag.
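
For reference, the relevant part of btvacuumcleanup() (in nbtree.c)
looks roughly like the following; this is a simplified sketch from
memory, not an exact quote of the source:

IndexBulkDeleteResult *
btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
    /* No-op in ANALYZE ONLY mode */
    if (info->analyze_only)
        return stats;

    /*
     * If btbulkdelete was already called, stats is non-NULL and there is
     * nothing more to scan; otherwise do one pass over the index to
     * recycle pages and collect statistics.
     */
    if (stats == NULL)
    {
        stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
        btvacuumscan(info, stats, NULL, NULL, 0);
    }

    /* (the real function also adjusts num_index_tuples here) */
    return stats;
}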

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
>
> Yep, but the deficiency is caused by the use before_shmem_exit() in
> the SQL-level functions present in 9.6~ which make the cleanup happen
> potentially twice. If you look at the code of basebackup.c, you would
> notice that the cleanups of the counters only happen within the
> PG_ENSURE_ERROR_CLEANUP() blocks, so a backpatch to 9.6 should be
> enough.

Thank you for explaining. I understand and agree.

On Fri, Sep 22, 2017 at 8:02 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 5:56 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Thu, Sep 21, 2017 at 4:40 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
>>> <michael.paqu...@gmail.com> wrote:
>>>> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
>>>> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
>>>> +   XLogCtl->Insert.nonExclusiveBackups--;
>>>> Hm, no, I don't agree. I think that instead you should just leave
>>>> do_pg_abort_backup() immediately if sessionBackupState is set to
>>>> SESSION_BACKUP_NONE. This variable is the link between the global
>>>> counters and the session stopping the backup so I don't think that we
>>>> should touch this assertion of this counter. I think that this method
>>>> would be safe as well for backup start as pg_start_backup_callback
>>>> takes care of any cleanup. Also because the counters are incremented
>>>> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
>>>> sessionBackupState is updated just after leaving the block.
>>>
>>> I think that the assertion failure still can happen if the process
>>> aborts after decremented the counter and before setting to
>>> SESSION_BACKUP_NONE. Am I missing something?
>>
>> The process would stop at the next CHECK_FOR_INTERRUPTS() and trigger
>> the cleanup at this moment. So this happens when waiting for the
>> archives to be done, and the session flag is set to NONE at this
>> point.
>
> And actually, with two non-exclusive backups taken in parallel, it is
> still possible to fail on another assertion in do_pg_stop_backup()
> even with your patch. Imagine the following:
> 1) Two backups are taken, counter for non-exclusive backups is set at 2.
> 2) One backup is stopped, then interrupted. This causes the counter to
> be decremented twice, once in do_pg_stop_backup, and once when
> aborting. Counter is at 0, switching as well forcePageWrites to
> false..
> 3) The second backup stops, a new assertion failure is triggered.
> Without assertions the counter would get at -1.

You're right. I updated the patch so that it exits do_pg_abort_backup
early if the session backup state is NONE, and sets the state to NONE
in do_pg_stop_backup before releasing the WAL insert lock.
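
Roughly, the idea is the following (a simplified sketch of the
approach, not the exact patch):

void
do_pg_abort_backup(void)
{
    /*
     * Quick exit if this session has no backup running any more; in that
     * case do_pg_stop_backup() has already decremented the counter.
     */
    if (sessionBackupState == SESSION_BACKUP_NONE)
        return;

    WALInsertLockAcquireExclusive();
    Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
    XLogCtl->Insert.nonExclusiveBackups--;

    if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
        XLogCtl->Insert.nonExclusiveBackups == 0)
    {
        XLogCtl->Insert.forcePageWrites = false;
    }
    WALInsertLockRelease();
}

together with setting sessionBackupState = SESSION_BACKUP_NONE in
do_pg_stop_backup while the WAL insert lock is still held, so there is
no window where the counter has been decremented but the session still
appears to have a backup running.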

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_do_pg_abort_backup_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-21 Thread Masahiko Sawada
On Sun, Sep 17, 2017 at 2:01 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Sat, Sep 16, 2017 at 9:52 PM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>> On 9/15/17 13:35, Arseny Sher wrote:
>>> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>>>
>>>> Here is a simple patch that fixes this, based on my original proposal
>>>> point #4.
>>>
>>> I checked, it passes the tests and solves the problem. However, isn't
>>> this
>>>
>>> + if (slotname || !subenabled)
>>>
>>> is a truism? Is it possible that subscription has no slot but still
>>> enabled?
>>
>> Yeah, we could just remove the _at_commit() branch entirely.  That would
>> effectively undo the change in 7e174fa793a2df89fe03d002a5087ef67abcdde8,
>> but I don't see any other choice for now.  And the practical impact
>> would be quite limited.
>>
>
> Yeah, we can remove only _at_commit() branch, but other part of the
> commit is still valid for ALTER SUBSCRIPTION DISABLE.
>
>>> Besides, we can avoid stopping the workers if subscription has no
>>> associated replication origin, though this probably means that
>>> subscription was broken by user and is not worth it.
>>
>> Right, it seems not worth addressing this case separately.
>>
>
> Once we got this patch, DROP SUBSCRIPTION is not transactional either
> if dropping a replication slot or if the subscription got disabled in
> a transaction block. But we disallow to do DROP SUBSCRIPTION in a
> transaction block only in the former case. In the latter case, we
> adopted such non-transactional behaviour. Since these behaviours would
> be complex for users I attached the documentation patch explaining it.
>

Hmm, isn't it necessary to address and mention this kind of
inconsistent behavior in the docs?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 5:23 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
>>> "(.*)" (memorization) in the data generation message check.
>>
>>
>> Thank you, fixed it.
>>
>>> Otherwise all is well for me.
>>>
>>
>> Attached the updated version patch.
>
>
> Applies, compiles, make check & tap test ok, doc is fine.
>
> All is well for me: the feature is useful, code is simple and clean, it is
> easy to invoke, easy to extend as well, which I'm planning to do once it
> gets in.
>
> I switched the patch to "Ready for Committers". No doubt they will have
> their own opinions about it. Let's wait and see.

Thank you for reviewing this patch!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-21 Thread Masahiko Sawada
On Thu, Sep 21, 2017 at 2:25 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 1:07 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> The bug can happen in PostgreSQL 9.1 or higher that non-exclusive
>> backup has been introduced, so we should back-patch to the all
>> supported versions.
>
> There is a typo here right? Non-exclusive backups have been introduced
> in 9.6. Why would a back-patch further down be needed?

I think the non-exclusive backup infrastructure was introduced in 9.1
for pg_basebackup. I haven't yet verified that this can be reproduced
using pg_basebackup on 9.1, but as far as I can tell from the code it
could happen.

>
>> I changed do_pg_abort_backup() so that it decrements
>> nonExclusiveBackups only if > 0. Feedback is very welcome.
>
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> +   if (XLogCtl->Insert.nonExclusiveBackups > 0)
> +   XLogCtl->Insert.nonExclusiveBackups--;
> Hm, no, I don't agree. I think that instead you should just leave
> do_pg_abort_backup() immediately if sessionBackupState is set to
> SESSION_BACKUP_NONE. This variable is the link between the global
> counters and the session stopping the backup so I don't think that we
> should touch this assertion of this counter. I think that this method
> would be safe as well for backup start as pg_start_backup_callback
> takes care of any cleanup. Also because the counters are incremented
> before entering in the PG_ENSURE_ERROR_CLEANUP block, and
> sessionBackupState is updated just after leaving the block.

I think that the assertion failure can still happen if the process
aborts after the counter has been decremented but before
sessionBackupState is set to SESSION_BACKUP_NONE. Am I missing something?
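
In other words, with only the quick-exit check there would still be a
window in do_pg_stop_backup(); a simplified sketch of the ordering
being discussed (not the actual code):

    WALInsertLockAcquireExclusive();
    XLogCtl->Insert.nonExclusiveBackups--;   /* counter already decremented */
    WALInsertLockRelease();

    /*
     * If the process errors out here, sessionBackupState is still
     * SESSION_BACKUP_NON_EXCLUSIVE, so do_pg_abort_backup() would not take
     * the quick exit and would decrement the counter a second time.
     */
    sessionBackupState = SESSION_BACKUP_NONE;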

>
> About the patch:
> +-  Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
> There is a typo on this line.
>
> Adding that to the next CF would be a good idea so as we don't forget
> about it. Feel free to add me as reviewer.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-20 Thread Masahiko Sawada
On Wed, Sep 20, 2017 at 3:26 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> v14 applies, compiles and works. TAP tests provide good coverage.
>
> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
> "(.*)" (memorization) in the data generation message check.
>

Thank you, fixed it.

> Otherwise all is well for me.
>

Attached the updated version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-20 Thread Masahiko Sawada
Hi,

I got an assertion failure when executing pg_terminate_backend on the
process that is waiting for WAL to be archived in non-exclusive backup
mode.

TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)",
File: "xlog.c", Line: 11123)

The one reproducing procedure is,
1. Start postgres with enabling the WAL archive
2. Execute pg_start_backup()
3. Revoke the access privileges of archive directory. (e.g., chown
root:root /path/to/archive)
4. Execute pg_stop_backup(), which hangs
5. Execute pg_terminate_backend() on the process that is waiting for
WAL to be archived.
This produces the assertion failure.

Perhaps we can reproduce it using pg_basebackup as well.

I think the cause of this is that do_pg_abort_backup can be called
even after we have decremented XLogCtl->Insert.nonExclusiveBackups in
do_pg_stop_backup(). That is, do_pg_abort_backup can be called with
XLogCtl->Insert.nonExclusiveBackups = 0 when the transaction aborts
after the counter has already been processed (e.g., while waiting for
WAL to be archived).
The bug can happen in PostgreSQL 9.1 or later, where non-exclusive
backups were introduced, so we should back-patch to all supported
versions.

I changed do_pg_abort_backup() so that it decrements
nonExclusiveBackups only if > 0. Feedback is very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0513471..b0381e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11165,8 +11165,15 @@ void
 do_pg_abort_backup(void)
 {
 	WALInsertLockAcquireExclusive();
-	Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-	XLogCtl->Insert.nonExclusiveBackups--;
+
+	/*
+	 * This can be called after we decremented nonExclusiveBackups in
+	 * do_pg_stop_backup. So prevent it to be negative value.
+	 */
+-	Assert(XLogCtl->Insert.nonExclusiveBackups >= 0);
+	if (XLogCtl->Insert.nonExclusiveBackups > 0)
+		XLogCtl->Insert.nonExclusiveBackups--;
+
 
 	if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
 		XLogCtl->Insert.nonExclusiveBackups == 0)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-19 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 12:41 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> Attached the latest version patch incorporated the tap tests.
>> Please review it.
>
>
> Patch applies, compilation & make check ok.
>
> Tests are simple and provide good coverage of new functionalities.
>
> I would suggest to add '--unlogged-tables' so speedup the tests a little.
>

Good idea, added.

> Comment: "# Custom initialization option, including a space"... ISTM that
> there is no space. Space is tested in the next test because of the v's and
> the --no-vacuum which turned them into space, which is enough.

You're right, I removed it.

> Regex are just check for the whole output, so putting twice "qr{vacuum}"
> does not check that vacuum appears twice, it checks twice that vacuum
> appears once. I do not think that it is worth trying to check for the v
> repetition, so I suggest to remove one from the first test. Repetition of '
> ' is checked with the second test.

Agreed.

> Maybe you could check that the data generation message is there.

Added the check.

Attached the latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v14.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating backup history files for backups taken from standbys

2017-09-19 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 2:48 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Tue, Sep 19, 2017 at 8:33 AM, David Steele <da...@pgmasters.net> wrote:
>> On 9/18/17 7:26 PM, Michael Paquier wrote:
>>> On Tue, Sep 19, 2017 at 8:14 AM, David Steele <da...@pgmasters.net> wrote:
>>>> On 8/31/17 11:56 PM, Michael Paquier wrote:
>>>>> Here is an updated patch with refreshed documentation, as a result of
>>>>> 449338c which was discussed in thread
>>>>> https://www.postgresql.org/message-id/d4d951b9-89c0-6bc1-b6ff-d0b2dd5a8...@pgmasters.net.
>>>>> I am just outlining the fact that a backup history file gets created
>>>>> on standbys and that it is archived.
>>>>
>>>> The patch looks good overall.  It applies cleanly and passes all tests.
>>>>
>>>> One question.  Do we want to create this file all the time (as written),
>>>> or only when archive_mode = always?
>>>>
>>>> It appears that CleanupBackupHistory() will immediately remove the
>>>> history file when archive_mode != always so it seems useless to write
>>>> it.  On the other hand the code is a bit simpler this way.  Thoughts?
>>>
>>> With archive_mode = off, the bytes of the backup history file are
>>> still written to disk, so my take on the matter is to keep the code
>>> simple.
>>
>> I'm OK with that.
>>
>> I'll give Masahiko some time to respond before marking it RFC.
>>
>
> Thanks, I'll review it and send a comment by this Wednesday.
>

I've reviewed and tested the latest patch and found no problems, so
I've marked this patch as Ready for Committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commits don't block for synchronous replication

2017-09-19 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 2:26 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Sep 19, 2017 at 7:50 AM, Xin Zhang <xzh...@pivotal.io> wrote:
>> If primary crashed at that moment, and failover to standby, the foo table is
>> lost, even though the replication is synced according to
>> `pg_stat_replication` view.
>
> GUC parameters are reloaded each time a query is run, and so
> SyncRepConfig is filled with the parsed data of SyncRepStandbyNames
> once the parameter is reloaded for the process. Still, here, a commit
> is waiting for a signal from a WAL sender that the wanted LSN has been
> correctly flushed on a standby so this code path does not care about
> the state of SyncRepConfig saved in the context of the process, we
> want to know what the checkpointer thinks about it. Hence using WAL
> sender data or sync_standbys_defined as a source of truth looks like a
> correct concept to me, making the problem of this bug legit.
>

I agree with the analysis and the approach of this patch.

> The check with SyncRepRequested() still holds truth: max_wal_senders
> needs a restart to be updated. Also, the other caller of
> SyncStandbysDefined() requires SyncRepConfig to be set, so this caller
> is fine.

Yeah, after the config is reloaded there might be some WAL senders that
don't reflect it yet, but I don't think that can be a problem.

> I have looked at your patch and tested it, but found no problems
> associated with it. A backpatch would be required, so I have added an
> entry in the next commit fest with status set to "ready for committer"
> so as this bug does not fall into the cracks.

Also I found no problems in the patch.

>
>> A separate question, is the `pg_stat_replication` view the reliable way to
>> find when to failover to a standby, or there are some other ways to ensure
>> the standby is in-sync with the primary?
>
> It shows at SQL level what is currently present in shared memory by
> scanning all the WAL sender entries, so this report uses the same data
> as the backend themselves, so that's a reliable source. In Postgres
> 10, pg_stat_activity is also able to show to users what are the
> backends waiting for a change to be flushed/applied on the standby
> using the wait event called SyncRep. You could make some use of that
> as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-09-19 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> Since v4 patch conflicts with current HEAD I attached the latest version 
>> patch.
>
> Hi Sawada-san,
>
> Here is an interesting failure with this patch:
>
> test rowsecurity  ... FAILED
> test rules... FAILED
>
> Down at the bottom of the build log in the regression diffs file you can see:
>
> ! ERROR: cache lookup failed for relation 32893
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907
>

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating backup history files for backups taken from standbys

2017-09-18 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 8:33 AM, David Steele <da...@pgmasters.net> wrote:
> On 9/18/17 7:26 PM, Michael Paquier wrote:
>> On Tue, Sep 19, 2017 at 8:14 AM, David Steele <da...@pgmasters.net> wrote:
>>> On 8/31/17 11:56 PM, Michael Paquier wrote:
>>>> Here is an updated patch with refreshed documentation, as a result of
>>>> 449338c which was discussed in thread
>>>> https://www.postgresql.org/message-id/d4d951b9-89c0-6bc1-b6ff-d0b2dd5a8...@pgmasters.net.
>>>> I am just outlining the fact that a backup history file gets created
>>>> on standbys and that it is archived.
>>>
>>> The patch looks good overall.  It applies cleanly and passes all tests.
>>>
>>> One question.  Do we want to create this file all the time (as written),
>>> or only when archive_mode = always?
>>>
>>> It appears that CleanupBackupHistory() will immediately remove the
>>> history file when archive_mode != always so it seems useless to write
>>> it.  On the other hand the code is a bit simpler this way.  Thoughts?
>>
>> With archive_mode = off, the bytes of the backup history file are
>> still written to disk, so my take on the matter is to keep the code
>> simple.
>
> I'm OK with that.
>
> I'll give Masahiko some time to respond before marking it RFC.
>

Thanks, I'll review it and send a comment by this Wednesday.

-- 
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-18 Thread Masahiko Sawada
On Fri, Sep 8, 2017 at 9:52 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 4:15 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>>
>>>> Very very minor comments that I should have noticed before, sorry for
>>>> this
>>>> additional round trip.
>>>
>>>
>>> Thank you for the dedicated review!
>>
>>
>> I'm someone at times pigheaded, I think in the good sense if it is possible,
>> and I like to finish what I start:-)
>>
>> Patch applies, compiles, works, everything is fine from my point of view.
>>
>> I switched it to "Ready for Committer".
>
> Thanks.
>
>>
>> Again, if the pgbench tap test patch get through, it should be tap tested.
>>
>
> Thank you for the remainder, I'll add tap tests once the patch got committed.
>

Attached is the latest version of the patch, incorporating the TAP tests.
Please review it.

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v13.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small code improvement for btree

2017-09-18 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 5:39 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Doug Doole <dougdo...@gmail.com> writes:
>> Looks good to me.
>> The new status of this patch is: Ready for Committer
>
> Pushed.  Grepping found a few more places that should be changed to
> use these macros rather than referencing btpo_flags directly,
> so I did that.

Thank you!

> I tend to agree with Alvaro that it'd be better to get rid of
> BT_READ/BT_WRITE in favor of using the same buffer flags used
> elsewhere; but I also agree that as long as they're there we
> should use them, so I included that change as well.
>

Agreed. Thanks again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-16 Thread Masahiko Sawada
On Sat, Sep 16, 2017 at 9:52 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/15/17 13:35, Arseny Sher wrote:
>> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>>
>>> Here is a simple patch that fixes this, based on my original proposal
>>> point #4.
>>
>> I checked, it passes the tests and solves the problem. However, isn't
>> this
>>
>> + if (slotname || !subenabled)
>>
>> is a truism? Is it possible that subscription has no slot but still
>> enabled?
>
> Yeah, we could just remove the _at_commit() branch entirely.  That would
> effectively undo the change in 7e174fa793a2df89fe03d002a5087ef67abcdde8,
> but I don't see any other choice for now.  And the practical impact
> would be quite limited.
>

Yeah, we can remove only the _at_commit() branch, but the other part of
the commit is still valid for ALTER SUBSCRIPTION DISABLE.

>> Besides, we can avoid stopping the workers if subscription has no
>> associated replication origin, though this probably means that
>> subscription was broken by user and is not worth it.
>
> Right, it seems not worth addressing this case separately.
>

Once we have this patch, DROP SUBSCRIPTION is not transactional either
when dropping a replication slot or when the subscription was disabled
in the same transaction block. But we disallow DROP SUBSCRIPTION inside
a transaction block only in the former case; in the latter case we
accept the non-transactional behaviour. Since these behaviours could be
confusing for users, I attached a documentation patch explaining them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index f535c00..a7a786e 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -93,6 +93,14 @@ DROP SUBSCRIPTION [ IF EXISTS ] name.
   
+
+  
+   Since dropping a replication slot is not transactional, we cannot run
+   DROP SUBSCRIPTION inside a transaction block if dropping
+   the replication slot. Also, DROP SUBSCRIPTOIN stops the
+   workers if the subscription got disabled in a transaction block even if
+   the transaction rolls back.
+  
  
 
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Masahiko Sawada
On Thu, Sep 14, 2017 at 12:04 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Wed, Sep 13, 2017 at 8:00 PM, Arseny Sher <a.s...@postgrespro.ru> wrote:
>> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>>> We can break this in any number of ways:
>>>
>>> - (your patch) Kill workers right away after ALTER SUBSCRIPTION DISABLE,
>>> thus breaking the appearance of transactional DDL somewhat.
>>> ...
>>> - Have DROP SUBSCRIPTION attempt to kill workers if the subscription is
>>> disabled (and possibly, was changed in the same transaction), which
>>> would address this scenario very narrowly.
>>
>> Actually, my patch is closer to the last variant. I proposed to kill the
>> workers in DROP SUBSCRIPTION, and only if we are dropping replication
>> origin (which is probably always the case, though). I agree that it is
>> somewhat narrow and still slightly violates transactionality of DROP
>> SUBSCRIPTION, meaning that its changes (stopped workers) are seen before
>> the commit.
>>
>> However, do we care much about that? Is there any chance that users will
>> rely on living apply workers after DROP SUBSCRIPTION, but before the
>> transaction commit? In which situation this might be useful?
>>
>> On the other hand, forbidding to execute disable sub and drop sub in one
>> transaction makes it impossible e.g. to drop subscription in trigger as
>> long as Postgres doesn't have autonomous transactions.
>>
>>
>> Tom Lane <t...@sss.pgh.pa.us> writes:
>>> ISTM the second of those (refuse to drop an in-use subscription) is
>>> by far the least surprising behavior.  However, I wonder if there aren't
>>> race conditions involved here.  What if we haven't yet committed a
>>> DROP SUBSCRIPTION, and some new worker starts up after we look for
>>> workers?
>>
>> We hold a lock on subscription till the end of transaction, so workers
>> won't start.
>>
>>> If there aren't variants of that that will break all four options,
>>> it's not very obvious why not.
>>
>> I see it this way:
>> * We want effect of drop sub invisible till commit, so we can't stop
>>   workers before commit.
>> * Drop of replication origin needs to be executed in one transaction with
>>   drop sub, it writes to WAL and so must be executed before commit.
>> * Apply worker needs RO for its work, it owns origin for the whole
>>   lifetime.
>>
>> Something should be given up here. One more idea that was not yet
>> mentioned is to abandon attempts to drop subs and ROs simultenously and
>> just garbage-collect replication origins periodically, but that doesn't
>> sound as an elegant solution.
>>
>>
>> Masahiko Sawada <sawada.m...@gmail.com> writes:
>>
>>>> I don't think this is reliable -- what if worker suddenly dies without
>>>> accomplishing the job?
>>>
>>> The apply worker will be launched by the launcher later. If DROP
>>> SUBSCRIPTION is issued before the apply worker launches again, DROP
>>> SUBSCRIPTION itself can remove the replication origin.
>>
>> Why launcher would restart the worker if we already destroyed the
>> subscription?
>
> Ah, the apply worker will not launch in that case.
>
>> Consider the sequence of actions:
>>
>> * We check in DROP SUBSCRIPTION that worker alive and don't remove RO.
>> * DROP SUBSCRIPTION commits.
>> * Worker is killed by some villain before it had the chance to drop RO.
>>   It might be killed even before drop sub commit, but after the check,
>>   we are left with orphan RO anyway.
>
> Hmm yes, we could left with orphan the replication origin. It's not a
> good solution.
>
> The second option makes subscription management more complex for
> users. Whenever user wants to drop subscription in use, they have to
> disable it before dropping the subscription, while CREATE SUBSCRIPTION
> starts the logical replication without ALTER SUBSCRIPTION ENABLE. I
> guess disallowing DROP SUBSCRIPTION in a transaction would rather be
> more useful for users because there would not be a lot of users who
> want to manage subscriptions transactionally.
>

Sorry, by "the second option" I meant the second of the options listed by Peter.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-13 Thread Masahiko Sawada
On Wed, Sep 13, 2017 at 8:00 PM, Arseny Sher <a.s...@postgrespro.ru> wrote:
> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>> We can break this in any number of ways:
>>
>> - (your patch) Kill workers right away after ALTER SUBSCRIPTION DISABLE,
>> thus breaking the appearance of transactional DDL somewhat.
>> ...
>> - Have DROP SUBSCRIPTION attempt to kill workers if the subscription is
>> disabled (and possibly, was changed in the same transaction), which
>> would address this scenario very narrowly.
>
> Actually, my patch is closer to the last variant. I proposed to kill the
> workers in DROP SUBSCRIPTION, and only if we are dropping replication
> origin (which is probably always the case, though). I agree that it is
> somewhat narrow and still slightly violates transactionality of DROP
> SUBSCRIPTION, meaning that its changes (stopped workers) are seen before
> the commit.
>
> However, do we care much about that? Is there any chance that users will
> rely on living apply workers after DROP SUBSCRIPTION, but before the
> transaction commit? In which situation this might be useful?
>
> On the other hand, forbidding to execute disable sub and drop sub in one
> transaction makes it impossible e.g. to drop subscription in trigger as
> long as Postgres doesn't have autonomous transactions.
>
>
> Tom Lane <t...@sss.pgh.pa.us> writes:
>> ISTM the second of those (refuse to drop an in-use subscription) is
>> by far the least surprising behavior.  However, I wonder if there aren't
>> race conditions involved here.  What if we haven't yet committed a
>> DROP SUBSCRIPTION, and some new worker starts up after we look for
>> workers?
>
> We hold a lock on subscription till the end of transaction, so workers
> won't start.
>
>> If there aren't variants of that that will break all four options,
>> it's not very obvious why not.
>
> I see it this way:
> * We want effect of drop sub invisible till commit, so we can't stop
>   workers before commit.
> * Drop of replication origin needs to be executed in one transaction with
>   drop sub, it writes to WAL and so must be executed before commit.
> * Apply worker needs RO for its work, it owns origin for the whole
>   lifetime.
>
> Something should be given up here. One more idea that was not yet
> mentioned is to abandon attempts to drop subs and ROs simultenously and
> just garbage-collect replication origins periodically, but that doesn't
> sound as an elegant solution.
>
>
> Masahiko Sawada <sawada.m...@gmail.com> writes:
>
>>> I don't think this is reliable -- what if worker suddenly dies without
>>> accomplishing the job?
>>
>> The apply worker will be launched by the launcher later. If DROP
>> SUBSCRIPTION is issued before the apply worker launches again, DROP
>> SUBSCRIPTION itself can remove the replication origin.
>
> Why launcher would restart the worker if we already destroyed the
> subscription?

Ah, the apply worker will not launch in that case.

> Consider the sequence of actions:
>
> * We check in DROP SUBSCRIPTION that worker alive and don't remove RO.
> * DROP SUBSCRIPTION commits.
> * Worker is killed by some villain before it had the chance to drop RO.
>   It might be killed even before drop sub commit, but after the check,
>   we are left with orphan RO anyway.

Hmm yes, we could be left with an orphaned replication origin. That's
not a good solution.

The second option makes subscription management more complex for
users. Whenever a user wants to drop a subscription that is in use,
they have to disable it before dropping it, whereas CREATE SUBSCRIPTION
starts logical replication without ALTER SUBSCRIPTION ENABLE. I guess
disallowing DROP SUBSCRIPTION in a transaction block would be more
useful for users, because there would not be a lot of users who want to
manage subscriptions transactionally.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-12 Thread Masahiko Sawada
On Wed, Sep 13, 2017 at 12:48 AM, Arseny Sher <a.s...@postgrespro.ru> wrote:
> Masahiko Sawada <sawada.m...@gmail.com> writes:
>
>> FWIW, perhaps we can change the replication origin management so that
>> DROP SUBSCRIPTION doesn't drop the replication origin and the apply
>> worker itself removes it when exit. When an apply worker exits it
>> removes the replication origin if the corresponding subscription had
>> been removed.
>

After some thought, I think we can change it as follows.
* If the replication origin is not acquired, DROP SUBSCRIPTION can drop it.
* If the replication origin is acquired by someone, DROP SUBSCRIPTION
hands the job of dropping it over to the apply worker.
* The apply worker drops the replication origin on exit if it has been
asked to do so.

> I don't think this is reliable -- what if worker suddenly dies without
> accomplishing the job?

The apply worker will be launched by the launcher later. If DROP
SUBSCRIPTION is issued before the apply worker launches again, DROP
SUBSCRIPTION itself can remove the replication origin.

Attached is a very rough patch for reference. It's very ugly, but it
shows how we can deal with this case.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 2ef414e..9ed773e 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -940,7 +940,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	snprintf(originname, sizeof(originname), "pg_%u", subid);
 	originid = replorigin_by_name(originname, true);
 	if (originid != InvalidRepOriginId)
-		replorigin_drop(originid, false);
+		replorigin_drop(originid, true, true, true);
 
 	/*
 	 * If there is no slot associated with the subscription, we can finish
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index edc6efb..05423a7 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -129,6 +129,8 @@ typedef struct ReplicationState
 	 */
 	ConditionVariable origin_cv;
 
+	bool	drop_by_worker;
+
 	/*
 	 * Lock protecting remote_lsn and local_lsn.
 	 */
@@ -329,7 +331,7 @@ replorigin_create(char *roname)
  * Needs to be called in a transaction.
  */
 void
-replorigin_drop(RepOriginId roident, bool nowait)
+replorigin_drop(RepOriginId roident, bool nowait, bool need_lock, bool takeover)
 {
 	HeapTuple	tuple;
 	Relation	rel;
@@ -342,7 +344,8 @@ replorigin_drop(RepOriginId roident, bool nowait)
 restart:
 	tuple = NULL;
 	/* cleanup the slot state info */
-	LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
+	if (need_lock)
+		LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -355,6 +358,22 @@ restart:
 			{
 ConditionVariable *cv;
 
+if (takeover)
+{
+	ereport(WARNING,
+			(errcode(ERRCODE_OBJECT_IN_USE),
+			 errmsg("could not drop replication origin with OID %d, in use by PID %d, takeover",
+	state->roident,
+	state->acquired_by)));
+	state->drop_by_worker = true;
+	if (need_lock)
+		LWLockRelease(ReplicationOriginLock);
+
+	/* now release lock again */
+	heap_close(rel, ExclusiveLock);
+	return;
+}
+
 if (nowait)
 	ereport(ERROR,
 			(errcode(ERRCODE_OBJECT_IN_USE),
@@ -363,7 +382,8 @@ restart:
 	state->acquired_by)));
 cv = &state->origin_cv;
 
-LWLockRelease(ReplicationOriginLock);
+if (need_lock)
+	LWLockRelease(ReplicationOriginLock);
 ConditionVariablePrepareToSleep(cv);
 ConditionVariableSleep(cv, WAIT_EVENT_REPLICATION_ORIGIN_DROP);
 ConditionVariableCancelSleep();
@@ -384,10 +404,12 @@ restart:
 			state->roident = InvalidRepOriginId;
 			state->remote_lsn = InvalidXLogRecPtr;
 			state->local_lsn = InvalidXLogRecPtr;
+			state->drop_by_worker = false;
 			break;
 		}
 	}
-	LWLockRelease(ReplicationOriginLock);
+	if (need_lock)
+		LWLockRelease(ReplicationOriginLock);
 
 	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
 	if (!HeapTupleIsValid(tuple))
@@ -785,6 +807,7 @@ replorigin_redo(XLogReaderState *record)
 		state->roident = InvalidRepOriginId;
 		state->remote_lsn = InvalidXLogRecPtr;
 		state->local_lsn = InvalidXLogRecPtr;
+		state->drop_by_worker = false;
 		break;
 	}
 }
@@ -987,6 +1010,15 @@ ReplicationOriginExitCleanup(int code, Datum arg)
 		cv = &session_replication_state->origin_cv;
 
 		session_replication_state->acquired_by = 0;
+
+		if (session_replication_state->drop_by_worker)
+		{
+			replorigin_session_origin = InvalidRepOriginId;
+			StartTransactionCommand();
+			replorigin_drop(session_replication_state->roident, false, false, false);
+			CommitTransactionComm

Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-12 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 3:28 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher <a.s...@postgrespro.ru> wrote:
>> Arseny Sher <a.s...@postgrespro.ru> writes:
>>
>>> Attached patch fixes this by stopping workers before RO drop, as
>>> already done in case when we drop replication slot.
>>
>> Sorry, here is the patch.
>>
>
> I could reproduce this issue, it's a bug. Added this to the open item.
> The cause of this is commit 7e174fa7 which make subscription ddls kill
> the apply worker only when transaction commit. I didn't look the patch
> deeply yet but I'm not sure the approach that kills apply worker
> before commit would be good.
>

FWIW, perhaps we can change the replication origin management so that
DROP SUBSCRIPTION doesn't drop the replication origin and the apply
worker itself removes it on exit: when an apply worker exits, it
removes the replication origin if the corresponding subscription has
been removed.
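
A very rough sketch of the idea, to be run from the apply worker's exit
path (only illustrative -- the function and field names here are from
memory and would need checking):

    /* On apply worker exit: drop our origin if the subscription is gone */
    StartTransactionCommand();
    if (!SearchSysCacheExists1(SUBSCRIPTIONOID,
                               ObjectIdGetDatum(MySubscription->oid)))
    {
        char        originname[NAMEDATALEN];
        RepOriginId originid;

        snprintf(originname, sizeof(originname), "pg_%u", MySubscription->oid);
        originid = replorigin_by_name(originname, true);
        if (originid != InvalidRepOriginId)
            replorigin_drop(originid, false);
    }
    CommitTransactionCommand();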

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-09-08 Thread Masahiko Sawada
On Tue, Aug 15, 2017 at 10:13 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> On Sun, Mar 5, 2017 at 12:14 PM, David Steele <da...@pgmasters.net> wrote:
>>>> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>>>>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>>>>>> wrote:
>>>>>>> Yes, it's taking a time to update logic and measurement but it's
>>>>>>> coming along. Also I'm working on changing deadlock detection. Will
>>>>>>> post new patch and measurement result.
>>>>>>
>>>>>> I think that we should push this patch out to v11.  I think there are
>>>>>> too many issues here to address in the limited time we have remaining
>>>>>> this cycle, and I believe that if we try to get them all solved in the
>>>>>> next few weeks we're likely to end up getting backed into some choices
>>>>>> by time pressure that we may later regret bitterly.  Since I created
>>>>>> the deadlock issues that this patch is facing, I'm willing to try to
>>>>>> help solve them, but I think it's going to require considerable and
>>>>>> delicate surgery, and I don't think doing that under time pressure is
>>>>>> a good idea.
>>>>>>
>>>>>> From a fairness point of view, a patch that's not in reviewable shape
>>>>>> on March 1st should really be pushed out, and we're several days past
>>>>>> that.
>>>>>>
>>>>>
>>>>> Agreed. There are surely some rooms to discuss about the design yet,
>>>>> and it will take long time. it's good to push this out to CF2017-07.
>>>>> Thank you for the comment.
>>>>
>>>> I have marked this patch "Returned with Feedback."  Of course you are
>>>> welcome to submit this patch to the 2017-07 CF, or whenever you feel it
>>>> is ready.
>>>
>>> Thank you!
>>>
>>
>> I re-considered the basic design of parallel lazy vacuum. I didn't
>> change the basic concept of this feature and usage, the lazy vacuum
>> still executes with some parallel workers. In current design, dead
>> tuple TIDs are shared with all vacuum workers including leader process
>> when table has index. If we share dead tuple TIDs, we have to make two
>> synchronization points: before starting vacuum and before clearing
>> dead tuple TIDs. Before starting vacuum we have to make sure that the
>> dead tuple TIDs are not added no more. And before clearing dead tuple
>> TIDs we have to make sure that it's used no more.
>>
>> For index vacuum, each indexes is assigned to a vacuum workers based
>> on ParallelWorkerNumber. For example, if a table has 5 indexes and
>> vacuum with 2 workers, the leader process and one vacuum worker are
>> assigned to 2 indexes, and another vacuum process is assigned the
>> remaining one. The following steps are how the parallel vacuum
>> processes if table has indexes.
>>
>> 1. The leader process and workers scan the table in parallel using
>> ParallelHeapScanDesc, and collect dead tuple TIDs to shared memory.
>> 2. Before vacuum on table, the leader process sort the dead tuple TIDs
>> in physical order once all workers completes to scan the table.
>> 3. In vacuum on table, the leader process and workers reclaim garbage
>> on table in block-level parallel.
>> 4. In vacuum on indexes, the indexes on table is assigned to
>> particular parallel worker or leader process. The process assigned to
>> a index vacuums on the index.
>> 5. Before back to scanning the table, the leader process clears the
>> dead tuple TIDs once all workers completes to vacuum on table and
>> indexes.
>>
>> Attached the latest patch but it's still PoC version patch and
>> contains some debug codes. Note that this patch still requires another
>> patch which moves the relation extension lock out of heavy-weight
>> lock[1]. The parallel lazy vacuum patch could work even without [1]
>> patch but could fail during vacuum in some cases.
>>
>> Also, I attached the result of performance evaluation. The table size
>> is approximately 300MB ( > shared_buffers) an

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-09-07 Thread Masahiko Sawada
On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> The previous patch conflicts with current HEAD, I rebased the patch to
>> current HEAD.
>
> Hi Masahiko-san,
>
> FYI this doesn't build anymore.  I think it's just because the wait
> event enumerators were re-alphabetised in pgstat.h:
>
> ../../../../src/include/pgstat.h:820:2: error: redeclaration of
> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>   ^
> ../../../../src/include/pgstat.h:806:2: note: previous definition of
> ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>   ^
> ../../../../src/include/pgstat.h:821:2: error: redeclaration of
> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>   ^
> ../../../../src/include/pgstat.h:807:2: note: previous definition of
> ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>   ^
>

Thank you for the information! Attached rebased patch.

-- 
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Moving_extension_lock_out_of_heavyweight_lock_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-09-07 Thread Masahiko Sawada
On Fri, Sep 8, 2017 at 8:25 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> The previous patch conflicts with current HEAD, I rebased the patch to
>>> current HEAD.
>>
>> Hi Masahiko-san,
>
> Hi Sawada-san,
>
> I have just learned from a colleague who is knowledgeable about
> Japanese customs and kind enough to correct me that the appropriate
> term of address for our colleagues in Japan on this mailing list is
> <last name>-san.  I was confused about that -- apologies for my
> clumsiness.

Don't worry about it, either is OK. In Japan there is a custom of
writing <last name>-san, but <first name>-san is also not incorrect :-)
(Also, I think it's hard to distinguish between the last name and the
first name of a Japanese name.)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-07 Thread Masahiko Sawada
On Thu, Sep 7, 2017 at 4:15 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>>> Very very minor comments that I should have noticed before, sorry for
>>> this
>>> additional round trip.
>>
>>
>> Thank you for the dedicated review!
>
>
> I'm someone at times pigheaded, I think in the good sense if it is possible,
> and I like to finish what I start:-)
>
> Patch applies, compiles, works, everything is fine from my point of view.
>
> I switched it to "Ready for Committer".

Thanks.

>
> Again, if the pgbench tap test patch get through, it should be tap tested.
>

Thank you for the reminder, I'll add TAP tests once the patch gets committed.

-- 
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-06 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 4:01 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Applies, compiles, works for me.
>
> Very very minor comments that I should have noticed before, sorry for this
> additional round trip.

Thank you for the dedicated review!

>
> In the help line, move -I just after -i, to put short options in
> alphabetical and decreasing importance order. On this line, also add the
> information about the default, something like:
>
>   -i, --ini...   
>   -I, --custom=[...]+  (default "ctgvp")
>  ...

Agreed. Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-06 Thread Masahiko Sawada
On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher <a.s...@postgrespro.ru> wrote:
> Arseny Sher <a.s...@postgrespro.ru> writes:
>
>> Attached patch fixes this by stopping workers before RO drop, as
>> already done in case when we drop replication slot.
>
> Sorry, here is the patch.
>

I could reproduce this issue; it's a bug. I've added it to the open items.
The cause of this is commit 7e174fa7, which makes subscription DDLs kill
the apply worker only at transaction commit. I haven't looked at the
patch deeply yet, but I'm not sure the approach of killing the apply
worker before commit would be good.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-05 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 12:11 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>> Sorry, I don't follow that. You meant I should add a newline before
>> pg_realloc()? That is,
>>
>> +initialize_cmds =
>> +(char *) pg_realloc(initialize_cmds,
>> +sizeof(char) * n_cmds +
>> 1);
>
>
> Yes. Or maybe my terminal was doing tricks, because I had the impression
> that both argument where on the same line with many tabs in between, but
> maybe I just misinterpreted the diff file. My apology if it is the case.
>

I understand. It looks ugly in the patch but can be applied properly
with the git apply command. Attached is the latest patch, incorporating
the comments I got so far.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index f5db8d1..48e3581 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -159,6 +159,75 @@ pgbench  options  dbname
  
 
  
+  -I {custom_init_command [...]}
+  --custom-initialize={custom_init_command [...]}
+  
+   
+Specify custom initialization steps.
+custom_init_command specifies custom
+initialization steps for the initialization. The default is
+ctgvp. Each command is invoked in the
+specified order. The supported commands are:
+
+
+ 
+  c (cleanup)
+  
+   
+Destroying existing pgbench tables: pgbench_accounts,
+pgbench_branches, pgbench_history
+and pgbench_tellers.
+   
+  
+ 
+ 
+  t (table creation)
+  
+   
+Create four tables pgbench_accounts,
+pgbench_branches, pgbench_history
+and pgbench_tellers.
+   
+  
+ 
+ 
+  g (generate data)
+  
+   
+Generate data and load it to standard tables.
+   
+  
+ 
+ 
+  v (vacuum)
+  
+   
+Invoke vacuum on standard tables.
+   
+  
+ 
+ 
+  p (primary key)
+  
+   
+Create primary keys on standard tables.
+   
+  
+ 
+ 
+  f (foreign key)
+  
+   
+Create foreign keys constraints between the standard tables.
+   
+  
+ 
+
+   
+  
+ 
+
+ 
   -F fillfactor
   --fillfactor=fillfactor
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..f6b0452 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -95,6 +95,8 @@ static int	pthread_join(pthread_t th, void **thread_return);
 
 #define MIN_GAUSSIAN_PARAM		2.0 /* minimum parameter for gauss */
 
+#define DEFAULT_INIT_COMMANDS "ctgvp"
+
 int			nxacts = 0;			/* number of transactions per client */
 int			duration = 0;		/* duration in seconds */
 int64		end_time = 0;		/* when to stop in micro seconds, under -T */
@@ -112,9 +114,9 @@ int			scale = 1;
 int			fillfactor = 100;
 
 /*
- * create foreign key constraints on the tables?
+ * no vacuum at all before testing?
  */
-int			foreign_keys = 0;
+bool			is_no_vacuum = false;
 
 /*
  * use unlogged tables?
@@ -458,6 +460,12 @@ static void pgbench_error(const char *fmt,...) pg_attribute_printf(1, 2);
 static void addScript(ParsedScript script);
 static void *threadRun(void *arg);
 static void setalarm(int seconds);
+static void initCleanupTables(PGconn *con);
+static void initCreateTables(PGconn *con);
+static void initLoadData(PGconn *con);
+static void initVacuum(PGconn *con);
+static void initCreatePKeys(PGconn *con);
+static void initCreateFKeys(PGconn *con);
 
 
 /* callback functions for our flex lexer */
@@ -484,6 +492,8 @@ usage(void)
 		   "   create indexes in the specified tablespace\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
+		   "  -I, --custom-initialize=[ctgvpf]+\n"
+		   "   initialize with custom initialization commands\n"
 		   "\nOptions to select what to run:\n"
 		   "  -b, --builtin=NAME[@W]   add builtin script NAME weighted at W (default: 1)\n"
 		   "   (use \"-b list\" to list available scripts)\n"
@@ -2605,9 +2615,55 @@ disconnect_all(CState *state, int length)
 	}
 }
 
-/* create tables and setup data */
+/* Check custom initialization commands */
+static void
+checkCustomCommands(char *commands)
+{
+	char	*cmd;
+
+	if (commands[0] =

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-05 Thread Masahiko Sawada
On Tue, Sep 5, 2017 at 4:06 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>> Attached the latest patch. Please review it.
>
>
> Patch applies and compiles cleanly.
>
> Three very minor points:
>
> Typo added in the documentation: ".Each" -> ". Each".
>
> In "case 8:" there is a very long line which lacks a newline before
> pg_realloc second argument.

Sorry, I don't follow. Do you mean I should add a newline before
pg_realloc()? That is,

+initialize_cmds =
+(char *) pg_realloc(initialize_cmds,
+    sizeof(char) * n_cmds + 1);

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-04 Thread Masahiko Sawada
On Tue, Sep 5, 2017 at 12:24 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 12:05 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> In get_rel_oids() we often switch the memory context to vac_context
>> and switch back. As a result almost code in get_rel_oids() is working
>> in vac_context. Maybe we can switch memory context before and after
>> the calling get_rel_oids?
>
> I thought about that as well, and it seemed to me that the current
> patch approach is less bug-prone for the future if get_rel_oids() gets
> called in some future code paths.

Okay, I agree. Also, I found that dedupe_relations() eventually
allocates the list in the current memory context, which may not be
vac_context, and sets it to *relations at the end of that function. I
think we should switch the memory context to vac_context before doing
that. Or, to simplify the code further, maybe we can do all the
processing of the relations list after switching to vac_context. For
example:
oldcontext = MemoryContextSwitchTo(vac_context);
relations = copyObject(relations);
get_rel_oids();
check_columns_exist(relations);
dedupe_relations();
MemoryContextSwitchTo(oldcontext);

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-04 Thread Masahiko Sawada
On Tue, Sep 5, 2017 at 10:16 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Mon, Sep 4, 2017 at 11:47 PM, Bossart, Nathan <bossa...@amazon.com> wrote:
>> I've made this change in v14 of the main patch.
>>
>> In case others had opinions regarding the de-duplication patch, I've
>> attached that again as well.
>
> +   /*
> +* Create the relation list in a long-lived memory context so that it
> +* survives transaction boundaries.
> +*/
> +   old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
> +   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
> +   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
> +   rel_list = list_make1(rel);
> +   MemoryContextSwitchTo(old_cxt);
> That's way better, thanks for the new patch.
>
> So vacuum_multiple_tables_v14.patch is good for a committer in my
> opinion.

In get_rel_oids() we often switch the memory context to vac_context
and back. As a result, almost all the code in get_rel_oids() runs in
vac_context. Maybe we can switch the memory context before and after
calling get_rel_oids() instead?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-04 Thread Masahiko Sawada
On Tue, Sep 5, 2017 at 2:33 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>>> Currently TRUNCATE pgbench_accounts command is executed within a
>>>> transaction started immediately before it. If we move it out of the
>>>> transaction, the table data will be truncated even if the copying data
>>>> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
>>>> instead. Thought?
>>>
>>>
>>>
>>> Keep the truncate in the transaction, and truncate both (or all?) tables
>>> together.
>>
>>
>> Attached latest patch incorporated the comments I got so far. Please
>> review it.
>
>
> "g" does not work for me yet when after "f", only the message is slightly
> different, it chokes on another dependency, branches instead of accounts.
>
>   sh> ./pgbench -i -I ctpfg
>   cleaning up...
>   creating tables...
>   set primary keys...
>   set foreign keys...
>   ERROR:  cannot truncate a table referenced in a foreign key constraint
>   DETAIL:  Table "pgbench_history" references "pgbench_branches".
>   HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE
> ... CASCADE.

Sorry, I'd missed something.

> I think that the whole data generation should be in *one* transaction which
> starts with "truncate pgbench_history, pgbench_branches, pgbench_tellers,
> pgbench_accounts;"

Agreed, and fixed.
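For illustration, the data-generation step would then start roughly like
the sketch below. This is only a sketch: initLoadData() is the function
name declared in the patch, executeStatement() is pgbench's existing
helper, and the body here is an assumption about the shape of the code,
not the patch itself.

static void
initLoadData(PGconn *con)
{
	executeStatement(con, "begin");

	/* truncate all four tables together to satisfy the FK constraints */
	executeStatement(con,
					 "truncate table pgbench_history, pgbench_branches, "
					 "pgbench_tellers, pgbench_accounts");

	/* ... generate and COPY branches, tellers and accounts here ... */

	executeStatement(con, "commit");
}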

>
> In passing, I think that the documentation should tell explicitely what the
> default value is (aka "ctgvp").

Agreed.

Attached the latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v10.patch
Description: Binary data



Re: [HACKERS] CLUSTER command progress monitor

2017-09-03 Thread Masahiko Sawada
On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada
<yamada.tats...@lab.ntt.co.jp> wrote:
> Hi Sawada-san, Thomas,
>
> Thanks for sharing the reggression.diff.
> I realized Thomas's comment is right.
>
> Attached patch is fixed version.
> Could you try it?
>

Yeah, in my environment the regression test passed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-03 Thread Masahiko Sawada
On Fri, Sep 1, 2017 at 11:29 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>>> I'm wondering whether this truncation should be yet another available
>>> command? Hmmm... maybe not.
>>
>>
>> Currently TRUNCATE pgbench_accounts command is executed within a
>> transaction started immediately before it. If we move it out of the
>> transaction, the table data will be truncated even if the copying data
>> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
>> instead. Thought?
>
>
> Keep the truncate in the transaction, and truncate both (or all?) tables
> together.

Attached is the latest patch incorporating the comments I got so far. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v9.patch
Description: Binary data



Re: [HACKERS] CLUSTER command progress monitor

2017-09-01 Thread Masahiko Sawada
On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada
<yamada.tats...@lab.ntt.co.jp> wrote:
> Hi Thomas,
>
>>> Any comments or suggestion are welcome.
>>
>>
>> Although this patch updates src/test/regress/expected/rules.out I
>> think perhaps you included the wrong version?  That regression test
>> fails for me
>
>
> Thanks for the comment.
>
> I use the patch on 7b69b6ce and it's fine.
> Did you use "initdb" command after "make install"?
> The pg_stat_progress_cluster view is created in initdb, probably.
>

I also got a regression test error (patch applied on abe85ef). Here is
the regression.diff file.

*** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out
   2017-09-01 17:27:33.680055612 -0700
--- /home/masahiko/source/postgresql/src/test/regress/results/rules.out
2017-09-01 17:28:10.410055596 -0700
***
*** 1819,1824 
--- 1819,1849 
  pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
  pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
 FROM pg_database d;
+ pg_stat_progress_cluster| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param1
+ WHEN 0 THEN 'initializing'::text
+ WHEN 1 THEN 'scanning heap'::text
+ WHEN 2 THEN 'sorting tuples'::text
+ WHEN 3 THEN 'writing new heap'::text
+ WHEN 4 THEN 'scan heap and write new heap'::text
+ WHEN 5 THEN 'swapping relation files'::text
+ WHEN 6 THEN 'rebuilding index'::text
+ WHEN 7 THEN 'performing final cleanup'::text
+ ELSE NULL::text
+ END AS phase,
+ CASE s.param2
+ WHEN 0 THEN 'index scan'::text
+ WHEN 1 THEN 'seq scan'::text
+ ELSE NULL::text
+ END AS scan_method,
+ s.param3 AS scan_index_relid,
+ s.param4 AS heap_tuples_total,
+ s.param5 AS heap_tuples_scanned
+FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)
+  LEFT JOIN pg_database d ON ((s.datid = d.oid)));
  pg_stat_progress_vacuum| SELECT s.pid,
  s.datid,
  d.datname,
***
*** 1841,1871 
  s.param7 AS num_dead_tuples
 FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5, param6, param7, param8,
param9, param10)
   LEFT JOIN pg_database d ON ((s.datid = d.oid)));
- pg_stat_progress_cluster| SELECT
- s.pid,
- s.datid,
- d.datname,
- s.relid,
- CASE s.param1
- WHEN 0 THEN 'initializing'::text
- WHEN 1 THEN 'scanning heap'::text
- WHEN 2 THEN 'sorting tuples'::text
- WHEN 3 THEN 'writing new heap'::text
- WHEN 4 THEN 'scan heap and write new heap'::text
- WHEN 5 THEN 'swapping relation files'::text
- WHEN 6 THEN 'rebuilding index'::text
- WHEN 7 THEN 'performing final cleanup'::text
- ELSE NULL::text
- END AS phase,
- CASE S.param2
- WHEN 0 THEN 'index scan'
- WHEN 1 THEN 'seq scan'
- END AS scan_method,
- s.param3 AS index_relid,
- s.param4 AS heap_blks_total,
- s.param5 AS heap_blks_scanned
-FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid,
relid, param1, param2, param3, param4, param5)
-  LEFT JOIN pg_database d ON ((s.datid = d.oid)));
  pg_stat_replication| SELECT s.pid,
  s.usesysid,
  u.rolname AS usename,
--- 1866,1871 

==

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] PG 10 release notes

2017-09-01 Thread Masahiko Sawada
Hi all,

On Tue, Aug 1, 2017 at 5:53 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Tue, Apr 25, 2017 at 1:31 PM, Bruce Momjian <br...@momjian.us> wrote:
>> I have committed the first draft of the Postgres 10 release notes.  They
>> are current as of two days ago, and I will keep them current.  Please
>> give me any feedback you have.
>
> Hi Bruce,
>
> "Add AFTER trigger transition tables to record changed rows (Kevin Grittner)"
>
> Any chance I could ask for a secondary author credit here?  While I
> started out as a reviewer and I understand that we don't list those, I
> finished up writing quite a lot of lines of committed code for this
> (see commits 1add0b15, 8c55244a, c46c0e52, 501ed02c, f32d57fd,
> 9e6104c6, 29fd3d9d, 304007d9, 5ebeb579) and was already listed as a
> co-author by Kevin in the original commits (59702716, 18ce3a4a).  Of
> course I wish I'd identified and fixed all of those things *before*
> the original commits, but that's how it played out...
>

It might be too late, but I found that the following entry is
categorized under Monitoring, whereas in the PostgreSQL 9.6 release
notes the feature related to default roles is categorized under
Permissions Management. I think the new default roles could be listed
under the same category so as not to confuse users, and personally I
find that more suitable.

"Add default monitoring roles (Dave Page)"

Thought?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-01 Thread Masahiko Sawada
On Fri, Sep 1, 2017 at 4:42 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> Patch applies and compiles.
>
> One bug found, and some minor points again. Sorry for this hopefully last
> iteration... I'm kind of an iterative person...
>
> I've generated the doc to look a it.
>
> Short option "-I" does not use a "=", it should be "-I
> custom_init_commands".
>
> Also maybe it would look nicer and clearer if the short mnemonic was outside
> the literal, that is with:
>
>   c (cleanup)
>
> instead of:
>
>   c (cleanup)
>
> But this is debatable. Do it the way you think is best.
>
> Command "g" does not work after "f", something I had not tested before:
>
>  ./pgbench -i -I ctvpfg
>  cleaning up...
>  creating tables...
>  vacuum...
>  set primary keys...
>  set foreign keys...
>  ERROR:  cannot truncate a table referenced in a foreign key constraint
>  DETAIL:  Table "pgbench_history" references "pgbench_accounts".
>  HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE
> ... CASCADE.
>
> I think it should work. It probably just mean to TRUNCATE all tables as one
> command, or add the suggested CASCADE. I would favor the first option.
>
> I'm wondering whether this truncation should be yet another available
> command? Hmmm... maybe not.

Currently the TRUNCATE pgbench_accounts command is executed within a
transaction started immediately before it. If we move it out of the
transaction, the table data will be truncated even if copying the data
fails. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
instead. Thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-01 Thread Masahiko Sawada
On Thu, Aug 31, 2017 at 4:35 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> [...] Personally I prefer "t" for table creation because "c" for create is
>> a generic word. We might want to have another initialization command that
>> creates something.
>
>
> Ok, good point.
>
>
> About the patch: applies, compiles, works for me. A few minor comments.

Thank you for your dedicated review of this patch!

> While re-reading the documentation, I think that it should be "Set custom
> initialization steps". It could be "Require ..." when -I implied -i, but
> since -i is still required the sentence does not seem to apply as such.
>
> "Destroying any existing tables: ..." -> "Destroy existing pgbench tables:
> ...".

Fixed.

> I would suggest to add short expanded explanations in the term definition,
> next to the triggering letter, to underline the mnemonic. Something like:
>
>c (cleanup)
>t (table creation)
>g (generate data)
>v (vacuum)
>p (primary key)
>f (foreign key)

Nice idea, agreed.

> Also update the error message in checkCustomCommands to "ctgvpf".

Fixed.

> Cleanup should have a message when it is executed. I suggest "cleaning
> up...".

Fixed.

> Maybe add a comment in front of the array tables to say that the order is
> important, something like "tables in reverse foreign key dependencies
> order"?

Fixed.

>
> case 'I': ISTM that initialize_cmds is necessarily already allocated, thus I
> would not bother to test before pg_free.

Agreed, fixed.


Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v8.patch
Description: Binary data



Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-31 Thread Masahiko Sawada
On Thu, Aug 31, 2017 at 10:52 AM, Bossart, Nathan <bossa...@amazon.com> wrote:
> On 8/30/17, 5:37 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote:
>> +VacuumRelation *
>> +makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
>> +{
>> +   VacuumRelation *vacrel = makeNode(VacuumRelation);
>> +   vacrel->relation = relation;
>> +   vacrel->va_cols = va_cols;
>> +   vacrel->oid = oid;
>> +   return vacrel;
>> +}
>> Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
>> used for node constructions like that.
>
> This function is moved in v11.
>
> On 8/30/17, 6:52 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote:
>> On Thu, Aug 31, 2017 at 8:35 AM, David G. Johnston
>> <david.g.johns...@gmail.com> wrote:
>>> Inspired by the syntax documentation for EXPLAIN:
>>>
>>> VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]
>>>
>>> where option can be one of:
>>> FULL
>>> FREEZE
>>> VERBOSE
>>> DISABLE_PAGE_SKIPPING
>>>
>>> and where table_def is:
>>> table_name [ ( column_name [, ... ] ) ]
>>
>> Yes, splitting things would be nice with the column list. I need more coffee.
>
> I've made this change in v11 as well.
>
> v2 of the de-duplication patch seems to still apply cleanly, so I haven't
> made any further changes to it.
>

I reviewed these patches and found an issue.

The autovacuum worker seems not to work correctly. I got an error message:

ERROR:  unrecognized node type: 0
CONTEXT:  automatic analyze of table "postgres.public.hoge"

I think we should set rangevar.type to T_RangeVar in the
autovacuum_do_vac_analyze() function.
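A minimal sketch of that fix, assuming the stack-allocated RangeVar and
the tab->at_nspname / tab->at_relname fields used elsewhere in
autovacuum.c (this is a sketch of the suggestion, not the committed
code):

	RangeVar	rangevar;

	MemSet(&rangevar, 0, sizeof(rangevar));
	rangevar.type = T_RangeVar;		/* the missing node tag */
	rangevar.schemaname = tab->at_nspname;
	rangevar.relname = tab->at_relname;
	rangevar.location = -1;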

Also, there is a small typo in dedupe_vacuum_relations_v2.patch.

+   /* if already procesed or not equal, skip */
+   if (list_member_int(duplicates, i) ||
relation->oid != nth_rel->oid)
+   continue;

s/procesed/processed/g

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-30 Thread Masahiko Sawada
On Wed, Aug 30, 2017 at 3:39 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello,
>
>>> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
>>> "main") and as bool (in "init"), called by main (yuk!). I see no reason
>>> to
>>> choose the bad one for the global:-)
>>
>>
>> Yeah, I think this might be a good timing to re-consider int-for-bool
>> habits in pgbench. If we decided to change is_no_vacuum to bool I want
>> to change other similar variables as well.
>
>
> Franckly I would be fine with that, but committers might get touchy about
> "unrelated changes" in the patch... The "is_no_vacuum" is related to the
> patch and is already a bool -- if you chose the "init" definition as a
> reference -- so it is okay to bool it.

Okay, I changed only is_no_vacuum in this patch; other similar
variables can be changed in another patch.

>
>>> I think that the "-I" it should be added to the "--help" line, as it is
>>> done
>>> with other short & long options.
>>
>>
>> Okay, I'll leave it as of now. Maybe we can discuss later.
>
>
> Maybe we did not understand one another. I'm just suggesting to insert
> -I in the help line, that is change:
>
>"  --custom-initialize=[...]+\n"
>
> to
>
>"  -I, --custom-initialize=[...]+\n"

Fixed.

> I'm not sure it deserves to be discussed in depth later:-)

Sorry, I meant the question of having a short option for --custom-initialize.

>
>>> I wonder if this could be avoided easily? Maybe by setting the constraint
>>> name explicitely so that the second one fails on the existing one, which
>>> is
>>> fine, like for primary keys? [...]
>>
>>
>> Good point, I agree with first option.
>
>
> Ok.
>
>>> Maybe the initial cleanup (DROP TABLE) could be made an option added to
>>> the
>>> default, so that cleaning up the database could be achieved with some
>>> "pgbench -i -I c", instead of connecting and droping the tables one by
>>> one
>>> which I have done quite a few times... What do you think?
>>
>>
>> Yeah, I sometimes wanted that. Having the cleaning up tables option
>> would be good idea.
>
>
> Ok.
>
>> I'd say "g" for data generation would be better. Also, I'm inclined to
>> add a command for the unlogged tables. How about this?
>>
>> c - [c]leanup / or [d]rop tables
>> t - create table / [t]able creation or [c]reate table
>> u - create unlogged table
>> g - data generation / [g]enerate data
>> p - [p]rimary key
>> f - [f]oreign key
>> v - [v]acuum
>
>
> I'm okay with that. I also put an alternative with d/c above, without any
> preference from my part.

Personally I prefer "t" for table creation because "c" for create is a
generic word. We might want to have another initialization command
that creates something.

>
> I'm not sure about "u", though. Unlogged, like tablespace, is an orthogonal
> option: other table creation options (I intend to submit one which conforms
> to the TPC-B standard, that is use an INT8 balance as INT4 is not wide
> enough per spec, and always use an INT8 aid) may be also unlogged or
> tablespaced. So that would mean having two ways to trigger them... thus I
> would avoid it and keep only --unlogged.
>

Yeah, I think I had misunderstood it. The -I option is for specifying
particular initialization steps, so we don't need a separate command
for what is really an orthogonal table-creation option.

Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v7.patch
Description: Binary data



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-29 Thread Masahiko Sawada
On Tue, Aug 29, 2017 at 4:47 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello,
>
> Patch applies and works.
>
>>> I would like to insist a little bit: the name was declared as an int but
>>> passed to init and used as a bool there before the patch. Conceptually
>>> what
>>> is meant is really a bool, and I see no added value bar saving a few
>>> strokes
>>> to have an int. ISTM that recent pgbench changes have started turning old
>>> int-for-bool habits into using bool when bool is meant.
>>
>>
>> Since is_no_vacuum is a existing one, if we follow the habit we should
>> change other similar variables as well: is_init_mode, do_vacuum_accounts and
>> debug. And I think we should change them in a separated patch.
>
>
> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
> "main") and as bool (in "init"), called by main (yuk!). I see no reason to
> choose the bad one for the global:-)
>

Yeah, I think this might be good timing to reconsider the int-for-bool
habit in pgbench. If we decide to change is_no_vacuum to bool, I want
to change other similar variables as well.

>> After more thought, I'm bit inclined to not have a short option for
>> --custom-initialize because this option will be used for not primary
>> cases. It would be better to save short options for future
>> enhancements of pgbench. Thought?
>
>
> I like it as is, especially as now the associated value is a simple and
> short string, I think that it makes sense to have a simple and short option
> to trigger it. Moreover -I stands cleanly for "initialization", and the
> capital stands for something a little special which it is. Its to good to
> miss.
>
> I think that the "-I" it should be added to the "--help" line, as it is done
> with other short & long options.

Okay, I'll leave it as is for now. Maybe we can discuss it later.

>
> Repeating "-I f" results in multiple foreign key constraints:
>
>   Foreign-key constraints:
> "pgbench_tellers_bid_fkey" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
> "pgbench_tellers_bid_fkey1" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
> "pgbench_tellers_bid_fkey2" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
> "pgbench_tellers_bid_fkey3" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
>
> I wonder if this could be avoided easily? Maybe by setting the constraint
> name explicitely so that the second one fails on the existing one, which is
> fine, like for primary keys? Or adding a DROP CONSTRAINT IF EXISTS before
> the CREATE CONSTRAINT, like for tables? Or doing nothing about it? I would
> prefer the first option.

Good point, I agree with the first option.
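To make that concrete, the foreign-key DDL could name the constraints
explicitly, roughly as below. The DDLKEYs array name follows pgbench's
existing convention; the exact statements are an assumption for
illustration, not the committed patch:

static const char *const DDLKEYs[] = {
	"alter table pgbench_tellers add constraint pgbench_tellers_bid_fkey "
	"foreign key (bid) references pgbench_branches",
	"alter table pgbench_accounts add constraint pgbench_accounts_bid_fkey "
	"foreign key (bid) references pgbench_branches",
	"alter table pgbench_history add constraint pgbench_history_bid_fkey "
	"foreign key (bid) references pgbench_branches",
	"alter table pgbench_history add constraint pgbench_history_tid_fkey "
	"foreign key (tid) references pgbench_tellers",
	"alter table pgbench_history add constraint pgbench_history_aid_fkey "
	"foreign key (aid) references pgbench_accounts"
};

With explicit names, re-running the "f" step fails cleanly on the
existing constraint instead of stacking pgbench_tellers_bid_fkey1,
_fkey2, and so on.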

>
> Maybe the initial cleanup (DROP TABLE) could be made an option added to the
> default, so that cleaning up the database could be achieved with some
> "pgbench -i -I c", instead of connecting and droping the tables one by one
> which I have done quite a few times... What do you think?

Yeah, I've sometimes wanted that. Having an option for cleaning up the
tables would be a good idea.

> Before it is definitely engraved, I'm thinking about the letters:
>
>  c - cleanup
>  t - create table
>  d - data
>  p - primary key
>  f - foreign key
>  v - vacuum
>
> I think it is mostly okay, but it is the last time to think about it. Using
> "d" for cleanup (drop) would mean finding another letter for filling in
> data... maybe "g" for data generation? "c" may have been chosen for "create
> table", but then would not be available for "cleanup". Thoughts?

I'd say "g" for data generation would be better. Also, I'm inclined to
add a command for the unlogged tables. How about this?

c - cleanup
t - create table
u - create unlogged table
g - data generation
p - primary key
f - foreign key
v - vacuum

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] WIP: Separate log file for extension

2017-08-29 Thread Masahiko Sawada
On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <a...@cybertec.at> wrote:
> Attached is a draft patch to allow extension to write log messages to a
> separate file.

Does it allow the postgres core code to write log messages to multiple
log files as well? I can imagine use cases where we want to write log
messages to separate log files by error level or purpose (server log
and SQL log, etc.).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-29 Thread Masahiko Sawada
On Thu, Aug 24, 2017 at 4:27 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <j...@berkus.org> wrote:
>> On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
>>> WARNING:  what you did is ok, but you might have wanted to do something else
>>>
>>> First of all, whether or not that can properly be called a warning is
>>> highly debatable.  Also, if you do that sort of thing to your spouse
>>> and/or children, they call it "nagging".  I don't think users will
>>> like it any more than family members do.
>>
>> Realistically, we'll support the backwards-compatible syntax for 3-5
>> years.  Which is fine.
>>
>> I suggest that we just gradually deprecate the old syntax from the docs,
>> and then around Postgres 16 eliminate it.  I posit that that's better
>> than changing the meaning of the old syntax out from under people.
>>
>
> It seems to me that there is no folk who intently votes for making the
> quorum commit the default. There some folks suggest to keep backward
> compatibility in PG10 and gradually deprecate the old syntax. And only
> the issuing from docs can be possible in PG10.
>

According to the discussion so far, it seems to me that the most
acceptable approach for PG10 is to keep backward compatibility and warn
in the docs that the old syntax could be changed or removed in a future
release. There has been no objection to that so far, and I already
posted a patch to add such a warning to the docs[1]. I'll wait for the
committer's decision.

[1] 
https://www.postgresql.org/message-id/CAD21AoAe%2BoGSFi3bjZ%2BfW6Q%3DTK7avPdDCZLEr02zM_c-U0JsRA%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-08-29 Thread Masahiko Sawada
On Tue, Aug 29, 2017 at 10:16 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Aug 28, 2017 at 5:26 AM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> Currently the message shows the '%d skipped-frozen' message but
>> it is insufficient to verify the true effect. This is a patch to
>> show mode as 'aggressive' or 'normal' in the closing message of
>> vacuum. %d frozen-skipped when 'aggressive mode' shows the true
>> effect of ALL_FROZEN.
>>
>> I will add this patch to CF2017-09.
>
> I would be a bit inclined to somehow show aggressive if it's
> aggressive and not insert anything at all otherwise.  That'd probably
> require two separate translatable strings in each case, but maybe
> that's OK.
>
> What do other people think?

FWIW I prefer Robert's idea: don't insert anything for a normal vacuum.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-28 Thread Masahiko Sawada
On Mon, Aug 28, 2017 at 4:59 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> Patch applies cleanly, compiles, works for me.

Thank you for reviewing!

>
>>> At least: "Required to invoke" -> "Require to invoke".
>>
>>
>> Fixed.
>
>
> I meant the added one about -I, not the pre-existing one about -i.

Fixed.

>>> About the code:
>>>
>>> is_no_vacuum should be a bool?
>>
>>
>> We can change it but I think there is no difference actually. So
>> keeping it would be better.
>
>
> I would like to insist a little bit: the name was declared as an int but
> passed to init and used as a bool there before the patch. Conceptually what
> is meant is really a bool, and I see no added value bar saving a few strokes
> to have an int. ISTM that recent pgbench changes have started turning old
> int-for-bool habits into using bool when bool is meant.

Since is_no_vacuum is an existing variable, if we follow that habit we
should change other similar variables as well: is_init_mode,
do_vacuum_accounts and debug. And I think we should change them in a
separate patch.

>
> initialize_cmds initialization: rather use pg_strdup instead of
> pg_malloc/strcpy?

Fixed.

> -I: pg_free before pg_strdup to avoid a small memory leak?

Fixed.

> I'm not sure I would have bothered with sizeof(char), but why not!
>
> I'm still a little bit annoyed by checkCustomCmds: when unhappy, it shows an
> error message and return false, which immediatly results in exit(1). However
> the pattern elsewhere in pgbench is to show the error and exit immediatly. I
> would suggest to simplify by void-ing the function and exiting instead of
> returning false.

Agreed, fixed.

After more thought, I'm a bit inclined not to have a short option for
--custom-initialize, because this option will not be used in the
primary cases. It would be better to save short options for future
enhancements of pgbench. Thoughts?

Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v6.patch
Description: Binary data



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-27 Thread Masahiko Sawada
On Sun, Aug 27, 2017 at 5:12 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> Attached latest v4 patch. Please review it.

Thank you for reviewing this patch!

>
> Patch applies, compiles.
>
> The messages/options do not seem to work properly:
>
>  sh> ./pgbench -i -I t
>  done.

Fixed this so that it outputs "creating tables..." as you pointed out.

> Does not seem to have initialized the tables although it was requested...
>
>  sh> ./pgbench -i -I d
>  creating tables...
>  10 of 10 tuples (100%) done (elapsed 0.08 s, remaining 0.00 s)
>  done.
>
> It seems that "d" triggered table creation... In fact it seems that the
> work is done correctly, but the messages are not in the right place.

Fixed, but I just removed "creating tables..." from the -I d command. I
don't think it's good for this patch to change the output messages.

> Also another issue:
>
>  sh> ./pgbench -i --foreign-keys
>  creating tables...
>  10 of 10 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
>  vacuum...
>  set primary keys...
>  done.
>
> Foreign keys do not seem to have been set... Please check that all really
> work as expected.

Fixed.

>
> About the documentation:
>
> If a native English speaker could review the text, it would be great.
>
> At least: "Required to invoke" -> "Require to invoke".

Fixed.

>
>
> About the code:
>
> is_no_vacuum should be a bool?

We can change it, but I think there is no actual difference, so
keeping it as it is would be better.

>
> I'm really hesitating about the out of order processing of options. If the
> user writes
>
>   sh> pgbench -i --no-vacuum -I v
>   done.
>
> Then does it make sense to ignore the last thing the user asked for? ISTM
> that processing options in order and keeping the last resulting spec is more
> natural. Appending contradictory options can happen easily when scripting,
> and usual what is meant is the last one.

Agreed. I changed it so that it processes options in order and keeps
the last resulting spec.
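A minimal sketch of what "keep the last resulting spec" means in the
getopt loop, assuming the patch's initialize_cmds string and its
checkCustomCommands() helper (an illustration, not the exact patch
code):

			case 'I':
				/* a later -I simply replaces any earlier one */
				pg_free(initialize_cmds);
				initialize_cmds = pg_strdup(optarg);
				checkCustomCommands(initialize_cmds);
				break;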

>
> Again, as pointed out in the previous review, I do not like much
> checkCustomCmds implementation: switch/case, fprintf and return on error
> which will trigger another fprintf and error above... ISTM that you should
> either take into account previous comments or explain why you disagree with
> them, but not send the same code without addressing them in any way.

Sorry, I didn't mean to ignore them; I'd just missed the comment. Fixed it.

Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v5.patch
Description: Binary data



Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-25 Thread Masahiko Sawada
On Thu, Aug 24, 2017 at 11:35 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>> Michael Paquier wrote:
>>> Hm. I am not sure what you have in mind here.
>
>> I'm thinking that this data is useful to analyze as a stream of related
>> events, rather than as individual data points.  Grepping logs in order to
>> extract the numbers is lame and slow.
>
> Yes.  And there is a bigger issue here, which is that the output of
> VACUUM VERBOSE is meant to be sent to the client for *human* readability.
> (As you noted, INFO-level messages don't normally go to the server log
> in the first place, and that's not by accident.)  Repeating the full table
> name in every line will be really annoying for that primary use-case.
> I am not sure what we want to do to address Masahiko-san's use-case, but
> ideally his workflow wouldn't involve log-scraping at all.

The use case I had is that I run vacuumdb *without the -j option*, save
all the verbose logs into a text file, and then check it later. I said
vacuumdb with the -j option before, but that was wrong: two VACUUM
VERBOSE logs cannot be emitted interleaved even if the -j option is
specified.

I sometimes search the logs for a particular table/index, and in that
case too it was hard to tell the log entries apart. That is still not
the primary case, though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] ECPG: WHENEVER statement with DO CONTINUE action

2017-08-25 Thread Masahiko Sawada
On Fri, Aug 25, 2017 at 4:27 PM, vinayak
<pokale_vinayak...@lab.ntt.co.jp> wrote:
>
>
> On 2017/08/25 16:18, Masahiko Sawada wrote:
>>
>> On Fri, Aug 25, 2017 at 2:57 PM, vinayak
>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>
>>> Hi Sawada-san,
>>>
>>>
>>> On 2017/08/25 11:07, Masahiko Sawada wrote:
>>>>
>>>> On Fri, Aug 18, 2017 at 5:20 PM, vinayak
>>>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>>>
>>>>> On 2017/06/20 17:35, vinayak wrote:
>>>>>>
>>>>>> Hi Sawada-san,
>>>>>>
>>>>>> On 2017/06/20 17:22, Masahiko Sawada wrote:
>>>>>>>
>>>>>>> On Tue, Jun 20, 2017 at 1:51 PM, vinayak
>>>>>>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017/06/12 13:09, vinayak wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2017/06/10 12:23, Vinayak Pokale wrote:
>>>>>>>>
>>>>>>>> Thank you for your reply
>>>>>>>>
>>>>>>>> On Jun 9, 2017 5:39 PM, "Michael Meskes" <mes...@postgresql.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Could you please add a "DO CONTINUE" case to one of the test cases?
>>>>>>>>> Or
>>>>>>>>> add a new one? We would need a test case IMO.
>>>>>>>>>
>>>>>>>> Yes I will add test case and send updated patch.
>>>>>>>>
>>>>>>>> I have added new test case for DO CONTINUE.
>>>>>>>> Please check the attached patch.
>>>>>>>>
>>>>>>>> I have added this in Sept. CF
>>>>>>>> https://commitfest.postgresql.org/14/1173/
>>>>>>>>
>>>>>>> --
>>>>>>> In whenever_do_continue.pgc file, the following line seems not to be
>>>>>>> processed successfully by ecpg but should we fix that?
>>>>>>>
>>>>>>> +
>>>>>>> +   exec sql whenever sqlerror continue;
>>>>>>> +
>>>>>>>
>>>>>>> Also, you wrote the test case using "WHENEVER sqlerror DO CONTINUE"
>>>>>>> action but that seems not to emit sqlerror, so "DO CONTINUE" is not
>>>>>>> executed. I think the test case for DO CONTINUE should be a C code
>>>>>>> that executes the "continue" clause.
>>>>>>
>>>>>> Thank you for testing the patch.
>>>>>> I agreed with your comments. I will update the patch.
>>>>>
>>>>> Please check the attached updated patch.
>>>>>
>>>> Thank you for updating.
>>>>
>>>> The regression test failed after applied latest patch by git am.
>>>>
>>>> ***
>>>> /tmp/pg/src/interfaces/ecpg/test/expected/preproc-whenever_do_continue.c
>>>>  2017-08-24 20:01:10.023201132 -0700
>>>> ---
>>>> /tmp/pg/src/interfaces/ecpg/test/results/preproc-whenever_do_continue.c
>>>>   2017-08-24 20:22:54.308200853 -0700
>>>> ***
>>>> *** 140,147 
>>>>   printf("%s %7.2f %9.2f\n", emp.ename, emp.sal,
>>>> emp.comm);
>>>>   }
>>>>
>>>> !   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>>>> program to
>>>> !proceed if any further errors do occur. */
>>>>   /* exec sql whenever sqlerror  continue ; */
>>>> #line 53 "whenever_do_continue.pgc"
>>>>
>>>> --- 140,147 
>>>>   printf("%s %7.2f %9.2f\n", emp.ename, emp.sal,
>>>> emp.comm);
>>>>   }
>>>>
>>>> !   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>>>> program to
>>>> !   proceed if any further errors do occur. */
>>>>   /* exec sql whenever sqlerror  continue ; */
>>>> #line 53 "whenever_do_continue.pgc"
>>>>
>>>> ==
>>>>
>>>> +   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>>>> program to
>>>> +   proceed if any further errors do occur. */
>>>>
>>>> I think this comment should obey the coding style guide.
>>>
>>> Thank you for testing.
>>>
>>> I have updated the patch.
>>> PFA.
>>>
>> Thank you for updating the patch. It seems not to incorporate my
>> second review comment. Attached an updated patch including a fix of a
>> comment style in whenever_do_continue.pgc file. Please find an
>> attached file.
>
> Sorry, I missed it.
> Thank you for fixing the comment style.
>

The v3 patch looks good to me. I've changed this patch status to Ready
for Committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] ECPG: WHENEVER statement with DO CONTINUE action

2017-08-25 Thread Masahiko Sawada
On Fri, Aug 25, 2017 at 2:57 PM, vinayak
<pokale_vinayak...@lab.ntt.co.jp> wrote:
> Hi Sawada-san,
>
>
> On 2017/08/25 11:07, Masahiko Sawada wrote:
>>
>> On Fri, Aug 18, 2017 at 5:20 PM, vinayak
>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>
>>> On 2017/06/20 17:35, vinayak wrote:
>>>>
>>>> Hi Sawada-san,
>>>>
>>>> On 2017/06/20 17:22, Masahiko Sawada wrote:
>>>>>
>>>>> On Tue, Jun 20, 2017 at 1:51 PM, vinayak
>>>>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>>>>
>>>>>>
>>>>>> On 2017/06/12 13:09, vinayak wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2017/06/10 12:23, Vinayak Pokale wrote:
>>>>>>
>>>>>> Thank you for your reply
>>>>>>
>>>>>> On Jun 9, 2017 5:39 PM, "Michael Meskes" <mes...@postgresql.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> Could you please add a "DO CONTINUE" case to one of the test cases?
>>>>>>> Or
>>>>>>> add a new one? We would need a test case IMO.
>>>>>>>
>>>>>> Yes I will add test case and send updated patch.
>>>>>>
>>>>>> I have added new test case for DO CONTINUE.
>>>>>> Please check the attached patch.
>>>>>>
>>>>>> I have added this in Sept. CF
>>>>>> https://commitfest.postgresql.org/14/1173/
>>>>>>
>>>>> --
>>>>> In whenever_do_continue.pgc file, the following line seems not to be
>>>>> processed successfully by ecpg but should we fix that?
>>>>>
>>>>> +
>>>>> +   exec sql whenever sqlerror continue;
>>>>> +
>>>>>
>>>>> Also, you wrote the test case using "WHENEVER sqlerror DO CONTINUE"
>>>>> action but that seems not to emit sqlerror, so "DO CONTINUE" is not
>>>>> executed. I think the test case for DO CONTINUE should be a C code
>>>>> that executes the "continue" clause.
>>>>
>>>> Thank you for testing the patch.
>>>> I agreed with your comments. I will update the patch.
>>>
>>> Please check the attached updated patch.
>>>
>> Thank you for updating.
>>
>> The regression test failed after applied latest patch by git am.
>>
>> ***
>> /tmp/pg/src/interfaces/ecpg/test/expected/preproc-whenever_do_continue.c
>> 2017-08-24 20:01:10.023201132 -0700
>> ---
>> /tmp/pg/src/interfaces/ecpg/test/results/preproc-whenever_do_continue.c
>>  2017-08-24 20:22:54.308200853 -0700
>> ***
>> *** 140,147 
>>  printf("%s %7.2f %9.2f\n", emp.ename, emp.sal, emp.comm);
>>  }
>>
>> !   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>> program to
>> !proceed if any further errors do occur. */
>>  /* exec sql whenever sqlerror  continue ; */
>>#line 53 "whenever_do_continue.pgc"
>>
>> --- 140,147 
>>  printf("%s %7.2f %9.2f\n", emp.ename, emp.sal, emp.comm);
>>  }
>>
>> !   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>> program to
>> !   proceed if any further errors do occur. */
>>  /* exec sql whenever sqlerror  continue ; */
>>#line 53 "whenever_do_continue.pgc"
>>
>> ==
>>
>> +   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the
>> program to
>> +   proceed if any further errors do occur. */
>>
>> I think this comment should obey the coding style guide.
>
> Thank you for testing.
>
> I have updated the patch.
> PFA.
>

Thank you for updating the patch. It seems not to incorporate my
second review comment. Attached is an updated patch that also fixes the
comment style in the whenever_do_continue.pgc file. Please find the
attached file.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


0001-WHENEVER-statement-DO-CONTINUE-support_v3.patch
Description: Binary data



Re: [HACKERS] ECPG: WHENEVER statement with DO CONTINUE action

2017-08-24 Thread Masahiko Sawada
On Fri, Aug 18, 2017 at 5:20 PM, vinayak
<pokale_vinayak...@lab.ntt.co.jp> wrote:
>
> On 2017/06/20 17:35, vinayak wrote:
>>
>> Hi Sawada-san,
>>
>> On 2017/06/20 17:22, Masahiko Sawada wrote:
>>>
>>> On Tue, Jun 20, 2017 at 1:51 PM, vinayak
>>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>>
>>>>
>>>> On 2017/06/12 13:09, vinayak wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2017/06/10 12:23, Vinayak Pokale wrote:
>>>>
>>>> Thank you for your reply
>>>>
>>>> On Jun 9, 2017 5:39 PM, "Michael Meskes" <mes...@postgresql.org> wrote:
>>>>>
>>>>> Could you please add a "DO CONTINUE" case to one of the test cases? Or
>>>>> add a new one? We would need a test case IMO.
>>>>>
>>>> Yes I will add test case and send updated patch.
>>>>
>>>> I have added new test case for DO CONTINUE.
>>>> Please check the attached patch.
>>>>
>>>> I have added this in Sept. CF
>>>> https://commitfest.postgresql.org/14/1173/
>>>>
>>> --
>>> In whenever_do_continue.pgc file, the following line seems not to be
>>> processed successfully by ecpg but should we fix that?
>>>
>>> +
>>> +   exec sql whenever sqlerror continue;
>>> +
>>>
>>> Also, you wrote the test case using "WHENEVER sqlerror DO CONTINUE"
>>> action but that seems not to emit sqlerror, so "DO CONTINUE" is not
>>> executed. I think the test case for DO CONTINUE should be a C code
>>> that executes the "continue" clause.
>>
>> Thank you for testing the patch.
>> I agreed with your comments. I will update the patch.
>
> Please check the attached updated patch.
>

Thank you for updating.

The regression test failed after applying the latest patch with git am.

*** /tmp/pg/src/interfaces/ecpg/test/expected/preproc-whenever_do_continue.c
   2017-08-24 20:01:10.023201132 -0700
--- /tmp/pg/src/interfaces/ecpg/test/results/preproc-whenever_do_continue.c
2017-08-24 20:22:54.308200853 -0700
***
*** 140,147 
printf("%s %7.2f %9.2f\n", emp.ename, emp.sal, emp.comm);
}

!   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the program to
!proceed if any further errors do occur. */
/* exec sql whenever sqlerror  continue ; */
  #line 53 "whenever_do_continue.pgc"

--- 140,147 
    printf("%s %7.2f %9.2f\n", emp.ename, emp.sal, emp.comm);
}

!   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the program to
!   proceed if any further errors do occur. */
/* exec sql whenever sqlerror  continue ; */
  #line 53 "whenever_do_continue.pgc"

==

+   /* This 'CONTINUE' shuts off the 'DO CONTINUE' and allow the program to
+   proceed if any further errors do occur. */

I think this comment should obey the coding style guide.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-08-24 Thread Masahiko Sawada
On Thu, Aug 24, 2017 at 3:11 AM, Josh Berkus <j...@berkus.org> wrote:
> On 08/22/2017 11:04 PM, Masahiko Sawada wrote:
>> WARNING:  what you did is ok, but you might have wanted to do something else
>>
>> First of all, whether or not that can properly be called a warning is
>> highly debatable.  Also, if you do that sort of thing to your spouse
>> and/or children, they call it "nagging".  I don't think users will
>> like it any more than family members do.
>
> Realistically, we'll support the backwards-compatible syntax for 3-5
> years.  Which is fine.
>
> I suggest that we just gradually deprecate the old syntax from the docs,
> and then around Postgres 16 eliminate it.  I posit that that's better
> than changing the meaning of the old syntax out from under people.
>

It seems to me that nobody is explicitly voting for making the quorum
commit the default. Some folks suggest keeping backward compatibility
in PG10 and gradually deprecating the old syntax, and only a
deprecation note in the docs is possible for PG10.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-24 Thread Masahiko Sawada
On Tue, Aug 22, 2017 at 5:15 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
>>> Why not allow -I as a short option for --custom-initialize?
>>
>>
>> Other options for similar purpose such as --foreign-keys also have
>> only a long option. Since I think --custom-initialize option is in the
>> same context as other options I didn't add short option to it for now.
>> Because the options name is found by a prefix searching we can use a
>> short name --cu for now.
>
>
> Hmmm. I like short options:-)

Okay, I added -I option for custom initialization :)

>
>> I'm inclined to remove the restriction so that we can specify
>> --foreign-keys, --no-vacuum and --custom-initialize at the same time.
>
>
> Ok. I favor that as well.

If the foreign key creation command is not specified in -I but the
--foreign-keys option is specified (e.g. pgbench -i -I tpd
--foreign-keys), I think we can add an 'f' to the end of the
initialization commands.

>
>> I think a list of char would be better here rather than a single
>> malloced string to remove particular initialization step easily.
>> Thought?
>
>
> My thought is not to bother with a list of char.
>
> To remove a char from a string, I suggest to allow ' ' to stand for nothing
> and be skipped, so that substituting a letter by space would simply remove
> an initialization phase.

I think we should not add or remove initialization commands while
parsing the pgbench options, so that the result does not depend on the
option order. Therefore, if -I, --foreign-keys and --no-vacuum are
specified at the same time, what we do is remove the 'v' commands if
no-vacuum is requested and append an 'f' command if foreign keys are
requested. Also, we expect that the list of initialization steps will
not be long, so using a malloced string is less work. Ok, I changed the
patch accordingly.
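As a rough sketch of that post-parsing adjustment (variable and helper
names are assumptions based on this discussion, not necessarily the
patch's exact code):

	if (is_no_vacuum)
	{
		/* strip every 'v' step from the command string */
		char	   *src = initialize_cmds;
		char	   *dst = initialize_cmds;

		while (*src)
		{
			if (*src != 'v')
				*dst++ = *src;
			src++;
		}
		*dst = '\0';
	}

	if (foreign_keys && strchr(initialize_cmds, 'f') == NULL)
	{
		/* append an 'f' step at the end */
		size_t		len = strlen(initialize_cmds);

		initialize_cmds = (char *) pg_realloc(initialize_cmds, len + 2);
		initialize_cmds[len] = 'f';
		initialize_cmds[len + 1] = '\0';
	}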

Attached latest v4 patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v4.patch
Description: Binary data



[HACKERS] Combination with priority-based and quorum-based synchronous replications

2017-08-23 Thread Masahiko Sawada
Hi all,

PostgreSQL 9.6 introduced priority-based multiple synchronous
replication, and PostgreSQL 10 introduced the quorum-based variant.
Initially I was thinking of using both synchronous replication methods
in combination, but that is not supported yet. It would be useful for
some use cases. For example, suppose we have three replicas: nodeA,
nodeB and nodeC. Two of them (nodeA and nodeC) are read replicas and
the other one (nodeB) is for backup. In this case we would want to
replicate the data synchronously to nodeB while replicating data to
nodeA and nodeC using quorum-based synchronous replication. To cover
such a use case we need a feature allowing us to use both in
combination. IIUC other use cases were also mentioned in earlier
discussions.

There are two ideas for implementing this.
1. Use the two synchronous replication methods in combination. For the
above example, we can set s_s_names = 'First 2 (nodeB, Any 1 (nodeA,
nodeC))'. This approach allows us to specify a nested set of nodes in
s_s_names. We could consider supporting deeper nesting, but one level
of nesting would be enough for most cases. Also, it would be useful if
one synchronization method could contain multiple other
synchronization methods. For example, I can imagine a use case where
each of two backup data centers has two replicas and we want to send
the data synchronously to one replica in each data center. We could
set s_s_names = 'First 2 (Any 1 (nodeA, nodeB), Any 1 (nodeC,
nodeD))'. It might be over-engineering, but it would be worth
considering.

2. Extend quorum-based synchronous replication so that we can specify
the nodes that the data must be replicated to synchronously. For the
above example, if we can set s_s_names = 'Any 2 (nodeA, *nodeB,
nodeC)', then the master server waits for nodeB and for either nodeA
or nodeC. That is, a special mark '*' means that the marked node is
preferentially selected as a synchronous server. This approach is
simpler than approach #1, but we cannot nest more than one level and a
synchronization method cannot contain another method.

Feedback and comments are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



