date:20170418

On Wed, Apr 19, 2017 at 1:52 PM, Masahiko Sawada  wrote:
> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch  wrote:
>> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>>> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch  wrote:
>>> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>>> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>>> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> >>
>>> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> >> > > >> (3)
>>> >> > > >> The priority value is assigned to each standby listed in s_s_names
>>> >> > > >> even in quorum commit though those priority values are not used 
>>> >> > > >> at all.
>>> >> > > >> Users can see those priority values in pg_stat_replication.
>>> >> > > >> Isn't this confusing? If yes, it might be better to always assign 
>>> >> > > >> 1 as
>>> >> > > >> the priority, for example.
>>
>>> >> This PostgreSQL 10 open item is past due for your status update.  Kindly 
>>> >> send
>>> >> a status update within 24 hours, and include a date for your subsequent 
>>> >> status
>>> >> update.  Refer to the policy on open item ownership:
>>> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>>> >> > Since you do want (3) to change, please own it like any other open 
>>> >> > item,
>>> >> > including the mandatory status updates.
>>> >>
>>> >> Likewise.
>>>
>>> As I told firstly this is not a bug. There are some proposals for better 
>>> design
>>> of priority column in pg_stat_replication, but we've not reached the 
>>> consensus
>>> yet. So I think that it's better to move this open item to "Design 
>>> Decisions to
>>> Recheck Mid-Beta" section so that we can hear more opinions.
>>
>> I'm reading that some people want to report NULL priority, some people want 
>> to
>> report a constant 1 priority, and nobody wants the current behavior.  Is that
>> an accurate summary?
>
> Yes, I think that's correct.

Just adding that I am the only one advocating for switching the
priority number to NULL for async standbys, and that this proposal is
visibly outvoted as it breaks backward-compatibility with the
0-priority setting.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assertion failure in REL9_5_STABLE

2017-04-18 Thread Andrew Gierth

> "Pavan" == Pavan Deolasee  writes:

 Pavan> I am attaching a patch that throws a similar ERROR during
 Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
 Pavan> always decide to use sort-based implementation for grouping, but
 Pavan> do not check if the columns support ordering or not.

Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
deal with it shortly.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
 wrote:
> I'd imagine the postmaster would tell the walsender that it has started
> shutdown, and then the walsender would reject $certain_things.  But I
> don't see an existing way for the walsender to know that shutdown has
> been initiated.  SIGINT is still free ...

The WAL sender receives SIGUSR2 from the postmaster when shutdown is
initiated, so why not just rely on that and issue an ERROR when a
client attempts to create or drop a new slot, setting up
walsender_ready_to_stop unconditionally? It seems to me that the issue
here is the delay between the moment SIGTERM is acknowledged by the
WAL sender and the moment CREATE_SLOT is treater. An idea with the
attached...

> The alternative of shutting down logical walsenders earlier also doesn't
> look straightforward, since the postmaster doesn't know directly what
> kind of walsender a certain process is.  So you'd also need additional
> signal types or something there.

Yup, but is a switchover between a publisher and a subscriber
something that can happen?
-- 
Michael

walsender-shutdown-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Assertion failure in REL9_5_STABLE

2017-04-18 Thread Pavan Deolasee

Hi All,

I stumbled upon this test case from the master that sets an assertion
failure (see stack trace at the end) on REL9_5_STABLE.

create temp table gstest4(id integer, v integer,
  unhashable_col bit(4), unsortable_col xid);
insert into gstest4
values (1,1,b'','1'), (2,2,b'0001','1'),
   (3,4,b'0010','2'), (4,8,b'0011','2'),
   (5,16,b'','2'), (6,32,b'0001','2'),
   (7,64,b'0010','1'), (8,128,b'0011','1');

-- mixed hashable/sortable cases
select unhashable_col, unsortable_col,
   grouping(unhashable_col, unsortable_col),
   count(*), sum(v)
  from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
 order by 3, 5;

I tested this with REL9_6_STABLE and it throws a more useful error message,
presumably because the code was refactored quite heavily for upper planner
changes.

ERROR:  could not implement GROUP BY
DETAIL:  Some of the datatypes only support hashing, while others only
support sorting.

I am attaching a patch that throws a similar ERROR during planning even for
9.5. AFAICS in presence of grouping sets, we always decide to use
sort-based implementation for grouping, but do not check if the columns
support ordering or not.

I did not test it for other older branches because grouping sets were
introduced in 9.5.

Thanks,
Pavan


(lldb) bt
* thread #1: tid = 0x, 0x7fff9c1dfdda
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x7fff9c1dfdda libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff9c2cb787 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x7fff9c145420 libsystem_c.dylib`abort + 129
frame #3: 0x00010f61a3f0
postgres`ExceptionalCondition(conditionName="!(sortOperators[i] != 0)",
errorType="BadArgument", fileName="tuplesort.c", lineNumber=657) + 128 at
assert.c:54
frame #4: 0x00010f65a07d
postgres`tuplesort_begin_heap(tupDesc=0x7fb1528194d8, nkeys=1,
attNums=0x7fb15280e9e0, sortOperators=0x7fb15280ea00,
sortCollations=0x7fb15280ea20, nullsFirstFlags="", workMem=4096,
randomAccess='\0') + 509 at tuplesort.c:657
frame #5: 0x00010f2e871d
postgres`initialize_phase(aggstate=0x7fb152817828, newphase=0) + 477 at
nodeAgg.c:456
frame #6: 0x00010f2e73f0
postgres`ExecInitAgg(node=0x7fb1528062e8, estate=0x7fb152817440,
eflags=16) + 2656 at nodeAgg.c:2223
frame #7: 0x00010f2d18e1
postgres`ExecInitNode(node=0x7fb1528062e8, estate=0x7fb152817440,
eflags=16) + 865 at execProcnode.c:296
frame #8: 0x00010f301231
postgres`ExecInitSort(node=0x7fb152806400, estate=0x7fb152817440,
eflags=16) + 225 at nodeSort.c:202
frame #9: 0x00010f2d18a9
postgres`ExecInitNode(node=0x7fb152806400, estate=0x7fb152817440,
eflags=16) + 809 at execProcnode.c:286
frame #10: 0x00010f2ccf20
postgres`InitPlan(queryDesc=0x7fb152803c40, eflags=16) + 1520 at
execMain.c:957
frame #11: 0x00010f2cc81f
postgres`standard_ExecutorStart(queryDesc=0x7fb152803c40, eflags=16) +
591 at execMain.c:237
frame #12: 0x00010f2cc5be
postgres`ExecutorStart(queryDesc=0x7fb152803c40, eflags=0) + 62 at
execMain.c:139
frame #13: 0x00010f48b112
postgres`PortalStart(portal=0x7fb151836c40, params=0x,
eflags=0, snapshot=0x) + 722 at pquery.c:533
frame #14: 0x00010f4871c1
postgres`exec_simple_query(query_string="select unhashable_col,
unsortable_col,\n   grouping(unhashable_col, unsortable_col),\n
count(*), sum(v)\n  from gstest4 group by grouping sets
((unhashable_col),(unsortable_col))\n order by 3, 5;") + 945 at
postgres.c:1065
frame #15: 0x00010f486525 postgres`PostgresMain(argc=1,
argv=0x7fb151801f00, dbname="postgres", username="pavan") + 2245 at
postgres.c:4051
frame #16: 0x00010f3ef392
postgres`BackendRun(port=0x7fb151700540) + 674 at postmaster.c:4254
frame #17: 0x00010f3ee64d
postgres`BackendStartup(port=0x7fb151700540) + 365 at postmaster.c:3928
frame #18: 0x00010f3ed705 postgres`ServerLoop + 597 at
postmaster.c:1698
frame #19: 0x00010f3eb066 postgres`PostmasterMain(argc=3,
argv=0x7fb151403740) + 5414 at postmaster.c:1306
frame #20: 0x00010f32bddf postgres`main(argc=3,
argv=0x7fb151403740) + 751 at main.c:228
frame #21: 0x7fff9c0b1255 libdyld.dylib`start + 1
frame #22: 0x7fff9c0b1255 libdyld.dylib`start + 1


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg95_assertion_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch  wrote:
> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch  wrote:
>> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
>> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
>> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> >>
>> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> >> > > >> (3)
>> >> > > >> The priority value is assigned to each standby listed in s_s_names
>> >> > > >> even in quorum commit though those priority values are not used at 
>> >> > > >> all.
>> >> > > >> Users can see those priority values in pg_stat_replication.
>> >> > > >> Isn't this confusing? If yes, it might be better to always assign 
>> >> > > >> 1 as
>> >> > > >> the priority, for example.
>
>> >> This PostgreSQL 10 open item is past due for your status update.  Kindly 
>> >> send
>> >> a status update within 24 hours, and include a date for your subsequent 
>> >> status
>> >> update.  Refer to the policy on open item ownership:
>> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
>> >> > Since you do want (3) to change, please own it like any other open item,
>> >> > including the mandatory status updates.
>> >>
>> >> Likewise.
>>
>> As I told firstly this is not a bug. There are some proposals for better 
>> design
>> of priority column in pg_stat_replication, but we've not reached the 
>> consensus
>> yet. So I think that it's better to move this open item to "Design Decisions 
>> to
>> Recheck Mid-Beta" section so that we can hear more opinions.
>
> I'm reading that some people want to report NULL priority, some people want to
> report a constant 1 priority, and nobody wants the current behavior.  Is that
> an accurate summary?

Yes, I think that's correct.

FWIW the reason of current behavior is that it would be useful for the
user who is willing to switch from ANY to FIRST. They can know which
standbys will become sync or potential.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

On Wed, Apr 19, 2017 at 12:36 PM, David Rowley
 wrote:
> In favour of "location" -> "lsn": Tom, Stephen, David Steel
> In favour of "lsn" -> "location": Peter, Kyotaro

I vote for "location" -> "lsn". I would expect complains about the
current inconsistency at some point, and the function names have been
already changed for this release..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Fixup some misusage of appendStringInfo and friends

2017-04-18 Thread David Rowley

The attached cleans up a few small misusages of appendStringInfo and
related functions.

Some similar work was done in,

f92d6a540ac443f85f0929b284edff67da14687a
d02f16470f117db3038dbfd87662d5f0eb5a2a9b
cacbdd78106526d7c4f11f90b538f96ba8696fb0

so the majority of these should all be new misuseages.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


appendStringInfo_fixes.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Inadequate parallel-safety check for SubPlans

Amit Kapila  writes:
> On Wed, Apr 19, 2017 at 1:27 AM, Tom Lane  wrote:
>> I think leaving that sort of thing out is just creating a latent bug
>> that is certain to bite you later.  It's true that as long as the args
>> list contains only Vars, it would never be parallel-unsafe --- but
>> primnodes.h is pretty clear that one shouldn't assume that it will
>> stay that way.

> Sure, but the point I was trying to make was whenever subplan has
> args, I think it won't be parallel-safe as those args are used to pass
> params.

Right now, yes, but surely we're going to be trying to relax that sometime
soon.  And at that point it would be a latent bug for this function to
not recurse into the args list.  Whatever the restrictions are on the
tree as a whole, they'll apply to that subtree too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-18 Thread David Rowley

On 19 April 2017 at 15:31, Tom Lane  wrote:
> David Rowley  writes:
>> OK, so I've read over this thread again and I think it's time to
>> summarise the votes:
>> ...
>> In favour of "location" -> "lsn": Stephen, David Steel,
>> In favour of "lsn" -> "location": Peter, Tom, Kyotaro
>
> FWIW, I was not voting in favor of "location"; I was just saying that
> I wanted consistency.  If we're voting which way to move, please count
> me as a vote for "lsn".

Updated votes:

In favour of "location" -> "lsn": Tom, Stephen, David Steel
In favour of "lsn" -> "location": Peter, Kyotaro

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-18 Thread Noah Misch

On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> On Sun, Apr 16, 2017 at 1:19 PM, Noah Misch  wrote:
> > On Fri, Apr 14, 2017 at 11:58:23PM -0400, Noah Misch wrote:
> >> On Wed, Apr 05, 2017 at 09:51:02PM -0400, Noah Misch wrote:
> >> > On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> >>
> >> > > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> >> > > >> (3)
> >> > > >> The priority value is assigned to each standby listed in s_s_names
> >> > > >> even in quorum commit though those priority values are not used at 
> >> > > >> all.
> >> > > >> Users can see those priority values in pg_stat_replication.
> >> > > >> Isn't this confusing? If yes, it might be better to always assign 1 
> >> > > >> as
> >> > > >> the priority, for example.

> >> This PostgreSQL 10 open item is past due for your status update.  Kindly 
> >> send
> >> a status update within 24 hours, and include a date for your subsequent 
> >> status
> >> update.  Refer to the policy on open item ownership:
> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

> >> > Since you do want (3) to change, please own it like any other open item,
> >> > including the mandatory status updates.
> >>
> >> Likewise.
> 
> As I told firstly this is not a bug. There are some proposals for better 
> design
> of priority column in pg_stat_replication, but we've not reached the consensus
> yet. So I think that it's better to move this open item to "Design Decisions 
> to
> Recheck Mid-Beta" section so that we can hear more opinions.

I'm reading that some people want to report NULL priority, some people want to
report a constant 1 priority, and nobody wants the current behavior.  Is that
an accurate summary?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

David Rowley  writes:
> OK, so I've read over this thread again and I think it's time to
> summarise the votes:
> ...
> In favour of "location" -> "lsn": Stephen, David Steel,
> In favour of "lsn" -> "location": Peter, Tom, Kyotaro

FWIW, I was not voting in favor of "location"; I was just saying that
I wanted consistency.  If we're voting which way to move, please count
me as a vote for "lsn".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Inadequate parallel-safety check for SubPlans

2017-04-18 Thread Amit Kapila

On Wed, Apr 19, 2017 at 1:27 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> I have ended up doing something along the lines suggested by you (or
>> at least what I have understood from your e-mail).  Basically, pass
>> the safe-param-ids list to parallel safety function and decide based
>> on that if Param node reference in input expression is safe.
>
> I did not like changing the signature of is_parallel_safe() for this:
> that's getting a lot of call sites involved in something they don't care
> about, and I'm not even sure that it's formally correct.  The key thing
> here is that a Param could only be considered parallel-safe in context.
> That is, if you looked at the testexpr alone and asked if it could be
> pushed down to a worker, the answer would have to be "no".  Only when
> you look at the whole SubPlan tree and ask if it can be pushed down
> should the answer be "yes".  Therefore, I feel pretty strongly that
> what we want is for the safe_param_ids whitelist to be mutated
> internally in is_parallel_safe() as it descends the tree.  That way
> it can give the right answer when considering a fragment of a plan.
> I've committed a patch that does it like that.
>

Thanks.

>> ... Also, I think we don't need to check
>> parallel-safety for splan->args as that also is related to correlated
>> queries for which parallelism is not supported as of now.
>
> I think leaving that sort of thing out is just creating a latent bug
> that is certain to bite you later.  It's true that as long as the args
> list contains only Vars, it would never be parallel-unsafe --- but
> primnodes.h is pretty clear that one shouldn't assume that it will
> stay that way.

Sure, but the point I was trying to make was whenever subplan has
args, I think it won't be parallel-safe as those args are used to pass
params.

>  So it's better to recurse over the whole tree rather
> than ignoring parts of it, especially if you're not going to document
> the assumption you're making anywhere.
>

No problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-18 Thread David Rowley

On 19 April 2017 at 03:33, David Steele  wrote:
> +1, and I'm in favor of using "lsn" wherever applicable.  It seems to me
> that if a user calls a function (or queries a table) that returns an lsn
> then they should be aware of what an lsn is.

OK, so I've read over this thread again and I think it's time to
summarise the votes:

It seem that;

Robert mentions he'd be inclined not to tinker with this, but mentions
nothing of inconsistency.
Bruce mentions he'd like consistency but does not mention which he'd prefer.
Stephen wants consistency and votes to change "location" to "lsn" in
regards to a pg_lsn.
Peter would rather see use of "location", mentions about changing the
new in v10 stuff, but not about the existing 9.6 usages of lsn in
column names
Tom would not object to use of "location" over "lsn".
Kyotaro would rather see the use of "location" for all apart from the
pg_lsn operator functions. Unsure how we handle pg_wal_location_diff()
David Steel would rather see this changed to use "lsn" instead of location.

So in summary, nobody apart from Robert appeared to have any concerns
over breaking backward compatibility.

In favour of "location" -> "lsn": Stephen, David Steel,
In favour of "lsn" -> "location": Peter, Tom, Kyotaro

I left Bruce out since he just voted for consistency.

So "lsn" -> "location" seems to be winning

Is that enough to proceed?

Anyone else?

The patch to do this should likely also include a regression test to
ensure nothing new creeps in which breaks the new standard.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failed recovery with new faster 2PC code

On Wed, Apr 19, 2017 at 11:09 AM, Jeff Janes  wrote:
> Would this bug have been seen in a replica server in the absence of crashes,
> or was it only vulnerable during crash recovery rather than streaming
> replication?

This issue could have been seen on a streaming standby as well,
letting around a TwoPhaseState impacts as well their linked PGPROC so
CLOG truncation would have been messed up as well. That's also the
case of the first issue involving as well incorrect XID updates,
though the chances to see it are I think lower as only the beginning
of recovery was impacted.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Use sync commit for logical replication apply in TAP tests

Hi,

The commit 887227a1c changed the defaults for subscriptions to do async
commit. But since the tests often wait for disk flush and there is no
concurrent activity this has increased the amount of time needed for
each test. So the attached patch changes the subscriptions create in tab
tests to use sync commit which improves performance there (because we
also replicate only very few transactions in the tests).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From a4cebdf95d1f8a58d12f2f3824dffaae1dbf435d Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 19 Apr 2017 03:55:17 +0200
Subject: [PATCH] Change logical replication TAP tests to use sync commit

This speeds up the test run since we are often waiting on flush to disk.
---
 src/test/subscription/t/001_rep_changes.pl | 2 +-
 src/test/subscription/t/002_types.pl   | 2 +-
 src/test/subscription/t/003_constraints.pl | 2 +-
 src/test/subscription/t/004_sync.pl| 8 
 src/test/subscription/t/005_encoding.pl| 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/test/subscription/t/001_rep_changes.pl 
b/src/test/subscription/t/001_rep_changes.pl
index d1817f5..8f791f1 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -48,7 +48,7 @@ $node_publisher->safe_psql('postgres',
 
 my $appname = 'tap_sub';
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub, tap_pub_ins_only");
+   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub, tap_pub_ins_only WITH 
(synchronous_commit = local)");
 
 # Wait for subscriber to finish initialization
 my $caughtup_query =
diff --git a/src/test/subscription/t/002_types.pl 
b/src/test/subscription/t/002_types.pl
index ad15e85..2f89fdc 100644
--- a/src/test/subscription/t/002_types.pl
+++ b/src/test/subscription/t/002_types.pl
@@ -103,7 +103,7 @@ $node_publisher->safe_psql('postgres',
 
 my $appname = 'tap_sub';
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (SLOT NAME = 
tap_sub_slot)");
+   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (SLOT NAME = tap_sub_slot, 
synchronous_commit = local)");
 
 # Wait for subscriber to finish initialization
 my $caughtup_query =
diff --git a/src/test/subscription/t/003_constraints.pl 
b/src/test/subscription/t/003_constraints.pl
index 11b8254..2b43263 100644
--- a/src/test/subscription/t/003_constraints.pl
+++ b/src/test/subscription/t/003_constraints.pl
@@ -34,7 +34,7 @@ $node_publisher->safe_psql('postgres',
 
 my $appname = 'tap_sub';
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (NOCOPY DATA)");
+   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (NOCOPY DATA, 
synchronous_commit = local)");
 
 # Wait for subscriber to finish initialization
 my $caughtup_query =
diff --git a/src/test/subscription/t/004_sync.pl 
b/src/test/subscription/t/004_sync.pl
index 87541a0..57c9e4c 100644
--- a/src/test/subscription/t/004_sync.pl
+++ b/src/test/subscription/t/004_sync.pl
@@ -32,7 +32,7 @@ $node_publisher->safe_psql('postgres',
 
 my $appname = 'tap_sub';
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub");
+   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (synchronous_commit = 
local)");
 
 # Wait for subscriber to finish initialization
 my $caughtup_query =
@@ -58,7 +58,7 @@ $node_publisher->safe_psql('postgres',
 
 # recreate the subscription, it will try to do initial copy
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub");
+   "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (synchronous_commit = 
local)");
 
 # but it will be stuck on data copy as it will fail on constraint
 my $started_query =
@@ -81,7 +81,7 @@ is($result, qq(20), 'initial data synced for second sub');
 
 # now check another subscription for the same node pair
 $node_subscriber->safe_psql('postgres',
-   "CREATE SUBSCRIPTION tap_sub2 CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH (NOCOPY DATA)");
+   "CREATE SUBSCRIPTION tap_sub2 CONNECTION '$publisher_connstr 
application_name=$appname' PUBLICATION tap_pub WITH

Re: [HACKERS] some review comments on logical rep code

On 18/04/17 19:27, Fujii Masao wrote:
> On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
>  wrote:
>> Thank you for working on this!
>>
>> On 18/04/17 10:16, Masahiko Sawada wrote:
>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>
 10.
>
> SpinLockAcquire(>relmutex);
> MyLogicalRepWorker->relstate =
>   GetSubscriptionRelState(MyLogicalRepWorker->subid,
> MyLogicalRepWorker->relid,
> >relstate_lsn,
> false);
> SpinLockRelease(>relmutex);
>
> Non-"short-term" function like GetSubscriptionRelState() should not
> be called while spinlock is being held.
>

 One option is adding new LWLock for the relation state change but this
 lock is used at many locations where the operation actually doesn't
 take time. I think that the discussion would be needed to get
 consensus, so patch for it is not attached.
>>>
>>> From the point of view of time, I agree that it doesn't seem to
>>> harm. Bt I thing it as more significant problem that
>>> potentially-throwable function is called within the mutex
>>> region. It potentially causes a kind of dead lock.  (That said,
>>> theoretically dosn't occur and I'm not sure what happens by the
>>> dead lock..)
>>>
>>
>> Hmm, I think doing what I attached should be fine here.
> 
> Thanks for the patch!
> 
>> We don't need to
>> hold spinlock for table read, just for changing the
>> MyLogicalRepWorker->relstate.
> 
> Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
> be protected with the spinlock.
> 

Yes, sorry tired/blind, fixed.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 108326b..7262709 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -703,19 +703,22 @@ copy_table(Relation rel)
 char *
 LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 {
-   char   *slotname;
-   char   *err;
+   char   *slotname;
+   char   *err;
+   charrelstate;
+   XLogRecPtr  relstate_lsn;
 
/* Check the state of the table synchronization. */
StartTransactionCommand();
+   relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
+  
MyLogicalRepWorker->relid,
+  
_lsn, false);
+   CommitTransactionCommand();
+
SpinLockAcquire(>relmutex);
-   MyLogicalRepWorker->relstate =
-   GetSubscriptionRelState(MyLogicalRepWorker->subid,
-   
MyLogicalRepWorker->relid,
-   
>relstate_lsn,
-   false);
+   MyLogicalRepWorker->relstate = relstate;
+   MyLogicalRepWorker->relstate_lsn = relstate_lsn;
SpinLockRelease(>relmutex);
-   CommitTransactionCommand();
 
/*
 * To build a slot name for the sync work, we are limited to 
NAMEDATALEN -

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

On 17/04/17 20:09, Peter Eisentraut wrote:
> On 4/16/17 22:01, Noah Misch wrote:
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent 
>> status
>> update.  Refer to the policy on open item ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I think we're not really sure yet what to do about this.  Discussion is
> ongoing.  I'll report back on Wednesday.
> 

So my idea was to add some kind of inuse flag. This turned out to be bit
more complicated in terms of how to clean it than I would have hoped.
This is due to the fact that there is no way to reliably tell if worker
has failed to start if the parent worker crashed while waiting.

My solution to that is to use similar logic to autovacuum where we use
timeout for worker to attach to shmem. We do this only if there is no
free slot found when launch of replication worker was requested.

While working on this patch I also noticed other subtle concurrency
issues and fixed them as well - stopping worker that hasn't finished
starting yet wasn't completely concurrency safe and limiting sync
workers per subscription theoretically wasn't either (although I don't
think it could happen in practice).

I do wonder now though if it's such a good idea to have the
BackgroundWorkerHandle private to the bgworker.c given that this is the
3rd time (twice before was outside of postgres core) I had to write
similar generation mechanism that it uses for unique bgworker
authentication inside the process which started it. It would have been
much easier if I could just save the BackgroundWorkerHandle itself to
shmem so it could be used across processes instead of having to reinvent
it every time.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 6d32379cda7d9f69c42185b9684d32570554f4e3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 19 Apr 2017 03:48:53 +0200
Subject: [PATCH] Fix various concurrency issues in logical replication worker
 launching

The code was originally written with assumption that launcher is the
only proccess starting the worker. However that hasn't been true since
commit 7c4f52409 which failed to modify the worker management code
adequately.

This patch adds in_use field to LogicalRepWorker struct to indicate if
the worker slot is being used and uses proper locking everywhere this
flag is set or read.

However if the parent process which dies while the new worker is
starting and the new worker failes to attach to shared memory, this flag
would never get cleared. We solve this rare corner case by adding sort
of garbage collector for in_use slots. This uses another filed in the
LogicalRepWorker struct named launch_time which contains timestamp of
when the worker has been started. If any request to start a new worker
does not find free slot, we'll check for workers that were supposed to
start but took long to actually do so and reuse their slot.

In passing also fix possible race conditions when stopping worker that
hasn't finished starting yet.
---
 src/backend/replication/logical/launcher.c | 167 +++--
 src/include/replication/worker_internal.h  |   9 ++
 2 files changed, 141 insertions(+), 35 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c 
b/src/backend/replication/logical/launcher.c
index f6ae610..41f8cfe 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -38,6 +38,7 @@
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
 #include "replication/slot.h"
+#include "replication/walreceiver.h"
 #include "replication/worker_internal.h"

 #include "storage/ipc.h"
@@ -76,6 +77,7 @@ static void ApplyLauncherWakeup(void);
 static void logicalrep_launcher_onexit(int code, Datum arg);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
+static void logicalrep_worker_cleanup(LogicalRepWorker *worker);

 /* Flags set by signal handlers */
 volatile sig_atomic_t got_SIGHUP = false;
@@ -154,15 +156,19 @@ get_subscription_list(void)
 /*
  * Wait for a background worker to start up and attach to the shmem context.
  *
- * This is like WaitForBackgroundWorkerStartup(), except that we wait for
- * attaching, not just start and we also just exit if postmaster died.
+ * This is only needed for cleaning up the shared memory in case the worker
+ * fails to attach.
  */
-static bool
+static void
 WaitForReplicationWorkerAttach(LogicalRepWorker *worker,

BackgroundWorkerHandle *handle)
 {
BgwHandleStatus status;
int rc;
+   uint16  generation;
+
+   /* Remember generation for future identification. */
+   generation =

Re: [HACKERS] Failed recovery with new faster 2PC code

2017-04-18 Thread Jeff Janes

On Tue, Apr 18, 2017 at 1:17 AM, Nikhil Sontakke 
wrote:

> Hi,
>
> There was a bug in the redo 2PC remove code path. Because of which,
> autovac would think that the 2PC is gone and cause removal of the
> corresponding clog entry earlier than needed.
>
> Please find attached, the bug fix: 2pc_redo_remove_bug.patch.
>
> I have been testing this on top of Michael's 2pc-restore-fix.patch and
> things seem to be ok for the past one+ hour. Will keep it running for long.
>
> Jeff, thanks for these very useful scripts. I am going to make a habit to
> run these scripts on my side from now on. Do you have any other script that
> I could try against these patches? Please let me know.
>

This script is the only one I have that specifically targets 2PC.  I wrote
it last year when the previous round of speed-up code (which avoided
writing the files upon "PREPARE" by delaying them until the next
checkpoint) was developed.  I just decided to dust that test off to try
again here.  I don't know how to change it to make it more targeted towards
this set of patches.  Would this bug have been seen in a replica server in
the absence of crashes, or was it only vulnerable during crash recovery
rather than streaming replication?

Cheers,

Jeff

Re: [HACKERS] Failed recovery with new faster 2PC code

2017-04-18 Thread Jeff Janes

On Tue, Apr 18, 2017 at 5:38 AM, Simon Riggs  wrote:

> On 18 April 2017 at 13:12, Michael Paquier 
> wrote:
> > On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs 
> wrote:
> >> Yeh, this is better. Pushed.
> >
> > I have been outraced on this one, the error is obvious once you see it ;)
>
> Didn't realise you were working on it, nothing competitive about it.
>
> It's clear this needed fixing, whether or not it fixes Jeff's report.
>
> I do think it explains the report, so I'm hopeful.
>


The git HEAD code (c727f12) has been surviving so far, with both test cases.

Thanks,

Jeff

[HACKERS] [PATCH] Push limit to sort through a subquery

2017-04-18 Thread Douglas Doole

We've hit a case where pass_down_bound() isn't pushing the row count limit
from limit into sort. The issue is that we're getting a subquery scan node
between the limit and the sort. The subquery is only doing column
projection and has no quals or SRFs so it should be safe to push the limit
into the sort.

The query that hit the problem can be simplified to:

   SELECT * FROM (SELECT A,B FROM T ORDER BY C) LIMIT 5

(Yeah, the query's a little screwy in that the ORDER BY should really be
outside the subselect, but it came from a query generator, so that's a
different conversation.)

Proposed patch is attached.

- Doug
Salesforce
*** a/src/backend/executor/nodeLimit.c
--- b/src/backend/executor/nodeLimit.c
***
*** 304,309  recompute_limits(LimitState *node)
--- 304,311 
   * since the MergeAppend surely need read no more than that many tuples from
   * any one input.  We also have to be prepared to look through a Result,
   * since the planner might stick one atop MergeAppend for projection purposes.
+  * We can also accept a subquery that has no quals or SRFs (that is, the
+  * subquery is just projecting columns) between the LIMIT and any of the above.
   *
   * This is a bit of a kluge, but we don't have any more-abstract way of
   * communicating between the two nodes; and it doesn't seem worth trying
***
*** 316,321  recompute_limits(LimitState *node)
--- 318,343 
  static void
  pass_down_bound(LimitState *node, PlanState *child_node)
  {
+ 	/*
+ 	 * If the child is a subquery that does no filtering (no predicates)
+ 	 * and does not have any SRFs in the target list then we can potentially
+ 	 * push the limit through the subquery.
+ 	 */
+ 	if (IsA(child_node, SubqueryScanState))
+ 	{
+ 		SubqueryScanState *subqueryScanState = (SubqueryScanState *) child_node;
+ 
+ 		/*
+ 		 * Non-empty predicates or an SRF means we cannot push down the limit.
+ 		 */
+ 		if (subqueryScanState->ss.ps.qual != NULL ||
+ 			expression_returns_set((Node *) child_node->plan->targetlist))
+ 			return;
+ 
+ 		/* Use the child in the following checks */
+ 		child_node = subqueryScanState->subplan;
+ 	}
+ 
  	if (IsA(child_node, SortState))
  	{
  		SortState  *sortState = (SortState *) child_node;
*** a/src/test/regress/expected/subselect.out
--- b/src/test/regress/expected/subselect.out
***
*** 1041,1043  NOTICE:  x = 9, y = 13
--- 1041,1077 
  (3 rows)
  
  drop function tattle(x int, y int);
+ -
+ --TEST LIMIT pushdown through subquery scan node
+ -
+ create table sq_limit (pk int primary key, c1 int, c2 int);
+ insert into sq_limit values
+ (1, 1, 1),
+ (2, 2, 2),
+ (3, 3, 3),
+ (4, 4, 4),
+ (5, 1, 1),
+ (6, 2, 2),
+ (7, 3, 3),
+ (8, 4, 4);
+ explain (analyze, summary off, timing off, costs off)
+ select * from (select pk,c2 from sq_limit order by c1,pk) as x limit 3;
+QUERY PLAN   
+ 
+  Limit (actual rows=3 loops=1)
+->  Subquery Scan on x (actual rows=3 loops=1)
+  ->  Sort (actual rows=3 loops=1)
+Sort Key: sq_limit.c1, sq_limit.pk
+Sort Method: top-N heapsort  Memory: 25kB
+->  Seq Scan on sq_limit (actual rows=8 loops=1)
+ (6 rows)
+ 
+ select * from (select pk,c2 from sq_limit order by c1,pk) as x limit 3;
+  pk | c2 
+ +
+   1 |  1
+   5 |  1
+   2 |  2
+ (3 rows)
+ 
+ drop table sq_limit;
*** a/src/test/regress/sql/subselect.sql
--- b/src/test/regress/sql/subselect.sql
***
*** 540,542  select * from
--- 540,563 
where tattle(x, u);
  
  drop function tattle(x int, y int);
+ 
+ -
+ --TEST LIMIT pushdown through subquery scan node
+ -
+ create table sq_limit (pk int primary key, c1 int, c2 int);
+ insert into sq_limit values
+ (1, 1, 1),
+ (2, 2, 2),
+ (3, 3, 3),
+ (4, 4, 4),
+ (5, 1, 1),
+ (6, 2, 2),
+ (7, 3, 3),
+ (8, 4, 4);
+ 
+ explain (analyze, summary off, timing off, costs off)
+ select * from (select pk,c2 from sq_limit order by c1,pk) as x limit 3;
+ 
+ select * from (select pk,c2 from sq_limit order by c1,pk) as x limit 3;
+ 
+ drop table sq_limit;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

On Wed, Apr 19, 2017 at 12:48 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> FWIW, I'm a bit suspicious of relocating the temp stats directory as
>>> being a reliable fix for this.
>
>> It's an SD card (the kind typically used in cameras and phones), not SSD.
>> Saying it's slow is an understatement.  It's *excruciatingly* slow.
>
> Oh, I misread it ... but still, the modern definition of "excruciatingly
> slow" doesn't seem all that far off what 90s-era hard drives could do.
> It is clear from googling though that there's an enormous performance
> range in SD cards' random write performance, eg wikipedia's entry has
> a link to
>
> http://goughlui.com/2014/01/16/testing-sd-card-performance-round-up/
>
> Seems like it's hard to judge this without knowing exactly which
> SD card Michael has got in that thing.

Professional micro SD HC 8GB, with class 10, which is utterly slow in
the things I have tested up to now and of course I cannot find its
read/write properties. I had a SanDisk class 10 in it that died some
months ago, and that was way faster. RIP.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-18 Thread Fabien COELHO



Hello,


There are different flavour of markdown, maybe you should document which
one is targetted. Should it be CommonMark? Another variant? Why?


This should be pandoc pipe table. It's because it is similar to aligned
format. I need add this to documentation (i have it in recent TODO)


I still do not understand "why" this variant vs CommonMark or whatever 
other version.


ISTM that the md format lacks escaping for special md characters [...] 
I'd say that you need to do escaping more or less similar to html?


There is problem with markown and newlines. Replacing newline by br was
only solution that I was able to find.


Fine. That does not answer the question about escaping other special md 
characters.



Also, it seems that you use distinct vertical bar characters in the
format? Or is this a trick of my terminal?? It seems that your patch
introduces U+2502 (BOX DRAWINGS LIGHT VERTICAL) instead of the usual pipe
in some places. Maybe you copy-pasted things from the unicode linestyle.


Main of the functionality is used from aligned format. I tested returned
tables in retext and it works. If i have another character than standart
pipe, it shouldn`t work.


Sure. ISTM that you are currently using U+2502 instead of pipe, hence my 
point.



Why are *_newline variants added for length and formatting? Would it be
possible to do without, say by relying on the line count computed by the
standard function for instance?


It`s because newlines in markdown, If I need to do it without copy this
function, i had to add parameter for markdown to this functions.


Then maybe it is an option to consider to avoid duplicating code.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Passing values to a dynamic background worker

2017-04-18 Thread Keith Fiske

On Tue, Apr 18, 2017 at 12:34 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/17/17 16:19, Keith Fiske wrote:
> > I've reached a roadblock in that bgw_main_arg can only accept a single
> > argument that must be passed by value for a dynamic bgw. I already
> > worked around this for passing the database name to the my existing use
> > of a bgw with doing partition maintenance (pass a simple integer to use
> > as an index array value). But I'm not sure how to do this for passing
> > multiple values in.
>
> You can also store this kind of information in a table.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

True, but that seemed like the easy way out. :)  Trying to find ways to
learn internals better through projects I'm actively working on.

Keith

Re: [HACKERS] Inadequate parallel-safety check for SubPlans

Amit Kapila  writes:
> I have ended up doing something along the lines suggested by you (or
> at least what I have understood from your e-mail).  Basically, pass
> the safe-param-ids list to parallel safety function and decide based
> on that if Param node reference in input expression is safe.

I did not like changing the signature of is_parallel_safe() for this:
that's getting a lot of call sites involved in something they don't care
about, and I'm not even sure that it's formally correct.  The key thing
here is that a Param could only be considered parallel-safe in context.
That is, if you looked at the testexpr alone and asked if it could be
pushed down to a worker, the answer would have to be "no".  Only when
you look at the whole SubPlan tree and ask if it can be pushed down
should the answer be "yes".  Therefore, I feel pretty strongly that
what we want is for the safe_param_ids whitelist to be mutated
internally in is_parallel_safe() as it descends the tree.  That way
it can give the right answer when considering a fragment of a plan.
I've committed a patch that does it like that.

> ... Also, I think we don't need to check
> parallel-safety for splan->args as that also is related to correlated
> queries for which parallelism is not supported as of now.

I think leaving that sort of thing out is just creating a latent bug
that is certain to bite you later.  It's true that as long as the args
list contains only Vars, it would never be parallel-unsafe --- but
primnodes.h is pretty clear that one shouldn't assume that it will
stay that way.  So it's better to recurse over the whole tree rather
than ignoring parts of it, especially if you're not going to document
the assumption you're making anywhere.

> I have also explored it to do it by checking parallel-safety of
> testexpr before PARAM_EXEC params are replaced in testexpr, however
> that won't work because we add PARAM_SUBLINK params at the time of
> parse-analysis in transformSubLink().

AFAICS we don't do parallel-safety checks early enough to need to bother
with that, so the existing handling of SubLink and PARAM_SUBLINK params
should be fine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quals not pushed down into lateral

2017-04-18 Thread David Fetter

On Thu, Apr 13, 2017 at 01:39:07PM -0700, Andres Freund wrote:
> On 2017-04-13 16:34:12 -0400, Robert Haas wrote:
> > On Thu, Mar 16, 2017 at 4:45 AM, Andres Freund  wrote:
> > > During citus development we noticed that restrictions aren't pushed down
> > > into lateral subqueries, even if they semantically could.  For example,
> > > in this dumbed down example:
> > >
> > > postgres[31776][1]=# CREATE TABLE t_2(id serial primary key);
> > > postgres[31776][1]=# CREATE TABLE t_1(id serial primary key);
> > >
> > > Comparing:
> > >
> > > postgres[31776][1]=# EXPLAIN SELECT * FROM t_1 JOIN (SELECT * FROM t_2 
> > > GROUP BY id) s ON (t_1.id = s.id) WHERE t_1.id = 3;
> > > postgres[31776][1]=# EXPLAIN SELECT * FROM t_1, LATERAL (SELECT * FROM 
> > > t_2 WHERE t_1.id = t_2.id GROUP BY id) s WHERE t_1.id = 3;
> > 
> > Interesting.  That does seem like we are missing a trick.
> 
> Yea.
> 
> > Not exactly related, but I think we need to improve optimization
> > around CTEs, too.  AFAICT, what we've got right now, almost everybody
> > hates.
> 
> That's certainly an issue, but it's a lot harder to resolve because
> we've, for years, told people to intentionally use CTEs as optimization
> barriers :(

If we can get better performance by removing the barriers, we can
certainly explain the new hack, assuming there is or needs to be
one, in the release notes.  We haven't promised to keep the current
behavior forever, nor should we.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why does logical replication launcher set application_name?

On 18/04/17 19:18, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 18/04/17 18:24, Peter Eisentraut wrote:
>>> I don't see why we need to do that.  It is showing the correct
>>> information, isn't it?
> 
>> It does, but it's also one of the things Tom complained about and I
>> think he is right in that at least values for launcher should be
>> filtered out there as there is not much meaning in what is shown for
>> launcher. The ugly part is that we can't tell it's launcher in any other
>> way than comparing bgw_library_name and bgw_function_name to specific
>> values.
> 
> I think you're thinking about it wrong.  To my mind the issue is that
> there should be some generic way to determine that a bgworker process
> is or is not laboring on behalf of an identifiable user.  It's great
> that we can tell which user it is when there is one, but clearly some
> bgworkers will be providing general services that aren't associated with
> a single user.  So it should be possible to set the userID to zero or
> some such when the bgworker is one that isn't associated with a
> particular user.  Maybe the owning user needs to become an additional
> parameter passed in struct BackgroundWorker.
> 

We can already do that. In fact after I wrote the above I thought we
could add some kind of boolean in the style of am_bootstrap_superuser as
BOOTSTRAP_SUPERUSER is what those bgworkers get assigned. I don't like
the name much though (am_bootstrap_superuser) as this should not be
associated with bootstrap IMHO.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-18 Thread Maksim Milyutin

On 18.04.2017 17:39, Remi Colinet wrote:

Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the
new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from
the core code used for EXPLAIN.

Perhaps you will be forced to duplicate significant snippets of
functionality from explain.c into your progress.c.

Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request
is noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked
in the executor code, the request will be dealt.

Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and report
either that the SQL query does not have a valid status, or otherwise the
current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDescc is NULL, we can just report a output. This is currently the case with the patch suggested.

It's interesting question - how much the active query is in a usable
state on the stage of execution. Tom Lane noticed that
CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full
consistency [1].

So far, I've found this new command very handy. It allows to evaluate
the time needed to complete a SQL query.

Could you explain how you get the percent of execution for nodes of plan
tree and overall for query?

A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk and
time resources can prevent the SQL query to complete.

This functionality is somehow implemented in explain.c. You can see my
patch to this file [2]. I only manipulate runtime statistics (data in
the structure 'Instrumentation') to achieve the printing state of
running query.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2.
https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-18 Thread Fujii Masao

On Tue, Apr 18, 2017 at 7:02 PM, Masahiko Sawada  wrote:
> On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada  
>> wrote in 
>>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao  wrote:
>>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada  
>>> > wrote:
>>> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada  
>>> >> wrote:
>>> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch  wrote:
>>>  On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> > On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch  wrote:
>>> > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> > >> Regarding this feature, there are some loose ends. We should work 
>>> > >> on
>>> > >> and complete them until the release.
>>> > >>
>>> > >> (1)
>>> > >> Which synchronous replication method, priority or quorum, should be
>>> > >> chosen when neither FIRST nor ANY is specified in s_s_names? Right 
>>> > >> now,
>>> > >> a priority-based sync replication is chosen for keeping backward
>>> > >> compatibility. However some hackers argued to change this decision
>>> > >> so that a quorum commit is chosen because they think that most 
>>> > >> users
>>> > >> prefer to a quorum.
>>> > >>
>>> > >> (2)
>>> > >> There will be still many source comments and documentations that
>>> > >> we need to update, for example, in high-availability.sgml. We need 
>>> > >> to
>>> > >> check and update them throughly.
>>> > >>
>>> > >> (3)
>>> > >> The priority value is assigned to each standby listed in s_s_names
>>> > >> even in quorum commit though those priority values are not used at 
>>> > >> all.
>>> > >> Users can see those priority values in pg_stat_replication.
>>> > >> Isn't this confusing? If yes, it might be better to always assign 
>>> > >> 1 as
>>> > >> the priority, for example.
>>> > >
>>> > > [Action required within three days.  This is a generic 
>>> > > notification.]
>>> > >
>>> > > The above-described topic is currently a PostgreSQL 10 open item.  
>>> > > Fujii,
>>> > > since you committed the patch believed to have created it, you own 
>>> > > this open
>>> > > item.  If some other commit is more relevant or if this does not 
>>> > > belong as a
>>> > > v10 open item, please let us know.  Otherwise, please observe the 
>>> > > policy on
>>> > > open item ownership[1] and send a status update within three 
>>> > > calendar days of
>>> > > this message.  Include a date for your subsequent status update.  
>>> > > Testers may
>>> > > discover new open items at any time, and I want to plan to get them 
>>> > > all fixed
>>> > > well in advance of shipping v10.  Consequently, I will appreciate 
>>> > > your efforts
>>> > > toward speedy resolution.  Thanks.
>>> > >
>>> > > [1] 
>>> > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>> >
>>> > Thanks for the notice!
>>> >
>>> > Regarding the item (2), Sawada-san told me that he will work on it 
>>> > after
>>> > this CommitFest finishes. So we would receive the patch for the item 
>>> > from
>>> > him next week. If there will be no patch even after the end of next 
>>> > week
>>> > (i.e., April 14th), I will. Let's wait for Sawada-san's action at 
>>> > first.
>>> 
>>>  Sounds reasonable; I will look for your update on 14Apr or earlier.
>>> 
>>> > The items (1) and (3) are not bugs. So I don't think that they need 
>>> > to be
>>> > resolved before the beta release. After the feature freeze, many users
>>> > will try and play with many new features including quorum-based 
>>> > syncrep.
>>> > Then if many of them complain about (1) and (3), we can change the 
>>> > code
>>> > at that timing. So we need more time that users can try the feature.
>>> 
>>>  I've moved (1) to a new section for things to revisit during beta.  If 
>>>  someone
>>>  feels strongly that the current behavior is Wrong and must change, 
>>>  speak up as
>>>  soon as you reach that conclusion.  Absent such arguments, the 
>>>  behavior won't
>>>  change.
>>> 
>>> > BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>> > as the priority if quorum-based sync rep is chosen. It's less 
>>> > confusing.
>>> 
>>>  Since you do want (3) to change, please own it like any other open 
>>>  item,
>>>  including the mandatory status updates.
>>> >>>
>>> >>> I agree to report NULL as the

[HACKERS] SASL minor docs typo

2017-04-18 Thread Jaime Casanova

Hi,

reading SASL docs found this typo:

in protocol.sgml:1356
"""
To begin a SASL authentication exchange, the server an AuthenticationSASL
  message.
"""

I guess it should say "the server sends an AuthenticationSASL message"

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

Andres Freund  writes:
> On 2017-04-17 19:26:11 -0400, Tom Lane wrote:
>> If we are going to go down this road, I think it would be a good idea
>> to try to provide a cursor position for the "can't accept a set" error
>> message, because otherwise it will be really unclear what's wrong with
>> something like

> Looks good to me.

Thanks for reviewing.

I did some further testing and noted that plperl and pltcl fail to pass
through the error cursor, although plpgsql and plpython seem OK.  That's
a pre-existing issue though --- they don't pass through plain syntax
error cursors either.  I'm fine with leaving that for later.  Otherwise,
it seemed to work, so pushed.

> I couldn't find any place in the docs that actually documents our SRF
> behaviour in any sort of detail ([1], [2]), particularly not documenting
> where SRFs are legal, and how nested SRFs are supposed to behave.  I'm
> not sure in how much detail we want to go?  If we want do document that,
> where?  The closest seems to be "4.2.14. Expression Evaluation Rules"
> [3].

Looks like you didn't notice xfunc.sgml?  There's a large amount of info
there, particularly section 37.4.8:

https://www.postgresql.org/docs/devel/static/xfunc-sql.html#xfunc-sql-functions-returning-set

I've never been totally happy with this presentation, since (a) it's
buried pretty far in the back of the manual, and (b) structurally it
looks like it applies only to SQL-language functions, despite the
disclaimer that says it applies to all languages.  Still, right now
is probably not the time to do massive docs surgery, and in any case
I'm not sure that bringing all that detail forward into 4.2 would
represent an improvement.  Maybe a link or two from chapter
4 would be the ticket.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-18 Thread Jan Michálek

2017-04-18 12:06 GMT+02:00 Fabien COELHO :

>
> Hello Jan,
>
> It seems that the patch does not apply anymore on head due to changes in
>>> psql non regression tests. Could you rebase?
>>>
>>
>> This should work on current master (all test passed).
>>
>
> Patch applies, compiles and make check is ok.
>
> There are different flavour of markdown, maybe you should document which
> one is targetted. Should it be CommonMark? Another variant? Why?
>

This should be pandoc pipe table. It's because it is similar to aligned
format. I need add this to documentation (i have it in recent TODO)


>
> ISTM that the md format lacks escaping for special md characters:
>
>  fabien=# SELECT E'\\n\n' AS foo;
>  │ foo  │
>  |--|
>  │ \n
>
> I'd say that you need to do escaping more or less similar to html?
>

There is problem with markown and newlines. Replacing newline by br was
only solution that I was able to find.

>
> Also, it seems that you use distinct vertical bar characters in the
> format? Or is this a trick of my terminal?? It seems that your patch
> introduces U+2502 (BOX DRAWINGS LIGHT VERTICAL) instead of the usual pipe
> in some places. Maybe you copy-pasted things from the unicode linestyle.
>

Main of the functionality is used from aligned format. I tested returned
tables in retext and it works. If i have another character than standart
pipe, it shouldn`t work.


>
> Why are *_newline variants added for length and formatting? Would it be
> possible to do without, say by relying on the line count computed by the
> standard function for instance?
>

It`s because newlines in markdown, If I need to do it without copy this
function, i had to add parameter for markdown to this functions.


>
> The help line is too long, I would suggest not to add the new formats, the
> list is already truncated with "..." for other formats.
>

OK


>
> In the sgml documentation, you introduce tab characters, where only spaces
> should be used.
>

OK, I modified vimrc as it is in documentation, maybe i do something wrong.
I will correct this.


>
> pg_markdown obsahuje falešný prostor mezi čárkou a nový řádek.
>

I will look on this.


>
> --
> Fabien.




-- 
Jelen
Starší čeledín datovýho chlíva

Re: [HACKERS] some review comments on logical rep code

2017-04-18 Thread Fujii Masao

On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
 wrote:
> Thank you for working on this!
>
> On 18/04/17 10:16, Masahiko Sawada wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>  wrote:
>>>
>>> 3.

 ApplyLauncherWakeup() should be static function.
>>>
>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>
>>> 4.

 Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
 subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>
>>> This is also reported by Horiguchi-san on another thread[1], and patch 
>>> exists.
>>>
>>> 5.

 Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>
>>> I also guess it's necessary. This change is included in #3 patch.
>>
>> IsBackendPid() is not free, I suppose that just ignoring failure
>> with ESRCH is enough.
>
> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
> check if launcher_pid != 0?
>>>
>>> Yes. For example, code to send a signal to autovacuum launcher
>>> looks as the follows. I don't think there's a reason to do
>>> different thing here.
>>>
>>> |   avlauncher_needs_signal = false;
>>> |   if (AutoVacPID != 0)
>>> | kill(AutoVacPID, SIGUSR2);
>>>
>>
>> Fixed.
>
> Yes the IsBackendPid was needed only because we didn't reset
> launcher_pid and it was causing issues otherwise obviously (and I didn't
> realize we are not resetting it...)
>
>>
>>>
>>> 6.

 Normal statements that the walsender for logical rep runs are logged
 by log_replication_commands. So if log_statement is also set,
 those commands are logged twice.
>>>
>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>> connection parameter of libpqrcv if logical = true.
>>
>> The control by log_replication_commands is performed on
>> walsender, so this also shuld be done on the same place. Anyway
>> log_statement is irrelevant to walsender.
>
> Patch 006 emits all logs executed by the apply worker as a log of
> log_replication_command but perhaps you're right. It would be better
> to leave the log of log_statement if the command executed by the apply
> worker is a SQL command. Will fix.
>>>
>>> Here, we can choose whether a SQL command is a part of
>>> replication commands or not. The previous fix thought it is and
>>> this doesn't. (They are emitted in different forms) Is this ok?
>>
>> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
>> T_SQLCmd is logged later in exec_simple_query. The
>> log_replication_command logs only replication commands[1], it doesn't
>> mean to log commands executed on replication connection. I think such
>> behavior is right.
>>
>>> I'm not sure why you have moved the location of the code, but if
>>> it should show all received commands, it might be better to be
>>> placed before CHECK_FOR_INTERRUPTS..
>>
>> Hmm, the reason why I've moved it is that we cannot know whether given
>> cmd_string is a SQL command or a replication command before parsing.
>>
>
> Well the issue is that then the query is not logged if there was parsing
> issue even when logging was specifically requested. I don't know what's
> good solution here, maybe teaching exec_simple_query to not log the
> query if it came from walsender.
>
> BTW reading that function in walsender, there is :
>>   /*
>>* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the 
>> next
>>* command arrives. Clean up the old stuff if there's anything.
>>*/
>>   SnapBuildClearExportedSnapshot();
>
> and then
>
>>   /*
>>* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>>* called outside of transaction the snapshot should be cleared here.
>>*/
>>   if (!IsTransactionBlock())
>>   SnapBuildClearExportedSnapshot();
>
> The first one should not be there, it looks like a result of some merge
> conflict being solved wrongly. Maybe your patch could fix that too?
>
>
>>> 10.

 SpinLockAcquire(>relmutex);
 MyLogicalRepWorker->relstate =
   GetSubscriptionRelState(MyLogicalRepWorker->subid,
 MyLogicalRepWorker->relid,
 >relstate_lsn,
 false);
 SpinLockRelease(>relmutex);

 Non-"short-term" function like GetSubscriptionRelState() should not
 be called while spinlock is being held.

>>>
>>> One option is adding new LWLock for the relation state change but this
>>> lock is used at many locations where the operation actually doesn't
>>> take time. I think that the discussion would be needed to get
>>> consensus, so patch for it is not attached.
>>
>> From

Re: [HACKERS] Why does logical replication launcher set application_name?

Petr Jelinek  writes:
> On 18/04/17 18:24, Peter Eisentraut wrote:
>> I don't see why we need to do that.  It is showing the correct
>> information, isn't it?

> It does, but it's also one of the things Tom complained about and I
> think he is right in that at least values for launcher should be
> filtered out there as there is not much meaning in what is shown for
> launcher. The ugly part is that we can't tell it's launcher in any other
> way than comparing bgw_library_name and bgw_function_name to specific
> values.

I think you're thinking about it wrong.  To my mind the issue is that
there should be some generic way to determine that a bgworker process
is or is not laboring on behalf of an identifiable user.  It's great
that we can tell which user it is when there is one, but clearly some
bgworkers will be providing general services that aren't associated with
a single user.  So it should be possible to set the userID to zero or
some such when the bgworker is one that isn't associated with a
particular user.  Maybe the owning user needs to become an additional
parameter passed in struct BackgroundWorker.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why does logical replication launcher set application_name?

On 18/04/17 18:24, Peter Eisentraut wrote:
> On 4/18/17 12:13, Petr Jelinek wrote:
>> We can definitely easily detect that the bgworker is internal one by
>> library_name equals 'postgres' so we can easily remove the usesysid and
>> usename based on that.
> 
> I don't see why we need to do that.  It is showing the correct
> information, isn't it?
>

It does, but it's also one of the things Tom complained about and I
think he is right in that at least values for launcher should be
filtered out there as there is not much meaning in what is shown for
launcher. The ugly part is that we can't tell it's launcher in any other
way than comparing bgw_library_name and bgw_function_name to specific
values.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interval for launching the table sync worker

On 18/04/17 18:14, Peter Eisentraut wrote:
> On 4/18/17 11:59, Petr Jelinek wrote:
>> Hmm if we create hashtable for this, I'd say create hashtable for the
>> whole table_states then. The reason why it's list now was that it seemed
>> unnecessary to have hashtable when it will be empty almost always but
>> there is no need to have both hashtable + list IMHO.
> 
> The difference is that we blow away the list of states when the catalog
> changes, but we keep the hash table with the start times around.  We
> need two things with different life times.
> 

Why can't we just update the hashtable based on the catalog? I mean once
the record is not needed in the list, the table has been synced so there
is no need for the timestamp either since we'll not try to start the
worker again.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why does logical replication launcher set application_name?

Peter Eisentraut  writes:
> I think showing bgw_name as backend_type always sounds reasonable.  No
> need to treat external implementations differently.

That's definitely an approach we could use.  It would encourage people
to use short bgw_names, which is a constraint that wasn't especially
apparent before, but I don't think that's a bad thing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some review comments on logical rep code

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>  wrote:
>>
>> 3.
>>>
>>> ApplyLauncherWakeup() should be static function.
>>
>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>
>> 4.
>>>
>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>
>> This is also reported by Horiguchi-san on another thread[1], and patch 
>> exists.
>>
>> 5.
>>>
>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>
>> I also guess it's necessary. This change is included in #3 patch.
>
> IsBackendPid() is not free, I suppose that just ignoring failure
> with ESRCH is enough.

 After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
 check if launcher_pid != 0?
>>
>> Yes. For example, code to send a signal to autovacuum launcher
>> looks as the follows. I don't think there's a reason to do
>> different thing here.
>>
>> |   avlauncher_needs_signal = false;
>> |   if (AutoVacPID != 0)
>> | kill(AutoVacPID, SIGUSR2);
>>
> 
> Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

> 
>>
>> 6.
>>>
>>> Normal statements that the walsender for logical rep runs are logged
>>> by log_replication_commands. So if log_statement is also set,
>>> those commands are logged twice.
>>
>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>> connection parameter of libpqrcv if logical = true.
>
> The control by log_replication_commands is performed on
> walsender, so this also shuld be done on the same place. Anyway
> log_statement is irrelevant to walsender.

 Patch 006 emits all logs executed by the apply worker as a log of
 log_replication_command but perhaps you're right. It would be better
 to leave the log of log_statement if the command executed by the apply
 worker is a SQL command. Will fix.
>>
>> Here, we can choose whether a SQL command is a part of
>> replication commands or not. The previous fix thought it is and
>> this doesn't. (They are emitted in different forms) Is this ok?
> 
> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
> T_SQLCmd is logged later in exec_simple_query. The
> log_replication_command logs only replication commands[1], it doesn't
> mean to log commands executed on replication connection. I think such
> behavior is right.
> 
>> I'm not sure why you have moved the location of the code, but if
>> it should show all received commands, it might be better to be
>> placed before CHECK_FOR_INTERRUPTS..
> 
> Hmm, the reason why I've moved it is that we cannot know whether given
> cmd_string is a SQL command or a replication command before parsing.
> 

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :
>   /*
>* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
>* command arrives. Clean up the old stuff if there's anything.
>*/
>   SnapBuildClearExportedSnapshot();

and then

>   /*
>* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>* called outside of transaction the snapshot should be cleared here.
>*/
>   if (!IsTransactionBlock())
>   SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?


>> 10.
>>>
>>> SpinLockAcquire(>relmutex);
>>> MyLogicalRepWorker->relstate =
>>>   GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>> MyLogicalRepWorker->relid,
>>> >relstate_lsn,
>>> false);
>>> SpinLockRelease(>relmutex);
>>>
>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>> be called while spinlock is being held.
>>>
>>
>> One option is adding new LWLock for the relation state change but this
>> lock is used at many locations where the operation actually doesn't
>> take time. I think that the discussion would be needed to get
>> consensus, so patch for it is not attached.
>
> From the point of view of time, I agree that it doesn't seem to
> harm. Bt I thing it as more significant problem that
> potentially-throwable function is called within the mutex
> region. It potentially causes a kind

Re: [HACKERS] Passing values to a dynamic background worker

On 4/17/17 16:19, Keith Fiske wrote:
> I've reached a roadblock in that bgw_main_arg can only accept a single
> argument that must be passed by value for a dynamic bgw. I already
> worked around this for passing the database name to the my existing use
> of a bgw with doing partition maintenance (pass a simple integer to use
> as an index array value). But I'm not sure how to do this for passing
> multiple values in.

You can also store this kind of information in a table.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication and synchronous replication

On Tue, Apr 18, 2017 at 9:05 PM, Craig Ringer  wrote:
> On 18 April 2017 at 18:55, Masahiko Sawada  wrote:
>> Hi,
>>
>> As doc of logical decoding said as a note[1], logical replication can
>> support the synchronous replication with some restriction. But In
>> addition to this, IIUC in logical replication decoded data is sent to
>> subscribers only when the commit WAL record is decoded (calls
>> ReorderBufferCommit) .
>
> Correct.
>
>> It means that the local SQL execution and
>> applying the decoded data on subscriber side are always executed in a
>> sequential order, and the response time can simply be doubled or even
>> more (OTOH a good point is that decoded data of aborted transaction is
>> never sent to subscriber). I think there will be a lot of peoples who
>> want to use logical synchronous replication but this is a big
>> restriction for such user. I think we should document it or deal with
>> it.
>> Thought?
>
> Definitely should be documented. I think it's covered under logical
> decoding, but needs at least an xref.

Yes, I think so too. I'll send a patch for that this week, and maybe
will propose a improvement patch for logical sync replication in the
next release cycle.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication launcher uses wal_retrieve_retry_interval

On 4/18/17 12:00, Petr Jelinek wrote:
> As for apply_worker_launch_interval, I think we want different
> name so that it can be used for tablesync rate limiting as well.

But that's a mechanism we don't have yet, so maybe we should design that
when we get there?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why does logical replication launcher set application_name?

On 4/18/17 12:13, Petr Jelinek wrote:
> We can definitely easily detect that the bgworker is internal one by
> library_name equals 'postgres' so we can easily remove the usesysid and
> usename based on that.

I don't see why we need to do that.  It is showing the correct
information, isn't it?

> But that does not solve the issue of identifying
> the processes in pg_stat_activity as logical replication laucher/worker.
> I wonder if it would be okay to set backend_type to bgw_name for
> internal workers and just leave the external ones as it is (or solve
> them in v11 with some proper API) as we can control the length of name
> there (it will still be longer than the values for other things but
> maybe not too much).

I think showing bgw_name as backend_type always sounds reasonable.  No
need to treat external implementations differently.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] scram and \password

On 18 April 2017 at 09:25, Heikki Linnakangas  wrote:
> On 04/18/2017 11:15 AM, Simon Riggs wrote:
>>
>> As a potential open item, if we treat "md5" as ">= md5"
>> should we not also treat "password" as ">=password"?
>>
>> It seems strange that we still support "password" and yet tell
>> everyonenot to use it.
>>
>> I'd like PG10 to be the version where I don't have to tell people not
>> to use certain things, hash indexes, "password" etc.
>
>
> Between md5 and scram, the choice is easy, because a user can only have an
> MD5 hashed or SCRAM "hashed" password in pg_authid. So you present the
> client an MD5 challenge or a SCRAM challenge, depending on what the user has
> in pg_authid, or you error out without even trying. But "password"
> authentication can be used with any kind of a verifier in pg_authid.
> "password" authentication can be useful, for example, if a user has a SCRAM
> verifier in pg_authid, but the client doesn't support SCRAM.

Which would be a little strange and defeat the purpose of SCRAM.

> You could argue that you shouldn't use it even in that situation, you should
> upgrade the client, or use SSL certs or an ssh tunnel or something else
> instead. But that's a very different argument than the one for treating
> "md5" as ">= md5".
>
> Also note that LDAP and RADIUS authentication look identical to "password"
> authentication, on the wire. The only difference is that instead of checking
> the password against pg_authid, the server checks it against an LDAP or
> RADIUS server.

So the argument is multiple things are dangerous so we do nothing...

We have an opportunity to change things because its PG10, so lets not
waste the opportunity.


Thanks very much for working on SCRAM, its a good feature.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] logical replication fixes

2017-04-18 Thread Euler Taveira

Hi,

While inspecting the logical replication code, I found a bug that could
pick the wrong remote relation if they have the same name but different
schemas. Also, I did some spelling/cosmetic changes and fixed an oversight
in the ALTER SUBSCRIPTION documentation. Patches attached.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

From da4dc2807566958dd89cc9f05bf6a83a996e99c7 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 12 Apr 2017 21:45:34 -0300
Subject: [PATCH 1/3] Cosmetic and spelling fixes

---
 src/backend/catalog/pg_publication.c|  2 +-
 src/backend/catalog/pg_subscription.c   |  4 ++--
 src/backend/replication/logical/message.c   |  2 +-
 src/backend/replication/logical/origin.c| 18 +-
 src/backend/replication/logical/proto.c |  4 ++--
 src/backend/replication/logical/tablesync.c |  8 
 6 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 9330e23..1987401 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -312,7 +312,7 @@ GetAllTablesPublicationRelations(void)
 /*
  * Get publication using oid
  *
- * The Publication struct and it's data are palloced here.
+ * The Publication struct and its data are palloc'ed here.
  */
 Publication *
 GetPublication(Oid pubid)
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index a183850..22587a4 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -403,7 +403,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
 /*
  * Get all relations for subscription.
  *
- * Returned list is palloced in current memory context.
+ * Returned list is palloc'ed in current memory context.
  */
 List *
 GetSubscriptionRelations(Oid subid)
@@ -450,7 +450,7 @@ GetSubscriptionRelations(Oid subid)
 /*
  * Get all relations for subscription that are not in a ready state.
  *
- * Returned list is palloced in current memory context.
+ * Returned list is palloc'ed in current memory context.
  */
 List *
 GetSubscriptionNotReadyRelations(Oid subid)
diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index 0dc3a9b..ef7d6c5 100644
--- a/src/backend/replication/logical/message.c
+++ b/src/backend/replication/logical/message.c
@@ -20,7 +20,7 @@
  * Non-transactional messages are sent to the plugin at the time when the
  * logical decoding reads them from XLOG. This also means that transactional
  * messages won't be delivered if the transaction was rolled back but the
- * non-transactional one will be delivered always.
+ * non-transactional one will always be delivered.
  *
  * Every message carries prefix to avoid conflicts between different decoding
  * plugins. The plugin authors must take extra care to use unique prefix,
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 5eaf863..cf1a692 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -24,7 +24,7 @@
  * two bytes allow us to be more space efficient.
  *
  * Replication progress is tracked in a shared memory table
- * (ReplicationStates) that's dumped to disk every checkpoint. Entries
+ * (ReplicationState) that's dumped to disk every checkpoint. Entries
  * ('slots') in this table are identified by the internal id. That's the case
  * because it allows to increase replication progress during crash
  * recovery. To allow doing so we store the original LSN (from the originating
@@ -48,7 +48,7 @@
  *	 pg_replication_slot is required for the duration. That allows us to
  *	 safely and conflict free assign new origins using a dirty snapshot.
  *
- * * When creating an in-memory replication progress slot the ReplicationOirgin
+ * * When creating an in-memory replication progress slot the ReplicationOrigin
  *	 LWLock has to be held exclusively; when iterating over the replication
  *	 progress a shared lock has to be held, the same when advancing the
  *	 replication progress of an individual backend that has not setup as the
@@ -162,8 +162,8 @@ static ReplicationState *replication_states;
 static ReplicationStateCtl *replication_states_ctl;
 
 /*
- * Backend-local, cached element from ReplicationStates for use in a backend
- * replaying remote commits, so we don't have to search ReplicationStates for
+ * Backend-local, cached element from ReplicationState for use in a backend
+ * replaying remote commits, so we don't have to search ReplicationState for
  * the backends current RepOriginId.
  */
 static ReplicationState *session_replication_state = NULL;
@@ -441,7 +441,7 @@ ReplicationOriginShmemSize(void)
 	/*
 	 * XXX: max_replication_slots is arguably

Re: [HACKERS] Interval for launching the table sync worker

On 4/18/17 11:59, Petr Jelinek wrote:
> Hmm if we create hashtable for this, I'd say create hashtable for the
> whole table_states then. The reason why it's list now was that it seemed
> unnecessary to have hashtable when it will be empty almost always but
> there is no need to have both hashtable + list IMHO.

The difference is that we blow away the list of states when the catalog
changes, but we keep the hash table with the start times around.  We
need two things with different life times.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interval for launching the table sync worker

On Tue, Apr 18, 2017 at 11:22 PM, Peter Eisentraut
 wrote:
> On 4/13/17 06:23, Masahiko Sawada wrote:
>> Attached the latest patch. It didn't actually necessary to change
>> GetSubscriptionNotReadyRelations. I just changed the logic refreshing
>> the sync table state list.
>
> I think this was the right direction, but then I got worried about
> having a loop within a loop to copy over the last start times.  If you
> have very many tables, that could be a big nested loop.
>
> Here is an alternative proposal to store the last start times in a hash
> table.
>

If we use wal_retrieve_retry_interval for the table sync worker, I
think we need to update the documentation as well. Currently the
documentation mentions that a bit, but since
wal_retrieve_retry_interval will be used at three different places for
different reason it would confuse the user.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why does logical replication launcher set application_name?

On 16/04/17 22:27, Petr Jelinek wrote:
> On 16/04/17 18:47, Tom Lane wrote:
>> Craig Ringer  writes:
>>> On 12 April 2017 at 13:34, Kuntal Ghosh  wrote:
 For backend_type=background worker, application_name shows the name of
 the background worker (BackgroundWorker->bgw_name). I think we need
 some way to distinguish among different background workers. But,
 application_name may not be the correct field to show the information.
>>
>>> It's better than (ab)using 'query' IMO.
>>
>>> I'd rather an abbreviated entry to address Tom's concerns about
>>> format. 'lrlaunch' or whatever.
>>
>> Basically the problem I've got with the LR launcher is that it looks
>> utterly unlike any other background process in pg_stat_activity.
>> Leaving out all-null columns to make my point:
>>
>> regression=# select 
>> pid,usesysid,usename,application_name,backend_start,wait_event_type,wait_event,backend_type
>>  from pg_stat_activity where application_name != 'psql';
>>   pid  | usesysid | usename  |   application_name   | 
>> backend_start | wait_event_type | wait_event  |
>> backend_type 
>> ---+--+--+--+---+-+-+-
>>  25416 |  |  |  | 2017-04-16 
>> 12:32:46.987076-04 | Activity| AutoVacuumMain  | autovacuum 
>> launcher
>>  25418 |   10 | postgres | logical replication launcher | 2017-04-16 
>> 12:32:46.988859-04 | Activity| LogicalLauncherMain | background 
>> worker
>>  25414 |  |  |  | 2017-04-16 
>> 12:32:46.986745-04 | Activity| BgWriterHibernate   | background 
>> writer
>>  25413 |  |  |  | 2017-04-16 
>> 12:32:46.986885-04 | Activity| CheckpointerMain| checkpointer
>>  25415 |  |  |  | 2017-04-16 
>> 12:32:46.9871-04   | Activity| WalWriterMain   | walwriter
>> (5 rows)
>>
>> Why has it got non-null entries for usesysid and usename, never mind
>> application_name?  Why does it not follow the well-established convention
>> that backend_type is what identifies background processes?
>>
>> I'm sure the answer to those questions is "it's an implementation artifact
>> from using the generic bgworker infrastructure", but that does not make it
>> look any less like sloppy, half-finished work.
>>
>> If it is a limitation of the bgworker infrastructure that we can't make
>> the LR processes look more like the other kinds of built-in processes,
>> then I think we need to fix that limitation.  And I further assert that
>> we need to do it for v10, because once we ship v10 people will adjust
>> their tools for this bogus output, and we'll face complaints about
>> backwards compatibility if we fix it later.
>>
> 
> It's indeed how bgworker infrastructure is reporting it. That being
> said, since LR processes are in-core, we can add exception for them in
> pgstat_bestart() so that they are reported more like other builtin
> processes. We could also try to add api for bgworker processes to change
> how they are reported so that any future workers (and all the external
> workers) can be reported properly as well, but that seems better fit for
> v11 at this point since it would be good if we had some discussion for
> how that should look like.
> 

Hmm so I took a look at code today with intention to implement this, but
it does not seem as straightforward as I'd hoped due to pgstat_bestart()
being called quite early in startup.

We can definitely easily detect that the bgworker is internal one by
library_name equals 'postgres' so we can easily remove the usesysid and
usename based on that. But that does not solve the issue of identifying
the processes in pg_stat_activity as logical replication laucher/worker.
I wonder if it would be okay to set backend_type to bgw_name for
internal workers and just leave the external ones as it is (or solve
them in v11 with some proper API) as we can control the length of name
there (it will still be longer than the values for other things but
maybe not too much).

Does that sound reasonable enough for v10?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] dtrace probes

2017-04-18 Thread Jesper Pedersen


Hi,

The lwlock dtrace probes define LWLockMode as int, and the 
TRACE_POSTGRESQL_LWLOCK methods are called using both a variable and 
constant definition.


This leads to a mix of argument definitions depending on the call site, 
as seen in probes.txt file.


A fix is to explicit cast 'mode' to int such that all call sites will 
use the


 argument #2 4 signed   bytes

definition. Attached patch does this.

I have verified all dtraces probes for their type, and only the lock__ 
methods doesn't aligned with its actual types. However, that would 
require a change to probes.d, and therefore PostgreSQL 11 material.


Depending on the feedback I can add this patch to the open item list in 
order to fix it for PostgreSQL 10.


Best regards,
 Jesper
>From d6f5c119c057c7ff8c84f88bb4122a1ca245a7d4 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Tue, 18 Apr 2017 11:44:18 -0400
Subject: [PATCH] Explicit cast LWLockMode, which an enum, to int in order to
 match the dtrace definition of the lwlock methods. Thereby all call sites
 will have the same definition instead of a mix between signed and unsigned.

Author: Jesper Pedersen 
---
 src/backend/storage/lmgr/lwlock.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..abf5fbb 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode);
 
 		for (;;)
 		{
@@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		result = false;
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int)mode);
 
 	/* Add lock to list of locks held by this backend */
 	held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 		RESUME_INTERRUPTS();
 
 		LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int)mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int)mode);
 	}
 	return !mustwait;
 }
@@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 #endif
 
 			LWLockReportWaitStart(lock);
-			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int)mode);
 
 			for (;;)
 			{
@@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 Assert(nwaiters < MAX_BACKENDS);
 			}
 #endif
-			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode);
 			LWLockReportWaitEnd();
 
 			LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
 		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int)mode);
 	}
 	else
 	{
@@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int)mode);
 	}
 
 	return !mustwait;
-- 
2.7.4

/usr/local/bin/postgres postgresql:clog__checkpoint__start [sema 0xe2351a]
  location #1 0x4fc706
argument #1 1 signed   bytes @ 0
  location #2 0x4fc72d
argument #1 1 signed   bytes @ 1
/usr/local/bin/postgres postgresql:clog__checkpoint__done [sema 0xe2351c]
  location #1 0x4fc725
argument #1 1 signed   bytes @ 0
  location #2 0x4fc74c
argument #1 1 signed   bytes @ 1
/usr/local/bin/postgres postgresql:multixact__checkpoint__start [sema 0xe23522]
  location #1 0x500d07
argument #1 1 signed   bytes @ 0
  location #2 0x500dbc
argument #1 1 signed   bytes @ 1
/usr/local/bin/postgres postgresql:multixact__checkpoint__done [sema 0xe23524]

Re: [HACKERS] Logical replication launcher uses wal_retrieve_retry_interval

On 18/04/17 16:24, Peter Eisentraut wrote:
> On 4/16/17 22:40, Masahiko Sawada wrote:
>> Attached two patches add new GUCs apply_worker_timeout and
>> apply_worker_launch_interval which are used instead of
>> wal_receiver_timeout and wal_retrieve_retry_timeout. These new
>> parameters are not settable at worker-level so far.
> 
> Under what circumstances are these needed?  Does anyone ever set these?
> 

Personally I don't see need for apply_worker_timeout, no idea why that
can't use wal_receiver_timeout, the mechanics are exactly same, and it's
IMHO only needed because default tcp keepalive settings are usually too
generous. As for apply_worker_launch_interval, I think we want different
name so that it can be used for tablesync rate limiting as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interval for launching the table sync worker

On 18/04/17 16:22, Peter Eisentraut wrote:
> On 4/13/17 06:23, Masahiko Sawada wrote:
>> Attached the latest patch. It didn't actually necessary to change
>> GetSubscriptionNotReadyRelations. I just changed the logic refreshing
>> the sync table state list.
> 
> I think this was the right direction, but then I got worried about
> having a loop within a loop to copy over the last start times.  If you
> have very many tables, that could be a big nested loop.
> 
> Here is an alternative proposal to store the last start times in a hash
> table.
> 

Hmm if we create hashtable for this, I'd say create hashtable for the
whole table_states then. The reason why it's list now was that it seemed
unnecessary to have hashtable when it will be empty almost always but
there is no need to have both hashtable + list IMHO.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] identity columns

On 4/18/17 02:07, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I am in disagreement with the submitter that this is a problem or what
the solution should be.  The discussion is ongoing.  Note that the
current state isn't actually broken, so it doesn't have to hold anything up.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] identity columns

On 4/7/17 01:26, Vitaly Burovoy wrote:
> I've implement SET GENERATED ... IF NOT EXISTS. It must be placed
> before other SET options but fortunately it conforms with the
> standard.
> Since that form always changes the sequence behind the column, I
> decided to explicitly write "[NO] CACHE" in pg_dump.
> 
> As a plus now it is possible to rename the sequence behind the column
> by specifying SEQUENCE NAME in SET GENERATED.
> 
> I hope it is still possible to get rid of the "ADD GENERATED" syntax.

I am still not fond of this change.  There is precedent all over the
place for having separate commands for creating a structure, changing a
structure, and removing a structure.  I don't understand what the
problem with that is.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

Alvaro Herrera  writes:
> Tom Lane wrote:
>> FWIW, I'm a bit suspicious of relocating the temp stats directory as
>> being a reliable fix for this.

> It's an SD card (the kind typically used in cameras and phones), not SSD.
> Saying it's slow is an understatement.  It's *excruciatingly* slow.

Oh, I misread it ... but still, the modern definition of "excruciatingly
slow" doesn't seem all that far off what 90s-era hard drives could do.
It is clear from googling though that there's an enormous performance
range in SD cards' random write performance, eg wikipedia's entry has
a link to

http://goughlui.com/2014/01/16/testing-sd-card-performance-round-up/

Seems like it's hard to judge this without knowing exactly which
SD card Michael has got in that thing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-18 Thread David Steele

On 4/15/17 12:56 PM, Robert Haas wrote:
> On Fri, Apr 14, 2017 at 6:52 PM, Tom Lane  wrote:
>> Peter Eisentraut  writes:
>>> If we're talking about making things easier to understand, wouldn't a
>>> random user rather know what a WAL "location" is instead of a WAL "LSN"?
>> I wouldn't object to standardizing on "location" instead of "lsn" in the
>> related function and column names.  What I don't like is using different
>> words for the same thing.
> The case mentioned in the subject of this thread has been using the
> word "location" since time immemorial.  It's true that we've already
> renamed it (xlog -> wal) in this release, so if we want to standardize
> on lsn, now's certainly the time to do it.  I'm worried that
> pg_current_wal_lsn() is an identifier composed almost entirely of
> abbreviations and therefore possibly just as impenetrable as
> qx_current_pfq_dnr(), but maybe we should assume that LSN is a term of
> art with which knowledgeable users are required to be familiar, much
> as we are doing for "WAL".

+1, and I'm in favor of using "lsn" wherever applicable.  It seems to me
that if a user calls a function (or queries a table) that returns an lsn
then they should be aware of what an lsn is.

-- 
-David
da...@pgmasters.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

2017-04-18 Thread Alvaro Herrera

Tom Lane wrote:
> Michael Paquier  writes:
> > That's the point I am trying to make upthread: slow buildfarm animals
> > should have minimal impact on core code modifications. We could for
> > example have one environment variable that lists all the parameters to
> > modify in a single string and appends them at the end of
> > postgresql.conf. But honestly I don't think that this is necessary if
> > there is only one variable able to define a base directory for
> > temporary statistics as the real bottleneck comes from there at least
> > in the case of hamster.
> 
> FWIW, I'm a bit suspicious of relocating the temp stats directory as
> being a reliable fix for this.  It looks to me like hamster isn't that
> much slower than gaur/pademelon, ie the same machine that was my primary
> development machine for well over a decade, and on which I have NEVER
> seen a "stats collector not responding" failure.  Plus, if hamster's
> main storage is SSD, that seems unlikely to be slower than the spinning
> rust in gaur/pademelon.  So I can't escape the suspicion that something
> else is going on there.  Seemingly-unexplainable stats collector problems
> have been a bugaboo for a long time ...

It's an SD card (the kind typically used in cameras and phones), not SSD.
Saying it's slow is an understatement.  It's *excruciatingly* slow.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

Michael Paquier  writes:
> That's the point I am trying to make upthread: slow buildfarm animals
> should have minimal impact on core code modifications. We could for
> example have one environment variable that lists all the parameters to
> modify in a single string and appends them at the end of
> postgresql.conf. But honestly I don't think that this is necessary if
> there is only one variable able to define a base directory for
> temporary statistics as the real bottleneck comes from there at least
> in the case of hamster.

FWIW, I'm a bit suspicious of relocating the temp stats directory as
being a reliable fix for this.  It looks to me like hamster isn't that
much slower than gaur/pademelon, ie the same machine that was my primary
development machine for well over a decade, and on which I have NEVER
seen a "stats collector not responding" failure.  Plus, if hamster's
main storage is SSD, that seems unlikely to be slower than the spinning
rust in gaur/pademelon.  So I can't escape the suspicion that something
else is going on there.  Seemingly-unexplainable stats collector problems
have been a bugaboo for a long time ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] scram and \password


On 04/18/2017 08:44 AM, Noah Misch wrote:

On Tue, Apr 11, 2017 at 10:07:12PM +0300, Heikki Linnakangas wrote:

Thanks! I've been busy on the other thread on future-proofing the protocol
with negotiating the SASL mechanism, I'll come back to this once we get that
settled. By the end of the week, I presume.


This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


Took a bit longer than I predicted, but the other scram open items have 
now been resolved, and I'll start reviewing this now. I expect there 
will be 1-2 iterations on this before it's committed. I'll send another 
status update by Friday, if this isn't committed by then.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-18 Thread Remi Colinet

Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the new
PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from the
core code used for EXPLAIN.

I only extracted some common code from explain.c and put it in report.c
which is used by progress.c.This code is valid for Plan and PlanState.

The code shared is:

ReportPreScanNode() renamed from ExplainPreScanNode()
ReportBeginOutput() renamed from ExplainBeginOutput()
ReportEndOutput() renamed from ExplainEndOutput()
ReportOpenGroup() ...
ReportProperties() ...
ReportPropertyText() ...
ReportHasChildren() ...
ReportSubPlans() ...
ReportMemberNodes() ...
ReportCustomChildren() ...
ReportCloseGroup() ...

ExplainState has been renamed ReportState.

Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request is
noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked in
the executor code, the request will be dealt.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the status
of the SQL query, progress.c code checks the status and report either that
the SQL query does not have a valid status, or otherwise the current
execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDescc is NULL, we can just report a  output. This is currently the case with the patch suggested.

So far, I've found this new command very handy. It allows to evaluate the
time needed to complete a SQL query.

A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk and
time resources can prevent the SQL query to complete.

Best regards
Remi

2017-04-18 15:00 GMT+02:00 Maksim Milyutin :

> Hi!
>
>
> On 17.04.2017 15:09, Remi Colinet wrote:
>
>> Hello,
>>
>> I've implemented a new command named PROGRESS to monitor progression of
>> long running SQL queries in a backend process.
>>
>>
>> Use case
>> ===
>>
>> A use case is shown in the below example based on a table named t_10m
>> with 10 millions rows.
>>
>> The table has been created with :
>>
>> CREATE TABLE T_10M ( id integer, md5 text);
>> INSERT INTO T_10M SELECT generate_series(1,1000) AS id,
>> md5(random()::text) AS md5;
>>
>> 1/ Start a first psql session to run long SQL queries:
>>
>> [pgadm@rco ~]$ psql -A -d test
>> psql (10devel)
>> Type "help" for help.
>> test=#
>>
>> The option -A is used to allow rows to be output without formatting work.
>>
>> Redirect output to a file in order to let the query run without terminal
>> interaction:
>> test=# \o out
>>
>> Start a long running query:
>> test=# select * from t_10M order by md5;
>>
>> 2/ In a second psql session, list the backend pid and their SQL query
>>
>> [pgadm@rco ~]$ psql -d test
>> psql (10devel)
>> Type "help" for help.
>>
>> test=# select pid, query from pg_stat_activity ;
>>   pid  |   query
>> ---+---
>>  19081 |
>>  19084 |
>>  19339 | select pid, query from pg_stat_activity ;
>>  19341 | select * from t_10m order by md5;
>>  19727 | select * from t_10m order by md5;
>>  19726 | select * from t_10m order by md5;
>>  19079 |
>>  19078 |
>>  19080 |
>> (9 rows)
>>
>> test=#
>>
>> Chose the pid of the backend running the long SQL query to be monitored.
>> Above example is a parallel SQL query. Lowest pid is the main backend of
>> the query.
>>
>> test=# PROGRESS 19341;
>>PLAN
>> PROGRESS
>> 
>> ---
>>  Gather Merge
>>->  Sort=> dumping tuples to tapes
>>  rows r/w merge 0/0 rows r/w effective 0/2722972 0%
>>  Sort Key: md5
>>  ->  Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks
>> 125938/161222 78%
>> (5 rows)
>>
>> test=#
>>
>> The query of the monitored backend is:
>> test=# select * from t_10M order by md5;
>>
>> Because the table has 10 millions of rows, the sort is done on tapes.
>>
>>
>> Design of the

Re: [HACKERS] Inadequate parallel-safety check for SubPlans

2017-04-18 Thread Amit Kapila

On Mon, Apr 17, 2017 at 10:54 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Sun, Apr 16, 2017 at 9:30 PM, Tom Lane  wrote:
>>> FYI, I have this on my to-look-at list, and expect to fix it before Robert
>>> returns from vacation.
>
>> Let me know if an initial patch by someone else can be helpful?
>
> Sure, have a go at it.  I won't get to this for a day or so ...
>

I have ended up doing something along the lines suggested by you (or
at least what I have understood from your e-mail).  Basically, pass
the safe-param-ids list to parallel safety function and decide based
on that if Param node reference in input expression is safe.  Note
that I have added a new parameter paramids list to build_subplan to
make AlternativeSubPlan case consistent with SubPlan case even though
we should be able to do without that as it seems to be exercised only
for the correlated EXISTS case.  Also, I think we don't need to check
parallel-safety for splan->args as that also is related to correlated
queries for which parallelism is not supported as of now.

I have also explored it to do it by checking parallel-safety of
testexpr before PARAM_EXEC params are replaced in testexpr, however
that won't work because we add PARAM_SUBLINK params at the time of
parse-analysis in transformSubLink().  I am not sure if it is a good
idea to test parallel-safety of testexpr at the time of parse-analysis
phase, so didn't pursue that path.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_parallel_safety_info_subplans_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication launcher uses wal_retrieve_retry_interval

On 4/16/17 22:40, Masahiko Sawada wrote:
> Attached two patches add new GUCs apply_worker_timeout and
> apply_worker_launch_interval which are used instead of
> wal_receiver_timeout and wal_retrieve_retry_timeout. These new
> parameters are not settable at worker-level so far.

Under what circumstances are these needed?  Does anyone ever set these?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interval for launching the table sync worker

On 4/13/17 06:23, Masahiko Sawada wrote:
> Attached the latest patch. It didn't actually necessary to change
> GetSubscriptionNotReadyRelations. I just changed the logic refreshing
> the sync table state list.

I think this was the right direction, but then I got worried about
having a loop within a loop to copy over the last start times.  If you
have very many tables, that could be a big nested loop.

Here is an alternative proposal to store the last start times in a hash
table.

(We also might want to lower the interval for the test suite, because
there are intentional failures in there, and this patch or one like it
will cause the tests to run a few seconds longer.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Wait-between-tablesync-worker-restarts.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Passing values to a dynamic background worker

2017-04-18 Thread Keith Fiske

On Tue, Apr 18, 2017 at 5:40 AM, Amit Langote  wrote:

> On 2017/04/18 18:12, Kyotaro HORIGUCHI wrote:
> > At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske wrote:
> >> So after reading a recent thread on the steep learning curve for PG
> >> internals [1], I figured I'd share where I've gotten stuck with this in
> a
> >> new thread vs hijacking that one.
> >>
> >> One of the goals I had with pg_partman was to see if I could get the
> >> partitioning python scripts redone as C functions using a dynamic
> >> background worker to be able to commit in batches with a single call. My
> >> thinking was to have a user-function that can accept arguments for
> things
> >> like the interval value, batch size, and other arguments to the python
> >> script, then start/stop a dynamic bgw up for each batch so it can commit
> >> after each one. The dymanic bgw would essentially just have to call the
> >> already existing partition_data() plpgsql function, but I have to be
> able
> >> to pass the argument values that the user gave down into the dynamic
> bgw.
> >>
> >> I've reached a roadblock in that bgw_main_arg can only accept a single
> >> argument that must be passed by value for a dynamic bgw. I already
> worked
> >> around this for passing the database name to the my existing use of a
> bgw
> >> with doing partition maintenance (pass a simple integer to use as an
> index
> >> array value). But I'm not sure how to do this for passing multiple
> values
> >> in. I'm assuming this would be the place where I'd see about storing
> values
> >> in shared memory to be able to re-use later? I'm not even sure if that's
> >> the right approach, and if it is, where to even start to understand how
> to
> >> do that.
> >
> > On the other hand, AFAICS, DSM doesn't seem well documented. I
> > mangaged to find a related document in Postgres Wiki but it seems
> > a bit old.
> >
> > https://wiki.postgresql.org/wiki/Parallel_Internal_Sort
> >
> > This is a little complex than static shared memory, and it is
> > *not* guaranteed to mapped at the same address among workers. You
> > will see an instance in LaunchParallelWorkers() and the related
> > functions in parallel.c. The basic of its usage would be as the
> > follows.
> >
> > - Create a segment :
> >dsm_segment *seg = dsm_create(size);
> > - Send its handle via the bgw_main_arg.
> >worker.bgw_main_arg = dsm_segment_handle(seg);
> > - Attach the memory on the other side.
> >dsm_segment *seg = dsm_attach(main_arg);
> >
> > On both side, the address of the attached shared memory is
> > obtained using dsm_segment_address(seg).
> >
> > dsm_detach(seg) detaches the segment. All users of this segment
> > detach the segment, it will be destroyed.
>
> Perhaps, the more modern DSA mechanism could be applicable here, too.
>
> Some recent commits demonstrate examples of DSA usage, such as BRIN
> autosummarization commit (7526e10224f) and tidbitmap.c's shared iteration
> support commit (98e6e89040a05).
>
> Thanks,
> Amit
>
>
Thank you both very much for the suggestions!

Keith

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check



On 04/18/2017 08:59 AM, Michael Paquier wrote:
>> Lets's say we have a bunch of possible environment settings with names
>> that all begin with "PG_TAP_" PostgresNode.pm could check for the
>> existence of these and take action accordingly, and you could set them
>> on a buildfarm animal in the config file, or for interactive use in your
>> .profile.
> That's the point I am trying to make upthread: slow buildfarm animals
> should have minimal impact on core code modifications. We could for
> example have one environment variable that lists all the parameters to
> modify in a single string and appends them at the end of
> postgresql.conf. But honestly I don't think that this is necessary if
> there is only one variable able to define a base directory for
> temporary statistics as the real bottleneck comes from there at least
> in the case of hamster. When initializing a node via PostgresNode.pm,
> we would just check for this variable, and the init() routine just
> creates a temporary folder in it, setting up temp_stats_path in
> postgresql.conf.



How is that going to deal with your wal_*_timeout etc settings?

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-18 Thread Maksim Milyutin


Hi!

On 17.04.2017 15:09, Remi Colinet wrote:

Hello,

I've implemented a new command named PROGRESS to monitor progression of
long running SQL queries in a backend process.


Use case
===

A use case is shown in the below example based on a table named t_10m
with 10 millions rows.

The table has been created with :

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,1000) AS id,
md5(random()::text) AS md5;

1/ Start a first psql session to run long SQL queries:

[pgadm@rco ~]$ psql -A -d test
psql (10devel)
Type "help" for help.
test=#

The option -A is used to allow rows to be output without formatting work.

Redirect output to a file in order to let the query run without terminal
interaction:
test=# \o out

Start a long running query:
test=# select * from t_10M order by md5;

2/ In a second psql session, list the backend pid and their SQL query

[pgadm@rco ~]$ psql -d test
psql (10devel)
Type "help" for help.

test=# select pid, query from pg_stat_activity ;
  pid  |   query
---+---
 19081 |
 19084 |
 19339 | select pid, query from pg_stat_activity ;
 19341 | select * from t_10m order by md5;
 19727 | select * from t_10m order by md5;
 19726 | select * from t_10m order by md5;
 19079 |
 19078 |
 19080 |
(9 rows)

test=#

Chose the pid of the backend running the long SQL query to be monitored.
Above example is a parallel SQL query. Lowest pid is the main backend of
the query.

test=# PROGRESS 19341;
   PLAN
PROGRESS
---
 Gather Merge
   ->  Sort=> dumping tuples to tapes
 rows r/w merge 0/0 rows r/w effective 0/2722972 0%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks
125938/161222 78%
(5 rows)

test=#

The query of the monitored backend is:
test=# select * from t_10M order by md5;

Because the table has 10 millions of rows, the sort is done on tapes.


Design of the command
=

The design of the patch/command is:
- the user issue the "PROGRESS pid" command from a psql session. The pid
is the one of the backend which runs the SQL query for which we want to
get a progression report. It can be determined from the view
pg_stat_activity.
- the monitoring backend, upon receiving the "PROGRESS pid" command from
psql utility used in step above, sends a signal to the backend whose
process pid is the one provided in the PROGRESS command.
- the monitored backend receives the signal and notes the request as for
any interrupt. Then, it continues its execution of its SQL query until
interrupts can be serviced.
- when the monitored process can service the interrupts, it deals with
the progress request by collecting its execution tree with the execution
progress of each long running node. At this time, the SQL query is no
more running. The progression of each node is calculated during the
execution of the SQL query which is at this moment stopped. The
execution tree is dumped in shared memory pages allocated at the start
of the server. Then, the monitored backend set a latch on which the
monitoring process is waiting for. It then continues executing its SQL
query.
- the monitoring backend collects the share memory data dumped by the
monitored backed, and sends it to its psql session, as a list of rows.

The command PROGRESS does not incur any slowness on the running query
because the execution progress is only computed upon receiving the
progress request which is supposed to be seldom used.

The code heavily reuses the one of the explain command. In order to
share as much code as possible with the EXPLAIN command, part of the
EXPLAIN code which deals with reporting quals for instance, has been
moved to a new report.c file in the src/backend/commands folder. This
code in report.c is shared between explain.c source code and PROGRESS
command source code which is in progress.c file.

The progression reported by PROGRESS command is given in terms of rows,
blocks, bytes and percents. The values displayed depend on the node type
in the execution plan.

The current patch implements all the possible nodes which could take a
lot of time:
- Sequential scan nodes with rows and block progress (node type
T_SeqScan, T_SampleScan, T_BitmapHeaepScan, T_SubqueryScan,
T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan)
- Tuple id scan node with rows and blocks progress (T_TidScan)
- Limit node with rows progress (T_Limit)
- Foreign and custom scan with rows and blocks progress (T_ForeignScan,
T_CustomScan)
- Index scan, index only scan and bitmap index scan with rows and blocks
progress


Use cases


Some further examples of use are shown below in the test_v1.txt file.


What do you make of this idea/patch?

Does it make sense?

Any suggestion is welcome.

The current patch is still work in progress. It is meanwhile stable. It

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

On Tue, Apr 18, 2017 at 9:35 PM, Andrew Dunstan
 wrote:
>
>
> On 04/18/2017 08:23 AM, Michael Paquier wrote:
>> On Tue, Apr 18, 2017 at 9:13 PM, Andrew Dunstan
>>  wrote:
>>> Yeah, but the way you have done it could also to lead to errors unless
>>> you're very careful, as I found on axolotl (which died recently,
>>> unfortunately). There I had to set the stats_temp directory to a
>>> branch-specific name so a crash on branch A didn't leave stats picked up
>>> by a run on branch B. I had a script that ran before every buildfarm run
>>> that made sure the branch-specific directories existed on the tmpfs.
>> I didn't consider that. Still hamster is only set to run master so
>> that's not a problem. Running more branches would be too much for this
>> little dear as it includes TAP tests.
>>
  but it is not possible to set up that with the
 TAP tests. I could always disable --enable-tap-tests on this machine
 but I don't think that this is a correct answer. Something that could
 be done is to use an environment variable to set up a base directory
 for all the nodes created in PostgresNode.pm, and use that for
 temporary statistics with a custom folder name. But that's not
 scalable if we want to enforce more parameter.
>>> What more parameters do you want?
>> Increasing wal_sender_timeout and wal_receiver_timeout can help in
>> reducing the failures seen.
>
> OK, but you're only talking about a handful of these, right?

Yup, that would be one solution but that's not attacking the problem
at its root.

> Lets's say we have a bunch of possible environment settings with names
> that all begin with "PG_TAP_" PostgresNode.pm could check for the
> existence of these and take action accordingly, and you could set them
> on a buildfarm animal in the config file, or for interactive use in your
> .profile.

That's the point I am trying to make upthread: slow buildfarm animals
should have minimal impact on core code modifications. We could for
example have one environment variable that lists all the parameters to
modify in a single string and appends them at the end of
postgresql.conf. But honestly I don't think that this is necessary if
there is only one variable able to define a base directory for
temporary statistics as the real bottleneck comes from there at least
in the case of hamster. When initializing a node via PostgresNode.pm,
we would just check for this variable, and the init() routine just
creates a temporary folder in it, setting up temp_stats_path in
postgresql.conf.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failed recovery with new faster 2PC code

On 18 April 2017 at 13:12, Michael Paquier  wrote:
> On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs  wrote:
>> Yeh, this is better. Pushed.
>
> I have been outraced on this one, the error is obvious once you see it ;)

Didn't realise you were working on it, nothing competitive about it.

It's clear this needed fixing, whether or not it fixes Jeff's report.

I do think it explains the report, so I'm hopeful.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check



On 04/18/2017 08:23 AM, Michael Paquier wrote:
> On Tue, Apr 18, 2017 at 9:13 PM, Andrew Dunstan
>  wrote:
>> Yeah, but the way you have done it could also to lead to errors unless
>> you're very careful, as I found on axolotl (which died recently,
>> unfortunately). There I had to set the stats_temp directory to a
>> branch-specific name so a crash on branch A didn't leave stats picked up
>> by a run on branch B. I had a script that ran before every buildfarm run
>> that made sure the branch-specific directories existed on the tmpfs.
> I didn't consider that. Still hamster is only set to run master so
> that's not a problem. Running more branches would be too much for this
> little dear as it includes TAP tests.
>
>>>  but it is not possible to set up that with the
>>> TAP tests. I could always disable --enable-tap-tests on this machine
>>> but I don't think that this is a correct answer. Something that could
>>> be done is to use an environment variable to set up a base directory
>>> for all the nodes created in PostgresNode.pm, and use that for
>>> temporary statistics with a custom folder name. But that's not
>>> scalable if we want to enforce more parameter.
>> What more parameters do you want?
> Increasing wal_sender_timeout and wal_receiver_timeout can help in
> reducing the failures seen.


OK, but you're only talking about a handful of these, right?

Lets's say we have a bunch of possible environment settings with names
that all begin with "PG_TAP_" PostgresNode.pm could check for the
existence of these and take action accordingly, and you could set them
on a buildfarm animal in the config file, or for interactive use in your
.profile.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...



On 04/18/2017 03:54 AM, Craig Ringer wrote:
>> But almost nothing about The Internals of PostgreSQL:
> Not surprising. They'd go out of date fast, be a huge effort to write
> and maintain, and sell poorly given the small audience.
>
> Print books probably aren't the way forward here.
>


Agreed,  a well organized web site would work much better.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...



On 04/18/2017 03:41 AM, Kang Yuzhe wrote:
>
>
> But almost nothing about The Internals of PostgreSQL:
> 1. The Internals of PostgreSQL:
> http://www.interdb.jp/pg/index.html  translated from Japanese Book
> 2. PostgreSQL数据库内核分析(Chinese) Book on the Internals of PostgreSQL:
> 3. PG Docs/site
> 4. some here and there which are less useful
>



I agree that this is an area where more material would be very welcome,
and not only to newcomers. #1 is useful as far as it goes, but the
missing bits (esp. Query Processing) are critical.



> Lastly, I have come to understand that PG community is not
> harsh/intimidating to newbies and thus, I am feeling at home.
>
>


Glad you have found it so.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

On Tue, Apr 18, 2017 at 9:13 PM, Andrew Dunstan
 wrote:
> Yeah, but the way you have done it could also to lead to errors unless
> you're very careful, as I found on axolotl (which died recently,
> unfortunately). There I had to set the stats_temp directory to a
> branch-specific name so a crash on branch A didn't leave stats picked up
> by a run on branch B. I had a script that ran before every buildfarm run
> that made sure the branch-specific directories existed on the tmpfs.

I didn't consider that. Still hamster is only set to run master so
that's not a problem. Running more branches would be too much for this
little dear as it includes TAP tests.

>>  but it is not possible to set up that with the
>> TAP tests. I could always disable --enable-tap-tests on this machine
>> but I don't think that this is a correct answer. Something that could
>> be done is to use an environment variable to set up a base directory
>> for all the nodes created in PostgresNode.pm, and use that for
>> temporary statistics with a custom folder name. But that's not
>> scalable if we want to enforce more parameter.
>
> What more parameters do you want?

Increasing wal_sender_timeout and wal_receiver_timeout can help in
reducing the failures seen.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check



On 04/18/2017 03:07 AM, Michael Paquier wrote:
> Hi all,
>
> Some of you may have noticed that hamster is heavily red on the
> buildfarm. I have done a bit of investigation, and I am able to
> reproduce the failure manually. But actually after looking at the logs
> the error has obviously showed up:
> 2017-04-16 05:07:19.650 JST [18282] LOG:  database system is ready to
> accept connections
> 2017-04-16 05:08:36.725 JST [18296] LOG:  using stale statistics
> instead of current ones because stats collector is not responding
> 2017-04-16 05:10:22.207 JST [18303] t/010_pg_basebackup.pl LOG:
> terminating walsender process due to replication timeout
> 2017-04-16 05:10:30.180 JST [18306] LOG:  using stale statistics
> instead of current ones because stats collector is not responding
>
> Stale regressions means that the system is just constrained so much
> that things are timing out.
>
> In order to avoid such failures with normal regression tests, I have
> set up extra_config so as stats_temp_directory goes to a tmpfs to
> avoid stale statistics,


Yeah, but the way you have done it could also to lead to errors unless
you're very careful, as I found on axolotl (which died recently,
unfortunately). There I had to set the stats_temp directory to a
branch-specific name so a crash on branch A didn't leave stats picked up
by a run on branch B. I had a script that ran before every buildfarm run
that made sure the branch-specific directories existed on the tmpfs.



>  but it is not possible to set up that with the
> TAP tests. I could always disable --enable-tap-tests on this machine
> but I don't think that this is a correct answer. Something that could
> be done is to use an environment variable to set up a base directory
> for all the nodes created in PostgresNode.pm, and use that for
> temporary statistics with a custom folder name. But that's not
> scalable if we want to enforce more parameter.
>
>

What more parameters do you want?

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failed recovery with new faster 2PC code

On Tue, Apr 18, 2017 at 7:54 PM, Simon Riggs  wrote:
> Yeh, this is better. Pushed.

I have been outraced on this one, the error is obvious once you see it ;)

Thanks for the investigation and the fix! I have spent a couple of
hours reviewing the interactions between the shmem entries of 2PC
state data created at the beginning of recovery and all the lookups in
procarray.c and varsup.c, noticing nothing by the way.

> The bug was that the loop set gxact to be the last entry in the array,
> causing the exit condition to fail and us then to remove the last
> gxact from memory even when it didn't match the xid, removing a valid
> entry too early. That then allowed xmin to move forwards, which causes
> autovac to remove pg_xact entries earlier than needed.
>
> Well done for finding that one, thanks for the patch.

Running Jeff's test suite, I can confirm that there are no problems now.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Logical replication and synchronous replication

2017-04-18 Thread Craig Ringer

On 18 April 2017 at 18:55, Masahiko Sawada  wrote:
> Hi,
>
> As doc of logical decoding said as a note[1], logical replication can
> support the synchronous replication with some restriction. But In
> addition to this, IIUC in logical replication decoded data is sent to
> subscribers only when the commit WAL record is decoded (calls
> ReorderBufferCommit) .

Correct.

> It means that the local SQL execution and
> applying the decoded data on subscriber side are always executed in a
> sequential order, and the response time can simply be doubled or even
> more (OTOH a good point is that decoded data of aborted transaction is
> never sent to subscriber). I think there will be a lot of peoples who
> want to use logical synchronous replication but this is a big
> restriction for such user. I think we should document it or deal with
> it.
> Thought?

Definitely should be documented. I think it's covered under logical
decoding, but needs at least an xref.
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SCRAM authentication, take three

2017-04-18 Thread Magnus Hagander

On Tue, Apr 18, 2017 at 1:52 PM, Heikki Linnakangas  wrote:

> On 04/14/2017 10:33 PM, Peter Eisentraut wrote:
>
>> On 4/11/17 01:10, Heikki Linnakangas wrote:
>>
>>> That question won't arise in practice. Firstly, if the server can do
>>> scram-sha-256-plus, it presumably can also do scram-sha-512-plus. Unless
>>> there's a change in the way the channel binding works, such that the
>>> scram-sha-512-plus variant needs a newer version of OpenSSL or
>>> something. Secondly, the user's pg_authid row will contain a
>>> SCRAM-SHA-256 or SCRAM-SHA-512 verifier, not both, so that will dictate
>>> which one to use.
>>>
>>
>> Right.  So putting the actual password method in pg_hba.conf isn't going
>> to be useful very often.
>>
>> I think the most practical thing that the user wants in pg_hba.conf is
>> "best password method supported by what is in pg_authid".  This is
>> currently spelled "md5", which is of course pretty weird.  And it will
>> become weirder over time.
>>
>> I think we want to have a new keyword in pg_hba.conf for that, one which
>> does not indicate any particular algorithm or method (so not "scram" or
>> "sasl").
>>
>> We could use "password".  If we think that "md5" can mean md5-or-beyond,
>> then maybe "password" can mean password-or-md5-or-beyond.
>>
>> Or otherwise a completely new word.
>>
>> We also want to give users/admins a way to phase out old methods or set
>> some policy.  We could either make a global GUC setting
>> password_methods='md5 scram-sha-256' and/or make that an option in
>> pg_hba.conf past the method field.
>>
>
> Yeah, that would be reasonable. It can't be called just "password",
> though, because there's no way to implement "password-or-md5-or-scram" in a
> sensible way (see my reply to Simon at [1]). Unless we remove the support
> for what "password" does today altogether, and redefine "password" to mean
> just "md5-or-beyond". Which might not be a bad idea, but that's a separate
> discussion.
>

It is an interesting one though. "password" today is really only useful in
the case of db_user_namespace=on, right? Given the very few people I think
are using that feature, it wouldn't be unreasonable to rename it to
something more closely related to that.

However, that would also leave us in the position to explain "before 10,
avoid using password because it stores in clear text. after 10, we
recommend you use password". Reusing something that's existed before and
not really been secure for something that would be a good choice in the
future seems like a bad idea.

But we can also implement this functionality but under a differet name.
Like just "hashed" or something, which would mean md5-or-scram?



> In any case, I think we would probably still need more fine-grained
> control, too, so we would still need to have "scram-sha-256" as a method
> you can specify directly in pg_hba.conf. So I consider this as a separate,
> new, feature that we can add in the future, if it seems worth the effort.
>

Yes, I think wherever we go we don't want to loose the fine-grained
control. But some people will be happier for not having to use it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/

Re: [HACKERS] SCRAM authentication, take three


On 04/14/2017 10:33 PM, Peter Eisentraut wrote:

On 4/11/17 01:10, Heikki Linnakangas wrote:

That question won't arise in practice. Firstly, if the server can do
scram-sha-256-plus, it presumably can also do scram-sha-512-plus. Unless
there's a change in the way the channel binding works, such that the
scram-sha-512-plus variant needs a newer version of OpenSSL or
something. Secondly, the user's pg_authid row will contain a
SCRAM-SHA-256 or SCRAM-SHA-512 verifier, not both, so that will dictate
which one to use.


Right.  So putting the actual password method in pg_hba.conf isn't going
to be useful very often.

I think the most practical thing that the user wants in pg_hba.conf is
"best password method supported by what is in pg_authid".  This is
currently spelled "md5", which is of course pretty weird.  And it will
become weirder over time.

I think we want to have a new keyword in pg_hba.conf for that, one which
does not indicate any particular algorithm or method (so not "scram" or
"sasl").

We could use "password".  If we think that "md5" can mean md5-or-beyond,
then maybe "password" can mean password-or-md5-or-beyond.

Or otherwise a completely new word.

We also want to give users/admins a way to phase out old methods or set
some policy.  We could either make a global GUC setting
password_methods='md5 scram-sha-256' and/or make that an option in
pg_hba.conf past the method field.


Yeah, that would be reasonable. It can't be called just "password", 
though, because there's no way to implement "password-or-md5-or-scram" 
in a sensible way (see my reply to Simon at [1]). Unless we remove the 
support for what "password" does today altogether, and redefine 
"password" to mean just "md5-or-beyond". Which might not be a bad idea, 
but that's a separate discussion.


In any case, I think we would probably still need more fine-grained 
control, too, so we would still need to have "scram-sha-256" as a method 
you can specify directly in pg_hba.conf. So I consider this as a 
separate, new, feature that we can add in the future, if it seems worth 
the effort.


I've committed a simple renaming of "scram" to "scram-sha-256", as the 
pg_hba.conf and password_encryption option. I think that will do for v10.


[1] 
https://www.postgresql.org/message-id/fa6cec54-4fa9-756d-53be-a5ba3d03d...@iki.fi


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Logical replication ApplyContext bloat

2017-04-18 Thread Stas Kelvich

Hi,

currently logical replication worker uses ApplyContext to decode received data
and that context is never freed during transaction processing. Hence if 
publication
side is performing something like 10M row inserts in single transaction, then
subscription worker will occupy more than 10G of ram (and can be killed by OOM).

Attached patch resets ApplyContext after each insert/update/delete/commit.
I’ve tried to reset context only on each 100/1000/1 value of CommandCounter,
but didn’t spotted any measurable difference in speed.

Problem spotted by Mikhail Shurutov.



applycontext_bloat.patch
Description: Binary data


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-18 Thread Maksim Milyutin


On 18.04.2017 13:08, Amit Langote wrote:

Hi,



Hi, Amit!


On 2017/04/17 23:00, Maksim Milyutin wrote:


Ok, thanks for the note.

But I want to discuss the relevancy of introduction of a new relkind for
partitioned index. I could to change the control flow in partitioned index
creation (specify conditional statement in the 'index_create' routine in
attached patch) and not enter to the 'heap_create' routine. This case
releases us from integrating new relkind into different places of Postgres
code. But we have to copy-paste some specific code from 'heap_create'
function, e.g., definition of relfilenode and tablespaceid for the new
index and perhaps something more when 'heap_create' routine will be extended.


I may be missing something, but isn't it that a new relkind will be needed
anyway?  How does the rest of the code distinguish such index objects once
they are created?


Local partitioned indexes can be recognized through the check on the 
relkind of table to which the index refers. Something like this:


heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
/* indexid is local index on partitioned table */


Is it possible that some other code may try to access
the storage for an index whose indrelid is a partitioned table?



Thеsе cases must be caught. But as much as partitioned tables doesn't 
participate in query plans their indexes are unaccessible by executor. 
Reindex operation is overloaded with my patch.



--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Logical replication and synchronous replication

Hi,

As doc of logical decoding said as a note[1], logical replication can
support the synchronous replication with some restriction. But In
addition to this, IIUC in logical replication decoded data is sent to
subscribers only when the commit WAL record is decoded (calls
ReorderBufferCommit) . It means that the local SQL execution and
applying the decoded data on subscriber side are always executed in a
sequential order, and the response time can simply be doubled or even
more (OTOH a good point is that decoded data of aborted transaction is
never sent to subscriber). I think there will be a lot of peoples who
want to use logical synchronous replication but this is a big
restriction for such user. I think we should document it or deal with
it.
Thought?

[1] 
https://www.postgresql.org/docs/devel/static/logicaldecoding-synchronous.html

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-18 Thread Ashutosh Bapat

On Wed, Apr 5, 2017 at 8:39 AM, Robert Haas  wrote:
> On Tue, Apr 4, 2017 at 10:22 AM, Ashutosh Bapat
>  wrote:
>> Yes, I agree. For an inner join, the partition key types need to "shrink"
>> and for outer join they need to be "widened". I don't know if there is a way
>> to know "wider" or "shorter" of two given types. We might have to implement
>> a method to merge partition keys to produce partition key of the join, which
>> may be different from either of the partition keys. So, after-all we may
>> have to abandon the idea of canonical partition scheme. I haven't included
>> this change in the attached set of patches.
>
> I think this is why you need to regard the partitioning scheme as
> something more like an equivalence class - possibly the partitioning
> scheme should actually contain (or be?) an equivalence class.  Suppose
> this is the query:
>
> SELECT * FROM i4 INNER JOIN i8 ON i4.x = i8.x;
>
> ...where i4 (x) is an int4 partitioning key and i8 (x) is an int8
> partitioning key.  It's meaningless to ask whether the result of the
> join is partitioned by int4 or int8.  It's partitioned by the
> equivalence class that contains both i4.x and i8.x.  If the result of
> this join where joined to another table on either of those two
> columns, a second partition-wise join would be theoretically possible.
> If you insist on knowing the type of the partitioning scheme, rather
> than just the opfamily, you've boxed yourself into a corner from which
> there's no good escape.

When we merge partition bounds from two relations with different
partition key types, the merged partition bounds need to have some
information abound the way those constants look like e.g. their
length, structure etc. That's the reason we need to store partition
key types of merged partitioning scheme. Consider a three way join (i4
JOIN i8 ON i4.x = i8.x) JOIN i2 ON (i2.x = i.x). When we compare
partition bounds of i4 and i8, we use operators for int4 and int8. The
join i4 JOIN i8 will get partition bounds by merging those of i4 and
i8. When we come to join with i2, we need to know which operators to
use for comparing the partition bounds of the join with those of i2.

So, if the partition key types of the joining relations differ (but
they have matching partitioning schemes per strategy, natts and
operator family) the partition bounds of the join are converted to the
wider type among the partition key types of the joining tree.
Actually, as I am explained earlier we could choose a wider outer type
for an OUTER join and shorter type for inner join. This type is used
as partition key type of the join. In the above case join between i4
and i8 have its partition bounds converted to i8 (or i4) and then when
it is joined with i2 the partition bounds of the join are converted to
i8 (or i2).

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failed recovery with new faster 2PC code

On 18 April 2017 at 09:57, Nikhil Sontakke  wrote:

> Please find attached a second version of my bug fix which is stylistically
> better and clearer than the first one.

Yeh, this is better. Pushed.

The bug was that the loop set gxact to be the last entry in the array,
causing the exit condition to fail and us then to remove the last
gxact from memory even when it didn't match the xid, removing a valid
entry too early. That then allowed xmin to move forwards, which causes
autovac to remove pg_xact entries earlier than needed.

Well done for finding that one, thanks for the patch.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

On Tue, Apr 18, 2017 at 7:05 PM, Simon Riggs  wrote:
> I've added a recheck in ProcessTwoPhaseBuffer() after we acquire the lock.
>
> If its worth acquiring the lock its worth checking we don't have a race.

I see. No objections to that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

On Tue, Apr 18, 2017 at 4:15 PM, Andres Freund  wrote:
> Hi,
>
> On 2017-04-18 16:07:38 +0900, Michael Paquier wrote:
>> Some of you may have noticed that hamster is heavily red on the
>> buildfarm. I have done a bit of investigation, and I am able to
>> reproduce the failure manually. But actually after looking at the logs
>> the error has obviously showed up:
>> 2017-04-16 05:07:19.650 JST [18282] LOG:  database system is ready to
>> accept connections
>> 2017-04-16 05:08:36.725 JST [18296] LOG:  using stale statistics
>> instead of current ones because stats collector is not responding
>> 2017-04-16 05:10:22.207 JST [18303] t/010_pg_basebackup.pl LOG:
>> terminating walsender process due to replication timeout
>> 2017-04-16 05:10:30.180 JST [18306] LOG:  using stale statistics
>> instead of current ones because stats collector is not responding
>>
>> Stale regressions means that the system is just constrained so much
>> that things are timing out.
>>
>> In order to avoid such failures with normal regression tests, I have
>> set up extra_config so as stats_temp_directory goes to a tmpfs to
>> avoid stale statistics
>
> How high do you need to make the hardcoded limit for this to succeed
> without a tmpfs?

Increasing wal_sender_timeout helps visibly to reduce the failure
rate. With 10 attempts I can see before at least 3 failures, and
nothing after.

> If hamster fails this regularly I think we have to do
> something about it, rather than paper over it.  What's the storage
> situation currently like?

The SD card of this RPI is half-empty.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

Hi,

On 2017/04/17 23:00, Maksim Milyutin wrote:
> On 10.04.2017 14:20, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
>>  wrote:
>>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
>>> (literal 'l').
>>
>> Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
>> nothing particularly "local" about it.  I suppose what you're going
>> for is that it's not global, but in a way it *is* global to the
>> partitioning hierarchy.  That's the point.  It's just that it's
>> partitioned.
>>
> 
> Ok, thanks for the note.
> 
> But I want to discuss the relevancy of introduction of a new relkind for
> partitioned index. I could to change the control flow in partitioned index
> creation (specify conditional statement in the 'index_create' routine in
> attached patch) and not enter to the 'heap_create' routine. This case
> releases us from integrating new relkind into different places of Postgres
> code. But we have to copy-paste some specific code from 'heap_create'
> function, e.g., definition of relfilenode and tablespaceid for the new
> index and perhaps something more when 'heap_create' routine will be extended.

I may be missing something, but isn't it that a new relkind will be needed
anyway?  How does the rest of the code distinguish such index objects once
they are created?  Is it possible that some other code may try to access
the storage for an index whose indrelid is a partitioned table?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-18 Thread Fabien COELHO



Hello Jan,


It seems that the patch does not apply anymore on head due to changes in
psql non regression tests. Could you rebase?


This should work on current master (all test passed).


Patch applies, compiles and make check is ok.

There are different flavour of markdown, maybe you should document which 
one is targetted. Should it be CommonMark? Another variant? Why?


ISTM that the md format lacks escaping for special md characters:

 fabien=# SELECT E'\\n\n' AS foo;
 │ foo  │
 |--|
 │ \n

I'd say that you need to do escaping more or less similar to html?

Also, it seems that you use distinct vertical bar characters in the 
format? Or is this a trick of my terminal?? It seems that your patch 
introduces U+2502 (BOX DRAWINGS LIGHT VERTICAL) instead of the usual pipe 
in some places. Maybe you copy-pasted things from the unicode linestyle.


Why are *_newline variants added for length and formatting? Would it be 
possible to do without, say by relying on the line count computed by the 
standard function for instance?


The help line is too long, I would suggest not to add the new formats, 
the list is already truncated with "..." for other formats.


In the sgml documentation, you introduce tab characters, where only spaces 
should be used.


pg_markdown contains a spurious space between a comma and a newline.

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

On 18 April 2017 at 09:51, Simon Riggs  wrote:
> On 17 April 2017 at 16:33, Jeff Janes  wrote:
>> On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier 
>> wrote:
>>>
>>>
>>>
>>> Jeff, does this patch make the situation better? The fix is rather
>>> simple as it just makes sure that the next XID never gets updated if
>>> there are no 2PC files.
>>
>>
>> Yes, that fixes the reported case when 2PC are not being used.
>
> Thanks Jeff.
>
> I certainly prefer the simplicity of Michael's approach. I'll move to commit.

Minor change to patch.

I've added a recheck in ProcessTwoPhaseBuffer() after we acquire the lock.

If its worth acquiring the lock its worth checking we don't have a race.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


2pc-restore-fix.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

On Tue, Apr 18, 2017 at 6:40 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada  
> wrote in 
>> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao  wrote:
>> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada  
>> > wrote:
>> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada  
>> >> wrote:
>> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch  wrote:
>>  On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>> > On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch  wrote:
>> > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>> > >> Regarding this feature, there are some loose ends. We should work on
>> > >> and complete them until the release.
>> > >>
>> > >> (1)
>> > >> Which synchronous replication method, priority or quorum, should be
>> > >> chosen when neither FIRST nor ANY is specified in s_s_names? Right 
>> > >> now,
>> > >> a priority-based sync replication is chosen for keeping backward
>> > >> compatibility. However some hackers argued to change this decision
>> > >> so that a quorum commit is chosen because they think that most users
>> > >> prefer to a quorum.
>> > >>
>> > >> (2)
>> > >> There will be still many source comments and documentations that
>> > >> we need to update, for example, in high-availability.sgml. We need 
>> > >> to
>> > >> check and update them throughly.
>> > >>
>> > >> (3)
>> > >> The priority value is assigned to each standby listed in s_s_names
>> > >> even in quorum commit though those priority values are not used at 
>> > >> all.
>> > >> Users can see those priority values in pg_stat_replication.
>> > >> Isn't this confusing? If yes, it might be better to always assign 1 
>> > >> as
>> > >> the priority, for example.
>> > >
>> > > [Action required within three days.  This is a generic notification.]
>> > >
>> > > The above-described topic is currently a PostgreSQL 10 open item.  
>> > > Fujii,
>> > > since you committed the patch believed to have created it, you own 
>> > > this open
>> > > item.  If some other commit is more relevant or if this does not 
>> > > belong as a
>> > > v10 open item, please let us know.  Otherwise, please observe the 
>> > > policy on
>> > > open item ownership[1] and send a status update within three 
>> > > calendar days of
>> > > this message.  Include a date for your subsequent status update.  
>> > > Testers may
>> > > discover new open items at any time, and I want to plan to get them 
>> > > all fixed
>> > > well in advance of shipping v10.  Consequently, I will appreciate 
>> > > your efforts
>> > > toward speedy resolution.  Thanks.
>> > >
>> > > [1] 
>> > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>> >
>> > Thanks for the notice!
>> >
>> > Regarding the item (2), Sawada-san told me that he will work on it 
>> > after
>> > this CommitFest finishes. So we would receive the patch for the item 
>> > from
>> > him next week. If there will be no patch even after the end of next 
>> > week
>> > (i.e., April 14th), I will. Let's wait for Sawada-san's action at 
>> > first.
>> 
>>  Sounds reasonable; I will look for your update on 14Apr or earlier.
>> 
>> > The items (1) and (3) are not bugs. So I don't think that they need to 
>> > be
>> > resolved before the beta release. After the feature freeze, many users
>> > will try and play with many new features including quorum-based 
>> > syncrep.
>> > Then if many of them complain about (1) and (3), we can change the code
>> > at that timing. So we need more time that users can try the feature.
>> 
>>  I've moved (1) to a new section for things to revisit during beta.  If 
>>  someone
>>  feels strongly that the current behavior is Wrong and must change, 
>>  speak up as
>>  soon as you reach that conclusion.  Absent such arguments, the behavior 
>>  won't
>>  change.
>> 
>> > BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>> > as the priority if quorum-based sync rep is chosen. It's less 
>> > confusing.
>> 
>>  Since you do want (3) to change, please own it like any other open item,
>>  including the mandatory status updates.
>> >>>
>> >>> I agree to report NULL as the priority. I'll send a patch for this as 
>> >>> well.
>> >>>
>> >>> Regards,
>> >>>
>> >>
>> >> Attached two draft patches. The one makes pg_stat_replication.sync
>> >> priority report NULL if in quorum-based sync replication. To prevent
>> >>

Re: [HACKERS] Comments not allowed on partitioned table columns

On 2017/04/18 18:45, Simon Riggs wrote:
> On 18 April 2017 at 06:57, Amit Langote  wrote:
>> On 2017/04/18 14:50, Amit Langote wrote:
>>> Attached patch fixes the oversight that COMMENT ON COLUMN is not allowed
>>> on partitioned tables columns.
>>
>> Forgot to mention that I added this to the open items list.
> 
> Pushed, thanks.

Thanks.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Comments not allowed on partitioned table columns

On 18 April 2017 at 06:57, Amit Langote  wrote:
> On 2017/04/18 14:50, Amit Langote wrote:
>> Attached patch fixes the oversight that COMMENT ON COLUMN is not allowed
>> on partitioned tables columns.
>
> Forgot to mention that I added this to the open items list.

Pushed, thanks.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-18 Thread Kyotaro HORIGUCHI

At Tue, 18 Apr 2017 14:58:50 +0900, Masahiko Sawada  
wrote in 
> On Tue, Apr 18, 2017 at 3:04 AM, Fujii Masao  wrote:
> > On Wed, Apr 12, 2017 at 2:36 AM, Masahiko Sawada  
> > wrote:
> >> On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada  
> >> wrote:
> >>> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch  wrote:
>  On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
> > On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch  wrote:
> > > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> > >> Regarding this feature, there are some loose ends. We should work on
> > >> and complete them until the release.
> > >>
> > >> (1)
> > >> Which synchronous replication method, priority or quorum, should be
> > >> chosen when neither FIRST nor ANY is specified in s_s_names? Right 
> > >> now,
> > >> a priority-based sync replication is chosen for keeping backward
> > >> compatibility. However some hackers argued to change this decision
> > >> so that a quorum commit is chosen because they think that most users
> > >> prefer to a quorum.
> > >>
> > >> (2)
> > >> There will be still many source comments and documentations that
> > >> we need to update, for example, in high-availability.sgml. We need to
> > >> check and update them throughly.
> > >>
> > >> (3)
> > >> The priority value is assigned to each standby listed in s_s_names
> > >> even in quorum commit though those priority values are not used at 
> > >> all.
> > >> Users can see those priority values in pg_stat_replication.
> > >> Isn't this confusing? If yes, it might be better to always assign 1 
> > >> as
> > >> the priority, for example.
> > >
> > > [Action required within three days.  This is a generic notification.]
> > >
> > > The above-described topic is currently a PostgreSQL 10 open item.  
> > > Fujii,
> > > since you committed the patch believed to have created it, you own 
> > > this open
> > > item.  If some other commit is more relevant or if this does not 
> > > belong as a
> > > v10 open item, please let us know.  Otherwise, please observe the 
> > > policy on
> > > open item ownership[1] and send a status update within three calendar 
> > > days of
> > > this message.  Include a date for your subsequent status update.  
> > > Testers may
> > > discover new open items at any time, and I want to plan to get them 
> > > all fixed
> > > well in advance of shipping v10.  Consequently, I will appreciate 
> > > your efforts
> > > toward speedy resolution.  Thanks.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >
> > Thanks for the notice!
> >
> > Regarding the item (2), Sawada-san told me that he will work on it after
> > this CommitFest finishes. So we would receive the patch for the item 
> > from
> > him next week. If there will be no patch even after the end of next week
> > (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
> 
>  Sounds reasonable; I will look for your update on 14Apr or earlier.
> 
> > The items (1) and (3) are not bugs. So I don't think that they need to 
> > be
> > resolved before the beta release. After the feature freeze, many users
> > will try and play with many new features including quorum-based syncrep.
> > Then if many of them complain about (1) and (3), we can change the code
> > at that timing. So we need more time that users can try the feature.
> 
>  I've moved (1) to a new section for things to revisit during beta.  If 
>  someone
>  feels strongly that the current behavior is Wrong and must change, speak 
>  up as
>  soon as you reach that conclusion.  Absent such arguments, the behavior 
>  won't
>  change.
> 
> > BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
> > as the priority if quorum-based sync rep is chosen. It's less confusing.
> 
>  Since you do want (3) to change, please own it like any other open item,
>  including the mandatory status updates.
> >>>
> >>> I agree to report NULL as the priority. I'll send a patch for this as 
> >>> well.
> >>>
> >>> Regards,
> >>>
> >>
> >> Attached two draft patches. The one makes pg_stat_replication.sync
> >> priority report NULL if in quorum-based sync replication. To prevent
> >> extra change I don't change so far the code of setting standby
> >> priority. The another one improves the comment and documentation. If
> >> there is more thing what we need to mention in documentation please
> >> give me feedback.
> >
> > Attached is the

Re: [HACKERS] Passing values to a dynamic background worker

On 2017/04/18 18:12, Kyotaro HORIGUCHI wrote:
> At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske wrote:
>> So after reading a recent thread on the steep learning curve for PG
>> internals [1], I figured I'd share where I've gotten stuck with this in a
>> new thread vs hijacking that one.
>>
>> One of the goals I had with pg_partman was to see if I could get the
>> partitioning python scripts redone as C functions using a dynamic
>> background worker to be able to commit in batches with a single call. My
>> thinking was to have a user-function that can accept arguments for things
>> like the interval value, batch size, and other arguments to the python
>> script, then start/stop a dynamic bgw up for each batch so it can commit
>> after each one. The dymanic bgw would essentially just have to call the
>> already existing partition_data() plpgsql function, but I have to be able
>> to pass the argument values that the user gave down into the dynamic bgw.
>>
>> I've reached a roadblock in that bgw_main_arg can only accept a single
>> argument that must be passed by value for a dynamic bgw. I already worked
>> around this for passing the database name to the my existing use of a bgw
>> with doing partition maintenance (pass a simple integer to use as an index
>> array value). But I'm not sure how to do this for passing multiple values
>> in. I'm assuming this would be the place where I'd see about storing values
>> in shared memory to be able to re-use later? I'm not even sure if that's
>> the right approach, and if it is, where to even start to understand how to
>> do that.
> 
> On the other hand, AFAICS, DSM doesn't seem well documented. I
> mangaged to find a related document in Postgres Wiki but it seems
> a bit old.
> 
> https://wiki.postgresql.org/wiki/Parallel_Internal_Sort
> 
> This is a little complex than static shared memory, and it is
> *not* guaranteed to mapped at the same address among workers. You
> will see an instance in LaunchParallelWorkers() and the related
> functions in parallel.c. The basic of its usage would be as the
> follows.
> 
> - Create a segment :
>dsm_segment *seg = dsm_create(size);
> - Send its handle via the bgw_main_arg.
>worker.bgw_main_arg = dsm_segment_handle(seg);
> - Attach the memory on the other side.
>dsm_segment *seg = dsm_attach(main_arg);
> 
> On both side, the address of the attached shared memory is
> obtained using dsm_segment_address(seg).
> 
> dsm_detach(seg) detaches the segment. All users of this segment
> detach the segment, it will be destroyed.

Perhaps, the more modern DSA mechanism could be applicable here, too.

Some recent commits demonstrate examples of DSA usage, such as BRIN
autosummarization commit (7526e10224f) and tidbitmap.c's shared iteration
support commit (98e6e89040a05).

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-18 Thread Dagfinn Ilmari Mannsåker

Tom Lane  writes:

> Andres Freund  writes:
>> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
>
>> I'm a bit doubtful about improving the performance of genbki at the cost
>> of any sort of complication - it's only executed during the actual
>> build, not during initdb...  I don't see much point in doing things like
>> 3) and 4), it's just not worth it imo.
>
> Yeah, it's only run once per build, or probably even less often than that
> considering we only re-run it when the input files change.  I could get
> interested in another 20% reduction in initdb's time, but not genbki's.

Fair enough.  I still think 1), 2) and 5) are worthwile code cleanups
regardless of the performance impact.  In fact, if we don't care that
much about the performance, we can move the duplicated code in
Gen_fmgrmtab.pl and genbki.pl that turns bki_values into a hash into
Catalogs.pm.  That regresses genbki.pl time by ~30ms on my machine.

Revised patch series attached.

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl

>From f7b75bcbe993d4ddbc92da85c7148bf7fee143ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/4] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
 		{
 
 			# Strip C-style comments.
-			s;/\*(.|\n)*\*/;;g;
+			s;/\*(?:.|\n)*\*/;;g;
 			if (m;/\*;)
 			{
 
@@ -80,12 +80,12 @@ sub Catalogs
 			{
 $catalog{natts} = $1;
 			}
-			elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $3,
+check_natts($filename, $catalog{natts}, $2,
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
-- 
2.7.4

>From decd1b8c171cc508da5f79f01d2f0779a569a963 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/4] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++---
 src/backend/catalog/genbki.pl|  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..81513c7 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
 			}
 			elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
-check_natts($filename, $catalog{natts}, $2,
+my @bki_values = SplitDataLine($2);
+
+check_natts($filename, $catalog{natts}, scalar(@bki_values),
 			$input_file, $input_line_number);
 
-push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
 			{
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-	my ($catname, $natts, $bki_val, $file, $line) = @_;
+	my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
 	die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
 		unless defined $natts;
 
-	my $nfields = scalar(SplitDataLine($bki_val));
-
 	die sprintf
 		"Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-		$file, $line, $natts, $nfields
-	  unless $natts == $nfields;
+		$file, $line, $natts, $bki_vals
+	  unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
 		foreach my $row (@{ $catalog->{data} })
 		{
 
-			# Split line into tokens without interpreting their meaning.
 			my %bki_values;
-			@bki_values{@attnames}

Re: [HACKERS] Passing values to a dynamic background worker

2017-04-18 Thread Kyotaro HORIGUCHI

Hello,

At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske  wrote in 

> So after reading a recent thread on the steep learning curve for PG
> internals [1], I figured I'd share where I've gotten stuck with this in a
> new thread vs hijacking that one.
> 
> One of the goals I had with pg_partman was to see if I could get the
> partitioning python scripts redone as C functions using a dynamic
> background worker to be able to commit in batches with a single call. My
> thinking was to have a user-function that can accept arguments for things
> like the interval value, batch size, and other arguments to the python
> script, then start/stop a dynamic bgw up for each batch so it can commit
> after each one. The dymanic bgw would essentially just have to call the
> already existing partition_data() plpgsql function, but I have to be able
> to pass the argument values that the user gave down into the dynamic bgw.
> 
> I've reached a roadblock in that bgw_main_arg can only accept a single
> argument that must be passed by value for a dynamic bgw. I already worked
> around this for passing the database name to the my existing use of a bgw
> with doing partition maintenance (pass a simple integer to use as an index
> array value). But I'm not sure how to do this for passing multiple values
> in. I'm assuming this would be the place where I'd see about storing values
> in shared memory to be able to re-use later? I'm not even sure if that's
> the right approach, and if it is, where to even start to understand how to
> do that.

I think you are on the way, shared memory is that. There are two
ways to acquire shared memory areas for such purpose. One is
static shared memory that stays living aside shared_buffers, and
the another is dynamic shared memory (DSM). If you need fixed
size of memory segment, the former will work. If you need that of
indefinite amount, DSM will work.

You will see how to use (static) shared memory in the following
section in the documentation. Or pg_stat_statements.c will be a
good reference. This kind of shared memory is guaranteed to be
mapped at the same address so we can use pointers on there.

https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp83376336


On the other hand, AFAICS, DSM doesn't seem well documented. I
mangaged to find a related document in Postgres Wiki but it seems
a bit old.

https://wiki.postgresql.org/wiki/Parallel_Internal_Sort

This is a little complex than static shared memory, and it is
*not* guaranteed to mapped at the same address among workers. You
will see an instance in LaunchParallelWorkers() and the related
functions in parallel.c. The basic of its usage would be as the
follows.

- Create a segment :
   dsm_segment *seg = dsm_create(size);
- Send its handle via the bgw_main_arg.
   worker.bgw_main_arg = dsm_segment_handle(seg);
- Attach the memory on the other side.
   dsm_segment *seg = dsm_attach(main_arg);

On both side, the address of the attached shared memory is
obtained using dsm_segment_address(seg).

dsm_detach(seg) detaches the segment. All users of this segment
detach the segment, it will be destroyed.

You might need some locking or notification mechanism. Usually
the mechanisms named LWLock and Latch are used for the purpose.


> Let alone in the context of how that would interact with the
> background worker system. If you look at my existing C code, you can see
> it's very simple and doesn't do much more than the worker_spi example. I've
> yet to have to interact with any memory contexts or such things, and as the
> referenced thread below mentions, doing so is quite a steep learning curve.
> 
> Any guidance for a newer internals dev here would be great.
> 
> 1.
> https://www.postgresql.org/message-id/CAH%3Dt1kqwCBF7J1bP0RjgsTcp-SaJaHrF4Yhb1iiQZMe3W-FX2w%40mail.gmail.com
> 
> --
> Keith Fiske
> Database Administrator
> OmniTI Computer Consulting, Inc.
> http://www.keithf4.com

Good luck!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failed recovery with new faster 2PC code

2017-04-18 Thread Nikhil Sontakke

Please find attached a second version of my bug fix which is stylistically
better and clearer than the first one.

Regards,
Nikhils

On 18 April 2017 at 13:47, Nikhil Sontakke  wrote:

> Hi,
>
> There was a bug in the redo 2PC remove code path. Because of which,
> autovac would think that the 2PC is gone and cause removal of the
> corresponding clog entry earlier than needed.
>
> Please find attached, the bug fix: 2pc_redo_remove_bug.patch.
>
> I have been testing this on top of Michael's 2pc-restore-fix.patch and
> things seem to be ok for the past one+ hour. Will keep it running for long.
>
> Jeff, thanks for these very useful scripts. I am going to make a habit to
> run these scripts on my side from now on. Do you have any other script that
> I could try against these patches? Please let me know.
>
> Regards,
> Nikhils
>
> On 18 April 2017 at 12:09, Nikhil Sontakke 
> wrote:
>
>>
>>
>> On 17 April 2017 at 15:02, Nikhil Sontakke 
>> wrote:
>>
>>>
>>>
 >> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
 >> Author: Simon Riggs 
 >> Date:   Tue Apr 4 15:56:56 2017 -0400
 >>
 >>Speedup 2PC recovery by skipping two phase state files in normal
 path
 >
 > Thanks Jeff for your tests.
 >
 > So that's now two crash bugs in as many days and lack of clarity about
 > how to fix it.
 >

>>>
>>> The issue seems to be that a prepared transaction is yet to be
>> committed. But autovacuum comes in and causes the clog to be truncated
>> beyond this prepared transaction ID in one of the runs.
>>
>> We only add the corresponding pgproc entry for a surviving 2PC
>> transaction on completion of recovery. So could be a race condition here.
>> Digging in further.
>>
>> Regards,
>> Nikhils
>> --
>>  Nikhil Sontakke   http://www.2ndQuadrant.com/
>>  PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>>
>
>
>
> --
>  Nikhil Sontakke   http://www.2ndQuadrant.com/
>  PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_redo_remove_bug_v2.0.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CREATE TRIGGER document typo


On 04/17/2017 12:09 PM, Yugo Nagata wrote:

Hi,

Attached is a patch fixing simple typos in the CREATE TRIGGER document.


Applied, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-04-18 Thread Kang Yuzhe

Hello Amit,
Thanks gain for being patient with me.
YES, I am working with the PostgreSQL source git repository but I don't
think I updated my local forked/cloned branch. I am also working on
standalone PG 9.6.2 source code as well.

I will try to fetch/pull the PG master content to my forked/cloned branch
and apply those current patches.

I will also try to reply to the email threads where I downloaded the
patches so that they can update their patches accordingly.

Thanks,
Zeray



On Tue, Apr 18, 2017 at 11:25 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Hi,
>
> On 2017/04/18 16:54, Kang Yuzhe wrote:
> > Thanks Amit for taking your time and pointing to some useful stuff on the
> > Internals of PostgreSQL.
> >
> >
> > One thing I have learned is that PG community is not as hostile/harsh as
> I
> > imagined to newbies. Rather, its the reverse.
> > I am feeling at home here.
> >
> > Amit, would you please help out on  how to apply some patches in PG
> source
> > code. For example, there are two patches attached here: one on
> > CORRESPONDING CLAUSE and one on MERGE SQL Standard.
> >
> > There are some errors saying Hunk failed(src/backend/parser/gram.y.rej).
> >
> > postgresql-9.6.2$ patch --dry-run -p1 < corresponding_clause_v12.patch
> > patching file doc/src/sgml/queries.sgml
> > Hunk #1 succeeded at 1603 (offset 2 lines).
> > Hunk #2 succeeded at 1622 (offset 2 lines).
> > Hunk #3 succeeded at 1664 (offset 2 lines).
>
> [ ... ]
>
> > /postgresql-9.6.2$
>
> Firstly, it looks like you're trying to apply the patch to the 9.6 source
> tree (are you working with the PostgreSQL source git repository?).  But,
> since all the new feature patches are created against the master
> development branch of the git repository, the patch most likely won't
> apply cleanly against a source tree from the older branch.
>
> If you're not using the git repository currently, you may have better luck
> trying the development branch snapshot tarballs (see the link below):
>
> https://www.postgresql.org/ftp/snapshot/dev/
>
> Also, it's a good idea to reply on the email thread from where you
> downloaded the patch to ask them to update the patch, so that they can
> send a fresh patch that applies cleanly.
>
> The MERGE patch looks very old (from 2010 probably), so properly applying
> it to the source tree of today is going to be hard.  Actually, it most
> likely won't be in a working condition anymore.  You can try recently
> proposed patches, for example, those in the next commitfest:
>
> https://commitfest.postgresql.org/14/
>
> Patches listed on the above page are more likely to apply cleanly and be
> in working condition.  But of course, you will need to be interested in
> the topics those patches are related to.  There are some new SQL feature
> patches, for example:
>
> https://commitfest.postgresql.org/14/839/
>
> Thanks,
> Amit
>
>

Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

On 17 April 2017 at 16:33, Jeff Janes  wrote:
> On Sun, Apr 16, 2017 at 6:59 PM, Michael Paquier 
> wrote:
>>
>>
>>
>> Jeff, does this patch make the situation better? The fix is rather
>> simple as it just makes sure that the next XID never gets updated if
>> there are no 2PC files.
>
>
> Yes, that fixes the reported case when 2PC are not being used.

Thanks Jeff.

I certainly prefer the simplicity of Michael's approach. I'll move to commit.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-04-18 Thread Kang Yuzhe

Thanks Craig for teaching me a lot of things. I am just learning a lot why
PG hacking/development is the way it is.

Regarding interest and enthusiasm, no problem. Whats is lacking is the
skill-sets and I believe having interest and enthusiasm and with your
support, we will expand PG hacking/devs/usage in Africa and other
continents.

People here in Africa using Oracle/SQL Server/IBM products(generally
commercial products) even for which PG is more than enough.

I want to change this scenario and trend and I hope one day in the future
there will be PG conference in Africa/Ethiopia which is my country.

Thanks,
zeray




On Tue, Apr 18, 2017 at 10:54 AM, Craig Ringer 
wrote:

> On 18 April 2017 at 15:41, Kang Yuzhe  wrote:
> > Thanks Simon for taking your time and trying to tell and warn me the
> harsh
> > reality truth:there is no shortcut to expertise. One has to fail and rise
> > towards any journey to expertise.
>
> Yeah, just because Pg is hard doesn't mean it's notably bad or worse
> than other things. I generally find working on code in other projects,
> even smaller and simpler ones, a rather unpleasant change.
>
> That doesn't mean we can't do things to help interested new people get
> and stay engaged and grow into productive devs to grow our pool.
>
> > Overall, you are right. But I do believe that there is a way(some
> > techniques) to speed up any journey to expertise. One of them is
> mentorship.
> > For example(just an example), If you show me how to design and implement
> FDW
> > to Hadoop/HBase., I believe that I will manage to design and implement
> FDW
> > to Cassandra/MengoDB.
>
> TBH, that's the sort of thing where looking at existing examples is
> often the best way forward and will stay that way.
>
> What I'd like to do is make it easier to understand those examples by
> providing background and overview info on subsystems, so you can read
> the code and have more idea what it does and why.
>
> > But almost nothing about The Internals of PostgreSQL:
>
> Not surprising. They'd go out of date fast, be a huge effort to write
> and maintain, and sell poorly given the small audience.
>
> Print books probably aren't the way forward here.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: [HACKERS] scram and \password


On 04/18/2017 11:15 AM, Simon Riggs wrote:

As a potential open item, if we treat "md5" as ">= md5"
should we not also treat "password" as ">=password"?

It seems strange that we still support "password" and yet tell
everyonenot to use it.

I'd like PG10 to be the version where I don't have to tell people not
to use certain things, hash indexes, "password" etc.


Between md5 and scram, the choice is easy, because a user can only have 
an MD5 hashed or SCRAM "hashed" password in pg_authid. So you present 
the client an MD5 challenge or a SCRAM challenge, depending on what the 
user has in pg_authid, or you error out without even trying. But 
"password" authentication can be used with any kind of a verifier in 
pg_authid. "password" authentication can be useful, for example, if a 
user has a SCRAM verifier in pg_authid, but the client doesn't support 
SCRAM.


You could argue that you shouldn't use it even in that situation, you 
should upgrade the client, or use SSL certs or an ssh tunnel or 
something else instead. But that's a very different argument than the 
one for treating "md5" as ">= md5".


Also note that LDAP and RADIUS authentication look identical to 
"password" authentication, on the wire. The only difference is that 
instead of checking the password against pg_authid, the server checks it 
against an LDAP or RADIUS server.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...