Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-27 Thread Jeevan Chalke
On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska  wrote:

> Robert Haas  wrote:
>
> > On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska  wrote:
> > > Attached is a diff that contains both patches merged. This is just to
> > > prove my assumption, details to be elaborated later. The scripts
> > > attached produce the following plan in my environment:
> > >
> > >QUERY PLAN
> > > ------------------------------------------------
> > >  Parallel Finalize HashAggregate
> > >Group Key: b_1.j
> > >->  Append
> > >  ->  Parallel Partial HashAggregate
> > >Group Key: b_1.j
> > >->  Hash Join
> > >  Hash Cond: (b_1.j = c_1.k)
> > >  ->  Seq Scan on b_1
> > >  ->  Hash
> > >->  Seq Scan on c_1
> > >  ->  Parallel Partial HashAggregate
> > >Group Key: b_2.j
> > >->  Hash Join
> > >  Hash Cond: (b_2.j = c_2.k)
> > >  ->  Seq Scan on b_2
> > >  ->  Hash
> > >->  Seq Scan on c_2
> >
> > Well, I'm confused.  I see that there's a relationship between what
> > Antonin is trying to do and what Jeevan is trying to do, but I can't
> > figure out whether one is a subset of the other, whether they're both
> > orthogonal, or something else.  This plan looks similar to what I
> > would expect Jeevan's patch to produce,
>
> The point is that the patch Jeevan wanted to work on is actually a subset
> of [1] combined with [2].
>

It seems so: you are targeting every relation whether or not it is
partitioned, whereas I am targeting only partitioned relations in my
patch.


>
> > except I have no idea what "Parallel" would mean in a plan that
> > contains no Gather node.
>
> parallel_aware field was set mistakenly on the AggPath. Fixed patch is
> attached below, producing this plan:
>
>QUERY PLAN
> ------------------------------------------------
>  Finalize HashAggregate
>Group Key: b_1.j
>->  Append
>  ->  Partial HashAggregate
>Group Key: b_1.j
>->  Hash Join
>  Hash Cond: (b_1.j = c_1.k)
>  ->  Seq Scan on b_1
>  ->  Hash
>->  Seq Scan on c_1
>  ->  Partial HashAggregate
>Group Key: b_2.j
>->  Hash Join
>  Hash Cond: (b_2.j = c_2.k)
>  ->  Seq Scan on b_2
>  ->  Hash
>->  Seq Scan on c_2
>

With my patch, I am getting the following plan, where the entire
aggregation is pushed below the Append.

QUERY PLAN
------------------------------------------------
 Append
   ->  HashAggregate
 Group Key: b_1.j
 ->  Hash Join
   Hash Cond: (b_1.j = c_1.k)
   ->  Seq Scan on b_1
   ->  Hash
 ->  Seq Scan on c_1
   ->  HashAggregate
 Group Key: b_2.j
 ->  Hash Join
   Hash Cond: (b_2.j = c_2.k)
   ->  Seq Scan on b_2
   ->  Hash
 ->  Seq Scan on c_2
(15 rows)
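Why the whole aggregate can sit below Append here, while Antonin's plan needs a Partial/Finalize split, can be illustrated with a small sketch (hypothetical data and a Python model, not the planner code):

```python
from itertools import groupby

def agg_by(rows, key):
    """SUM(v) GROUP BY key(row) over (j, v) rows."""
    rows = sorted(rows, key=key)
    return {k: sum(v for _, v in grp) for k, grp in groupby(rows, key=key)}

# Hypothetical partitions of b (partitioned on column j), post-join rows.
part1 = [(1, 10), (1, 20), (2, 5)]   # (j, v) rows with j in partition 1
part2 = [(3, 7), (3, 3)]             # (j, v) rows with j in partition 2
by_j = lambda r: r[0]

# GROUP BY j (the partition key): every group lives in exactly one
# partition, so per-partition results are already final and disjoint,
# and the whole HashAggregate can be pushed below the Append.
pushed_down = {**agg_by(part1, by_j), **agg_by(part2, by_j)}
assert pushed_down == agg_by(part1 + part2, by_j)

# GROUP BY a non-partition column: a group can span partitions, so
# per-partition results are only partial and a Finalize step must
# combine them above the Append (the Finalize/Partial plan shape).
by_parity = lambda r: r[1] % 2
finalized = {}
for partial in (agg_by(part1, by_parity), agg_by(part2, by_parity)):
    for k, s in partial.items():
        finalized[k] = finalized.get(k, 0) + s
assert finalized == agg_by(part1 + part2, by_parity)
```

The sketch uses SUM, where the finalize step is simply summing the partial sums; other aggregates need their own combine functions, which is exactly what the Partial/Finalize machinery provides.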


Antonin, I have tried applying your patch on master but it doesn't apply
cleanly. Can you please tell me the HEAD commit and any other patches
that need to be applied first?

What does the plan look like when the GROUP BY key does not match the
partitioning key, i.e. GROUP BY b.v?


>
> [1] https://www.postgresql.org/message-id/9666.1491295317%40localhost
>
> [2] https://commitfest.postgresql.org/14/994/
>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de, http://www.cybertec.at
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-27 Thread Ashutosh Bapat
On Fri, Apr 28, 2017 at 1:32 AM, Robert Haas  wrote:
> On Thu, Apr 27, 2017 at 3:41 AM, Ashutosh Bapat
>  wrote:
>> The third goal requires that the partition bounds be compared based on
>> the partition keys present in the equi-join. While matching the
>> partitions to be joined, the partition bounds corresponding to the base
>> relation whose partition keys appear in the equi-join are used for
>> comparison using the support function corresponding to the data types of
>> partition keys. This requires us to associate the partitions of a join
>> with the bounds of base relation. E.g. A(A1, A2) with bounds (X1, X3)
>> (notice missing X2), B (B1, B2) bounds (X1, X2), C (C1, C2, C3) bounds
>> (X1, X2, X3) and the join is A LJ B on A.a = B.b LJ C on B.b = C.c
>> assuming strict operators this can be executed as (AB)C or A(BC). AB
>> will have partitions A1B1, A2B3 since there is no matching bound of A
>> for B2 and A is outer relation. A1B1 is associated with bound X1 of A
>> and C both. A2B3 is associated with bound of X3, which happens to be
>> 2nd bound of A but third of B. When we join (AB) with C, we should
>> notice that C1 goes with A1B1, C2 doesn't have any matching partition
>> in AB and C3 goes with A2B3. If we compare bounds of B with C without
>> any transformation we will know C2 matches B2, but we need to look at
>> the children of AB to realize that B2 isn't present in any of the
>> children and thus C2 should not be joined with any partition of AB.
>
> Sure.
>
>> That usually looks like a quadratic-order operation on the number of
>> partitions.
>
> Now that I don't buy.  Certainly, for range partitions, given a list
> of ranges of length M and another of length N, this can be done in
> O(M+N) time by merge-joining the lists of bounds.  You pointed out
> upthread that for list partitions, things are a bit complicated
> because a single list partition can contain multiple values which are
> not necessarily contiguous, but I think that this can still be done in
> O(M+N) time.  Sort all of the bounds, associating each one to a
> partition, and do a merge pass; whenever two bounds match, match the
> two corresponding partitions, but if one of those partitions is
> already matched to some other partition, then fail.
>
> For example, consider A1 FOR VALUES IN (1,3,5), A2 FOR VALUES IN
> (2,4,6), B1 FOR VALUES IN (1,6), B2 FOR VALUES IN (2,4).  The sorted
> bounds for A are 1,2,3,4,5,6; for B, 1,2,4,6.  The first value in both
> lists is a 1, so the corresponding partitions A1 and B1 are matched.
> The second value in both lists is a 2, so the corresponding partitions
> A2 and B2 are matched.  Then we hit a 3 on the A side that has no
> match on the B side, but that's fine; we don't need to do anything.
> If the partition on the A side never got a mapping at any point during
> this merge pass, we'd eventually need to match it to a dummy partition
> (unless this is an inner join) but it's already mapped to B1 so no
> problem.  Then we hit a 4 which says that A2 must match B2, which is
> consistent with what we already determined; no problem.  Then we hit
> another value that only exists on the A side, which is fine just as
> before.  Finally we hit a 6 on each side, which means that A2 must
> match B1, which is inconsistent with the existing mappings so we give
> up; no partitionwise join is possible here.

For a two-way join this works and is fairly straightforward. I am
assuming that A and B are base relations and not joins. But making it
work for an N-way join is the challenge. I don't see your example
describing that. But I think, given your revised position below, we
don't need to get this right at this point. Remember, that the
paragraph was about 3rd goal, which according to your revised position
is now deferred.
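For the two-way case, the merge pass described above can be modeled with a short sketch (a hypothetical Python model of single-column list partitions; it ignores the dummy-partition handling for outer joins and is not the PostgreSQL implementation):

```python
def match_partitions(parts_a, parts_b):
    """Try to build a 1:1 mapping between the list partitions of A and B.

    parts_a / parts_b map partition name -> list of bound values.
    Returns the A->B mapping, or None if no partitionwise join is
    possible.  The merge pass itself is O(M+N) once bounds are sorted.
    """
    bounds_a = sorted((v, p) for p, vals in parts_a.items() for v in vals)
    bounds_b = sorted((v, p) for p, vals in parts_b.items() for v in vals)

    mapping = {}   # partition of A -> partition of B
    reverse = {}   # partition of B -> partition of A
    i = j = 0
    while i < len(bounds_a) and j < len(bounds_b):
        (va, pa), (vb, pb) = bounds_a[i], bounds_b[j]
        if va < vb:
            i += 1           # value only on the A side: nothing to do
        elif vb < va:
            j += 1           # value only on the B side: nothing to do
        else:
            # Equal bounds must pair the same two partitions every time.
            if mapping.setdefault(pa, pb) != pb or reverse.setdefault(pb, pa) != pa:
                return None  # conflicting mapping: give up
            i += 1
            j += 1
    return mapping

# Robert's example: the shared value 6 forces A2 <-> B1, which clashes
# with the A2 <-> B2 pairing established by the values 2 and 4.
a = {"A1": [1, 3, 5], "A2": [2, 4, 6]}
b = {"B1": [1, 6], "B2": [2, 4]}
print(match_partitions(a, b))  # None: no partitionwise join possible
```

If B1's 6 were a 3 instead, the pass would succeed with the mapping A1 to B1 and A2 to B2, since every matched value then reinforces the same pairing.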

>
> Having said that I think we could make this work, I'm starting to
> agree with you that it will add more complexity than it's worth.
> Needing to keep track of the type of every partition bound
> individually seems like a real nuisance, and it's not likely to win
> very often because, realistically, people should and generally will
> use the same type for the partitioning column in all of the relevant
> tables.  So I'm going to revise my position and say it's fine to just
> give up on partitionwise join unless the types match exactly, but I
> still think we should try to cover the cases where the bounds don't
> match exactly but only 1:1 or 1:0 or 0:1 mappings are needed (iow,
> optimizations 1 and 2 from your list of 4).  I agree that ganging
> partitions (optimization 4 from your list) is not something to tackle
> right now.

Good. I will have a more enjoyable vacation now.

Do you still want the partition key type to be out of the partition
scheme? Keeping it there means we match it only once and save it in only
a single place. Otherwise, it will have to be stored in the RelOptInfo of
the partitioned table and matched for every pair of joining relations.

Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders

2017-04-27 Thread Thomas Munro
On Mon, Apr 24, 2017 at 2:53 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Every machine sees the LSN moving backwards, but the code path that
>> had the assertion only reached if it decides to interpolate, which is
>> timing dependent: there needs to be a future sample in the lag
>> tracking buffer, which I guess is not the case in those runs.
>
> I'm dissatisfied with this explanation because if it's just timing,
> it doesn't seem very likely that some machines would reproduce the
> failure every single time while others never would.  Maybe that can be
> blamed on kernel scheduler vagaries + different numbers of cores, but
> I can't escape the feeling that there's something here we've not
> fully understood.

Ahh.  It is timing-dependent, but what I said before wasn't right.
The write LSN always jumps back to the start of the current segment,
but some machines don't report that upstream because of timing
effects.

I found a machine that doesn't reproduce the earlier assertion failure
(or, rather, an elog I put into that code path where the assertion
was).  It's a VirtualBox VM running Debian 8 on amd64, and it doesn't
matter whether I tell it to use 1 or 4 CPU cores.

After the timeline change, standby2 restarts WAL streaming from
0/300, and standby1 sends it two messages: one 128KB message, and
then another smaller one.  On this virtual machine, the inner message
processing loop in walreceiver.c reliably manages to read both
messages from the socket one after the other, taking it all the way to
0/3028470 (end of WAL).  Then it calls XLogWalRcvSendReply, so the
position reported to the upstream server never goes backwards.

On my other machines it reliably reads the first message, runs out of
data on the socket so it falls out of the loop, and calls
XLogWalRcvSendReply to report the intermediate location 0/302.
Then the socket's soon ready again, we go around the outer loop again
and finally report 0/3028470.

> [...]
>
> So you're right that the standby's reported "write" position can go
> backward, but it seems pretty darn odd that the flush and apply
> positions didn't go backward too.  Is there a bug there?

Flush doesn't go backwards because XLogWalRcvFlush does nothing unless
Flush < Write.  I am not sure whether it constitutes a bug that we
rewrite the start of the current segment without flushing it.

A timeline change isn't actually necessary for the
rewind-to-start-of-segment behaviour, by the way: we always do that at
the start of streaming, as described in RequestXLogStreaming.  But
it's only a problem for timeline changes, because in other cases
it's talking to a fresh walsender that doesn't have any state that
would have triggered the (now removed) assertion.  I think the change
that was committed in 84638808 was therefore correct.

> I remain of the opinion that if we can't tell from the transmitted
> data whether a timeline switch has caused the position to go backward,
> then that's a protocol shortcoming that ought to be fixed.

Another approach might have been to teach the switch case for
T_StartReplicationCmd in exec_replication_command to clear all
LagTracker state, since that is the point at which the remote
walreceiver has requested a new stream and all subsequent messages
will refer to that stream.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-27 Thread Noah Misch
On Wed, Apr 26, 2017 at 11:17:05AM +1200, Thomas Munro wrote:
> My colleague Prabhat Sahu reported off list that transition tables
> don't work for views.  I probably should have thought about that when
> I fixed something similar for partitioned tables, and after some
> experimentation I see that this is also broken for foreign tables.
> 
> For foreign tables using postgres_fdw, I see that transition tables
> capture new rows for INSERT but capture nothing for DELETE and UPDATE.
> 
> For views, aside from the question of transition tables, I noticed
> that statement triggers don't seem to fire at all with updatable
> views.  Surely they should -- isn't that a separate bug?
> 
> Example:
> 
>   create table my_table (i int);
>   create view my_view as select * from my_table;
>   create function my_trigger_function() returns trigger language plpgsql as
> $$ begin raise warning 'hello world'; return null; end; $$;
>   create trigger my_trigger after insert or update or delete on my_view
> for each statement execute procedure my_trigger_function();
>   insert into my_view values (42);
> 
> ... and the world remains ungreeted.
> 
> As for transition tables, there are probably meaningful ways to
> support those for both views and foreign tables at least in certain
> cases, as future feature enhancements.  For now, do you agree that we
> should reject such triggers as unsupported?  See attached.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] some review comments on logical rep code

2017-04-27 Thread Noah Misch
On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch  wrote:
> > On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
> >> Pushed. Thanks!
> >
> > Does this close the open item, or is there more to do?
> 
> There is only one item remaining, and the patch is attached on
> here[1]. I guess reviewing that patch is almost done.
> 
> [1] 
> https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com

Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
update.  Kindly send a status update within 24 hours, and include a date for
your subsequent status update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] some review comments on logical rep code

2017-04-27 Thread Masahiko Sawada
On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch  wrote:
> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>> Pushed. Thanks!
>
> Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] 
https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] scram and \password

2017-04-27 Thread Noah Misch
On Fri, Apr 21, 2017 at 11:04:14PM +0300, Heikki Linnakangas wrote:
> I'll continue reviewing the rest of the patch on Monday, but [...]

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Moreover, this open item has been in progress for 17 days, materially
longer than the 1-2 week RMT non-intervention period.  Refer to the policy on
open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-27 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Sun, Apr 23, 2017 at 11:58:23PM +, Stephen Frost wrote:
> > The status is simply that I've been considering Robert's comments regarding
> > the documentation and have had a busy weekend. I'll provide an update
> > tomorrow.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:

I committed the fix for the immediate issue, the ALTER TABLE ONLY
commands failing when pg_dump's output was run against a new system, on
Tuesday, so that part is "fixed", but I continue to work through the
lingering related issues with Amit and have pushed related patches both
yesterday and today.

I'm anticipating a new patch (or perhaps two) from Amit tomorrow (well,
later today at this point...) that I hope to push to clean up other
related bugs in pg_dump's handling of partitioned tables.

Frankly, I fully expect to find additional issues beyond those as we
move forward, but assuming I can finish up the last couple patches from
Amit tomorrow then I'll close out this open issue and open new ones as
new issues are discovered.  I'll commit to having this issue closed out,
barring additional issues, by Sunday (2017-04-30).

Thanks!

Stephen




Re: [HACKERS] snapbuild woes

2017-04-27 Thread Andres Freund


On April 27, 2017 9:34:44 PM PDT, Noah Misch  wrote:
> On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
>> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
>>> I've since the previous update reviewed Petr's patch, which he since has
>>> updated over the weekend.  I'll do another round tomorrow, and will see
>>> how it looks.  I think we might need some more tests for this to be
>>> committable, so it might not become committable tomorrow.  I hope we'll
>>> have something in tree by end of this week, if not I'll send an update.
>>
>> I was less productive this week than I'd hoped, and creating a testsuite
>> was more work than I'd anticipated, so I'm slightly lagging behind.  I
>> hope to have a patchset tomorrow, aiming to commit something
>> Monday/Tuesday.
>
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I committed part of the series today, and plan to continue doing so over
the next few days.  Changes require careful review & testing; this is
easy to get wrong...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] some review comments on logical rep code

2017-04-27 Thread Noah Misch
On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
> Pushed. Thanks!

Does this close the open item, or is there more to do?




Re: [HACKERS] snapbuild woes

2017-04-27 Thread Noah Misch
On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> > I've since the previous update reviewed Petr's patch, which he since has
> > updated over the weekend.  I'll do another round tomorrow, and will see
> > how it looks.  I think we might need some more tests for this to be
> > committable, so it might not become committable tomorrow.  I hope we'll
> > have something in tree by end of this week, if not I'll send an update.
> 
> I was less productive this week than I'd hoped, and creating a testsuite
> was more work than I'd anticipated, so I'm slightly lagging behind.  I
> hope to have a patchset tomorrow, aiming to commit something
> Monday/Tuesday.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-27 Thread Noah Misch
On Sun, Apr 23, 2017 at 11:58:23PM +, Stephen Frost wrote:
> Noah, all,
> 
> On Sun, Apr 23, 2017 at 19:52 Noah Misch  wrote:
> 
> > On Sat, Apr 22, 2017 at 01:14:08PM -0700, Noah Misch wrote:
> > > On Thu, Apr 20, 2017 at 09:53:28PM -0400, Stephen Frost wrote:
> > > > * Noah Misch (n...@leadboat.com) wrote:
> > > > > On Mon, Apr 17, 2017 at 03:41:25PM -0400, Stephen Frost wrote:
> > > > > > I've put up a new patch for review on the thread and plan to commit
> > > > > > that tomorrow, assuming there isn't anything further.  That should
> > > > > > resolve the immediate issue, but I do think there's some merit to
> > > > > > Amit's follow-on patches and will continue to discuss those and may
> > > > > > commit one or both of those later this week.
> > > > >
> > > > > This PostgreSQL 10 open item is past due for your status update.
> > > > > Kindly send a status update within 24 hours, and include a date for
> > > > > your subsequent status update.  Refer to the policy on open item
> > > > > ownership:
> > > > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> > > >
> > > > I thought I'd be able to get to this today, but other activities have
> > > > taken up the time I had planned to work on this.  I'll be on it again
> > > > tomorrow morning and will provide an update (most likely a couple of
> > > > commits) by COB tomorrow.
> > >
> > > This PostgreSQL 10 open item is again past due for your status update.
> > > Kindly send a status update within 24 hours, and include a date for your
> > > subsequent status update.  Refer to the policy on open item ownership:
> > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> >
> > IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past
> > due for your status update.  Please reacquaint yourself with the policy on
> > open item ownership[1] and then reply immediately.  If I do not hear from
> > you by 2017-04-25 00:00 UTC, I will transfer this item to release
> > management team ownership without further notice.
>
> The status is simply that I've been considering Robert's comments regarding
> the documentation and have had a busy weekend. I'll provide an update
> tomorrow.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Masahiko Sawada
On Tue, Apr 25, 2017 at 10:31 AM, Bruce Momjian  wrote:
> I have committed the first draft of the Postgres 10 release notes.  They
> are current as of two days ago, and I will keep them current.  Please
> give me any feedback you have.
>

Related to the following item in the release notes, oldestxmin is also
now reported by VACUUM VERBOSE and autovacuum; this was introduced by
commit 9eb344faf54a849898d9be012ddfa8204cfeb57c (authored by Simon).

* Have VACUUM VERBOSE report the number of skipped frozen pages
(Masahiko Sawada)

Could we add this item to the release notes?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] proposal psql \gdesc

2017-04-27 Thread Pavel Stehule
Hi

Sometimes I need to find out the result types of a query. They are not
visible in psql; you have to materialize a table or create a view to see
them. Now that we can enhance the \g command, we could introduce query
describing, something like:

select a, b from foo
\gdesc

    | type    | length | collation
 ---+---------+--------+-----------
  a | varchar | 30     |
  b | numeric | 20     |

What do you think about this idea?

Regards

Pavel


Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Amit Kapila
On Thu, Apr 27, 2017 at 7:53 PM, Marko Tiikkaja  wrote:
> On Thu, Apr 27, 2017 at 4:13 PM, Bruce Momjian  wrote:
>>
>> On Thu, Apr 27, 2017 at 08:00:28AM +0530, Amit Kapila wrote:
>> > > Oh, so non-correlated subqueries can be run in parallel.  Yes, that is
>> > > something we should have in the release notes.  How is this?
>> > >
>> > > Author: Robert Haas 
>> > > 2017-02-14 [5e6d8d2bb] Allow parallel workers to execute
>> > > subplans.
>> > >
>> > > Allow non-correlated subqueries to be run in parallel (Amit
>> > > Kapila)
>> > >
>> >
>> > Looks good to me.
>
>
> It's not clear from this item what the previous behavior was.  How about
> adding something like "Correlated subqueries can not yet be parallelized."?
> (I'm guessing that's the change, anyway, please correct me if I'm mistaken.)
>

Prior to this change, queries containing a reference to subplans
(correlated or uncorrelated) did not get parallelized.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PostgreSQL 10] default of hot_standby should be "on"?

2017-04-27 Thread Masahiko Sawada
On Thu, Apr 27, 2017 at 11:05 PM, Huong Dangminh wrote:
>> >>> I would refrain from doing that, having some parameters listed in the
>> >>> tests makes the intention behind those perl routines clear.
>> >
>> > Hmm, you've got a point. But when we changed the default values
>> > related to replication we dropped some explicitly settings from the
>> > regression test code.
>>
>> Looking at the patch. This is fine:
>> -  # Change a setting and restart
>> -  $node->append_conf('postgresql.conf', 'hot_standby = on');
>> -  $node->restart();
>>
>> But not that:
>>  print $conf "wal_log_hints = on\n";
>> -print $conf "hot_standby = on\n";
>>  print $conf "max_connections = 10\n";
>>
>> This is a minor point though.

After some thought I agree to leave it in the perl code.

>
> Thanks, I attached the update patch.

So it looks good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] subscription worker doesn't start immediately on eabled

2017-04-27 Thread Masahiko Sawada
On Thu, Apr 27, 2017 at 12:51 AM, Fujii Masao  wrote:
> On Wed, Apr 26, 2017 at 4:03 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 25 Apr 2017 14:45:03 -0400, Peter Eisentraut wrote in
>> <3d6a1bd0-08ce-301d-3336-ec9f623a3...@2ndquadrant.com>
>>> On 4/6/17 08:24, Kyotaro HORIGUCHI wrote:
>>> > The attached patch wakes up the launcher when a subscription is
>>> > enabled. This fails when a subscription is enabled immediately
>>> > after disabling, but it shouldn't matter.
>>>
>>> committed, thanks
>>
>> Thanks!
>
> This patch makes me think that CREATE SUBSCRIPTION should also wake up
> the launcher only when ENABLE is specified. Patch attached. Thought?
>

That makes sense to me. Since the NOCONNECT option changes some default
values, including ENABLED, to false, shouldn't we also apply it when
NOCONNECT is specified?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Masahiko Sawada
On Fri, Apr 28, 2017 at 4:00 AM, Peter Eisentraut
 wrote:
> On 4/27/17 06:47, Petr Jelinek wrote:
>> One thing I am missing in your patch however is cleanup of entries for
>> relations that finished sync. I wonder if it would be enough to just
>> destroy the hash when we get to empty list.
>
> I had omitted that because the amount of memory "leaked" is not much,
> but I guess it wouldn't hurt to clean it up.
>
> How about the attached?
>

Thank you for updating patch!

+   /*
+* Clean up the hash table when we're done with all tables (just to
+* release the bit of memory).
+*/
+   else if (!table_states && last_start_times)
+   {

Isn't it better to use != NIL instead, as follows?

   else if (table_states != NIL && last_start_times)


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Crash when partition column specified twice

2017-04-27 Thread Amit Langote
On 2017/04/27 12:36, Amit Langote wrote:
> Noticed that a crash occurs if a column is specified twice when creating a
> partition:
> 
> create table p (a int) partition by list (a);
> 
> -- crashes
> create table p1 partition of p (
>   a not null,
>   a default 1
> ) for values in (1);
> 
> The logic in MergeAttributes() that merged partition column options with
> those of the parent didn't properly check for column being specified twice
> and instead tried to delete the same ColumnDef from a list twice, causing
> the crash.
> 
> Attached fixes that.

Patch rebased, because of a conflict with b9a3ef55b2.

Thanks,
Amit
From ea30d12e7cf3f0b5858ebd8f003269b992d10f92 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 27 Apr 2017 11:22:55 +0900
Subject: [PATCH] Fix crash when partition column specified twice

---
 src/backend/commands/sequence.c|  1 +
 src/backend/commands/tablecmds.c   | 20 +++++++++++++++-----
 src/backend/nodes/copyfuncs.c  |  1 +
 src/backend/nodes/equalfuncs.c |  1 +
 src/backend/nodes/makefuncs.c  |  1 +
 src/backend/nodes/outfuncs.c   |  1 +
 src/backend/parser/gram.y  |  5 +
 src/backend/parser/parse_utilcmd.c |  2 ++
 src/include/nodes/parsenodes.h |  1 +
 src/test/regress/expected/create_table.out |  8 ++++++++
 src/test/regress/sql/create_table.sql  |  9 +++++++++
 11 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ad28225b36..acd4e359bb 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -167,6 +167,7 @@ DefineSequence(ParseState *pstate, CreateSeqStmt *seq)
 		coldef->is_local = true;
 		coldef->is_not_null = true;
 		coldef->is_from_type = false;
+		coldef->is_from_parent = false;
 		coldef->storage = 0;
 		coldef->raw_default = NULL;
 		coldef->cooked_default = NULL;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a35713096d..4df17c0efc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1919,6 +1919,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 def->is_local = false;
 def->is_not_null = attribute->attnotnull;
 def->is_from_type = false;
+def->is_from_parent = true;
 def->storage = attribute->attstorage;
 def->raw_default = NULL;
 def->cooked_default = NULL;
@@ -2206,11 +2207,20 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 	 * merge the column options into the column from the
 	 * parent
 	 */
-	coldef->is_not_null = restdef->is_not_null;
-	coldef->raw_default = restdef->raw_default;
-	coldef->cooked_default = restdef->cooked_default;
-	coldef->constraints = restdef->constraints;
-	list_delete_cell(schema, rest, prev);
+	if (coldef->is_from_parent)
+	{
+		coldef->is_not_null = restdef->is_not_null;
+		coldef->raw_default = restdef->raw_default;
+		coldef->cooked_default = restdef->cooked_default;
+		coldef->constraints = restdef->constraints;
+		coldef->is_from_parent = false;
+		list_delete_cell(schema, rest, prev);
+	}
+	else
+		ereport(ERROR,
+(errcode(ERRCODE_DUPLICATE_COLUMN),
+ errmsg("column \"%s\" specified more than once",
+		coldef->colname)));
 }
 prev = rest;
 rest = next;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 00a0fed23d..8fb872d288 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2804,6 +2804,7 @@ _copyColumnDef(const ColumnDef *from)
 	COPY_SCALAR_FIELD(is_local);
 	COPY_SCALAR_FIELD(is_not_null);
 	COPY_SCALAR_FIELD(is_from_type);
+	COPY_SCALAR_FIELD(is_from_parent);
 	COPY_SCALAR_FIELD(storage);
 	COPY_NODE_FIELD(raw_default);
 	COPY_NODE_FIELD(cooked_default);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 46573ae767..21dfbb0d75 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2540,6 +2540,7 @@ _equalColumnDef(const ColumnDef *a, const ColumnDef *b)
 	COMPARE_SCALAR_FIELD(is_local);
 	COMPARE_SCALAR_FIELD(is_not_null);
 	COMPARE_SCALAR_FIELD(is_from_type);
+	COMPARE_SCALAR_FIELD(is_from_parent);
 	COMPARE_SCALAR_FIELD(storage);
 	COMPARE_NODE_FIELD(raw_default);
 	COMPARE_NODE_FIELD(cooked_default);
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e88d82f3b0..f5fde1533f 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -494,6 +494,7 @@ makeColumnDef(const char *colname, Oid typeOid, int32 typmod, Oid collOid)
 	n->is_local = true;
 	n->is_not_null = false;
 	n->is_from_type = false;
+	n->is_from_parent = false;
 	n->storage = 0;
 	n->raw_default = NULL;
 	n->cooked_default = NULL;
diff --git a/src/backend/nodes/outfuncs.c 

Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-27 Thread Neha Khatri
On Thu, Apr 27, 2017 at 4:01 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > This gives me compiler warning:
> > launcher.c: In function 'logicalrep_worker_launch':
> > launcher.c:257: warning: 'slot' may be used uninitialized in this
> function
>
> Yeah, me too.  Fix pushed.


Somewhat off the mark, but related to the warning above: would you get a
similar warning for "address" in ProcessUtilitySlow() in utility.c?

Regards,
Neha


Re: [HACKERS] Declarative partitioning - another take

2017-04-27 Thread David Fetter
On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
> On 2017/04/27 1:52, Robert Haas wrote:
> > On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote
> >  wrote:
> >> FWIW, I too prefer the latter, that is, fire only the parent's triggers.
> >> In that case, applying only the patch 0001 will do.
> > 
> > Do we need to update the documentation?
> 
> Yes, I think we should.  How about as in the attached?
> 
> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do.  Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)?  You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

Did I notice correctly that there's no way to handle transition tables
for partitions in AFTER STATEMENT triggers?

If not, I'm not suggesting that this be added at this late date, but
we might want to document that.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] subscription worker doesn't start immediately on eabled

2017-04-27 Thread Fujii Masao
On Thu, Apr 27, 2017 at 6:32 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 27 Apr 2017 00:51:03 +0900, Fujii Masao  wrote 
> in 

Re: [HACKERS] some review comments on logical rep code

2017-04-27 Thread Fujii Masao
On Thu, Apr 27, 2017 at 5:37 PM, Petr Jelinek
 wrote:
> On 26/04/17 18:36, Fujii Masao wrote:
>> On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao  wrote:
>>> On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
>>>  wrote:
 At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada 
  wrote in 
 
> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
>  wrote:
>> On 26/04/17 01:01, Fujii Masao wrote:
> However, this is overkill for a small gain, and a false wakeup of the
> launcher is not so harmful (we can probably live with that), so we do
> nothing here for this issue.

 I agree with this as a whole. But I think that the main issue here is
 not false wakeups, but the possible delay of launching new workers by
 up to 3 minutes (which is certainly a kind of false wakeup, though).
 We can live with this failure when using two-phase commit, but I think
 it shouldn't happen silently.


 How about providing AtPrepare_ApplyLauncher(void) like the
 follows and calling it in PrepareTransaction?
>>>
>>> Or we should apply the attached patch and handle the 2PC case properly?
>>> I was thinking that it's overkill more than necessary, but that seems 
>>> not true
>>> as far as I implement that.
>>>
>> Looks like it does not even increase size of the 2pc file, +1 for this.
>
> Honestly, I wasn't strongly convinced that we should handle even the
> two-phase commit case, because it is very rare (I could not imagine
> such a case) and it doesn't lead to a harmful result such as a server
> crash or an inconsistent answer; it just delays launching the worker
> by at most 3 minutes. We could also deal with this, for example, by
> making the maximum nap time of the apply launcher user-configurable
> and documenting it.
> But if we can deal with it with minimal changes like your attached
> patch, I agree.

 This change looks reasonable to me, +1 from me to this.

 The patch reads on_commit_launcher_wakeup directly but updates it
 via ApplyLauncherWakeupAtCommit(); adding a function just for that
 seems like too much.
>>>
>>> OK, so what about the attached patch? I replaced all the calls to
>>> ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = 
>>> true".
>>
>> BTW, while I was reading the code to implement the patch that
>> I posted upthread, I found that the following condition doesn't
>> work as expected. This is because "last_start_time" is always 0.
>>
>> /* Limit the start retry to once a wal_retrieve_retry_interval */
>> if (TimestampDifferenceExceeds(last_start_time, now,
>>   wal_retrieve_retry_interval))
>>
>> "last_start_time" is set to "now" when the launcher starts up a new
>> worker. But "last_start_time" is declared, and so initialized to 0,
>> at the beginning of the main loop in ApplyLauncherMain(), so the
>> above condition is always true. This is obviously a bug.
>> The attached patch fixes this issue.
>>
>
> Yes, please go ahead and commit this.

Pushed. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-27 17:22:25 -0400, Tom Lane wrote:
>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>> the index, such that they'd be visited this way only once?  If that's
>> not happening, maybe we should try to fix it.

> One way that never happens is if you end up choosing bitmap index scans
> all the time...

What I'm thinking of is the regular indexscan that's done internally
by get_actual_variable_range, not whatever ends up getting chosen as
the plan for the user query.  I had supposed that that would kill
dead index entries as it went, but maybe that's not happening for
some reason.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-27 Thread Andres Freund
On 2017-04-27 17:22:25 -0400, Tom Lane wrote:
> > Yep, and I've seen that turn into a serious problem in production.
> 
> How so?  Shouldn't the indexscan go back and mark such tuples dead in
> the index, such that they'd be visited this way only once?  If that's
> not happening, maybe we should try to fix it.

One way that never happens is if you end up choosing bitmap index scans
all the time...  I've previously had to force a certain percentage of
queries with it disabled to keep indexes in a good state :(

- Andres




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-27 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 27, 2017 at 4:08 AM, Dmitriy Sarafannikov
>  wrote:
>> I'd like to propose to search min and max value in index with SnapshotAny in
>> get_actual_variable_range function.

> +1 from me, but Tom rejected that approach last time.

I remain concerned about the fact that this would allow accepting deleted
values that might be way outside the surviving range of values.
SnapshotDirty is a bit lax about tuple state, but it's not so lax as to
accept fully dead tuples.

>> But if we delete many rows from beginning or end of index, it would be
>> very expensive too because we will fetch each dead row and reject it.

> Yep, and I've seen that turn into a serious problem in production.

How so?  Shouldn't the indexscan go back and mark such tuples dead in
the index, such that they'd be visited this way only once?  If that's
not happening, maybe we should try to fix it.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
>> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
>> in latch.c, rather than bothering with a full-blown configure check.

> Yea, that sounds worth trying.  Wonder if we need to care about kernels
> not supporting it, but glibc having support?  I'd be ok skimping on that
> for now.

On my RHEL6 box,  is provided by glibc not the kernel:

$ rpm -qf /usr/include/sys/epoll.h
glibc-headers-2.12-1.209.el6_9.1.x86_64

So I think it's probably safe to assume that the header is in sync
with what glibc can do.

As for kernel (much) older than glibc, I'd rather expect glibc to paper
over that, though I've not looked at the source code to be sure.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Andres Freund
On 2017-04-27 16:35:29 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
> >> I went ahead and changed the call to epoll_create into epoll_create1.
> >> I'm not too concerned about loss of portability there --- it seems
> >> unlikely that many people are still using ten-year-old glibc, and
> >> even less likely that any of them would be interested in running
> >> current Postgres on their stable-unto-death platform.  We could add
> >> a configure test for epoll_create1 if you feel one's needed, but
> >> I think it'd just be a waste of cycles.
> 
> > Yea, I think we can live with that.  If we find it's a problem, we can
> > add a configure test later.
> 
> Well, according to the buildfarm, "later" is "now" :-(.

Too bad.


> If RHEL5 is too old to have epoll_create1, I think your dates for it
> might be a bit off.  Anyway, I'll go do something about that in a
> little bit.

2008-11-13 is when glibc 2.9 was released. Appears that RHEL5 still
ships with glibc 2.5, released 2006-09-29. Given that RHEL 5 originally
was released 2007-03-05, that's not too surprising?


> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
> in latch.c, rather than bothering with a full-blown configure check.

Yea, that sounds worth trying.  Wonder if we need to care about kernels
not supporting it, but glibc having support?  I'd be ok skimping on that
for now.

Greetings,

Andres Freund




Re: [HACKERS] frogmouth failures

2017-04-27 Thread Andres Freund
On 2017-04-27 16:30:35 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > I've been trying to track down the cause of recent failures at the "make
> > check" stage on frogmouth, a 32-bit Windows/Mingw instance running on XP.
> 
> I've been wondering about that too.

Same here.  Over the years there've been a number of bug reports with
the same error code, so it's not necessarily specific to master.  Could
just be a question of backend spawn rate or such.

- Andres




Re: [HACKERS] frogmouth failures

2017-04-27 Thread Andrew Dunstan


On 04/27/2017 04:30 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I've been trying to track down the cause of recent failures at the "make
>> check" stage on frogmouth, a 32-bit Windows/Mingw instance running on XP.
> I've been wondering about that too.
>
>> Then I tried running (offline mode) the serial schedule instead of the
>> parallel schedule, and it went through with no error. So then I tried
>> setting MAX_CONNECTIONS=10 and that also worked - see
>> 
>> I've reverted that setting, but if errors start to occur again we'll
>> have some slight notion of where to look.
> Judging by the recent history,
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=frogmouth&br=HEAD
> it's not 100% reproducible.  (Either that, or we un-broke it and re-broke
> it within the last week, which seems improbable.)  So unless you made
> quite a few successful runs with the lower MAX_CONNECTIONS setting,
> I'm dubious that there's really a connection.
>
> Having said that, I won't be a bit surprised if it is some sort of
> parallelism effect.  I just don't think one test proves much.
>

I'll leave it on for a week and then remove it, that should give us a larger 
sample.

cheers

andrew 


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-26 17:05:39 -0400, Tom Lane wrote:
>> I went ahead and changed the call to epoll_create into epoll_create1.
>> I'm not too concerned about loss of portability there --- it seems
>> unlikely that many people are still using ten-year-old glibc, and
>> even less likely that any of them would be interested in running
>> current Postgres on their stable-unto-death platform.  We could add
>> a configure test for epoll_create1 if you feel one's needed, but
>> I think it'd just be a waste of cycles.

> Yea, I think we can live with that.  If we find it's a problem, we can
> add a configure test later.

Well, according to the buildfarm, "later" is "now" :-(.

If RHEL5 is too old to have epoll_create1, I think your dates for it
might be a bit off.  Anyway, I'll go do something about that in a
little bit.

It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC"
in latch.c, rather than bothering with a full-blown configure check.

regards, tom lane




Re: [HACKERS] frogmouth failures

2017-04-27 Thread Tom Lane
Andrew Dunstan  writes:
> I've been trying to track down the cause of recent failures at the "make
> check" stage on frogmouth, a 32-bit Windows/Mingw instance running on XP.

I've been wondering about that too.

> Then I tried running (offline mode) the serial schedule instead of the
> parallel schedule, and it went through with no error. So then I tried
> setting MAX_CONNECTIONS=10 and that also worked - see
> 
> I've reverted that setting, but if errors start to occur again we'll
> have some slight notion of where to look.

Judging by the recent history,
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=frogmouth&br=HEAD
it's not 100% reproducible.  (Either that, or we un-broke it and re-broke
it within the last week, which seems improbable.)  So unless you made
quite a few successful runs with the lower MAX_CONNECTIONS setting,
I'm dubious that there's really a connection.

Having said that, I won't be a bit surprised if it is some sort of
parallelism effect.  I just don't think one test proves much.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-27 Thread Robert Haas
On Thu, Apr 27, 2017 at 4:08 AM, Dmitriy Sarafannikov
 wrote:
> I'd like to propose to search min and max value in index with SnapshotAny in
> get_actual_variable_range function.

+1 from me, but Tom rejected that approach last time.

> But if we delete many rows from beginning or end of index, it would be
> very expensive too because we will fetch each dead row and reject it.

Yep, and I've seen that turn into a serious problem in production.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2017-04-27 Thread Robert Haas
On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote
 wrote:
>> Do we need to update the documentation?
>
> Yes, I think we should.  How about as in the attached?

Looks reasonable, but I was thinking you might also update the section
which contrasts inheritance-based partitioning with declarative
partitioning.

> By the way, code changes I made in the attached are such that a subsequent
> patch could implement firing statement-level triggers of all the tables in
> a partition hierarchy, which it seems we don't want to do.  Should then
> the code be changed to not create ResultRelInfos of all the tables but
> only the root table (the one mentioned in the command)?  You will see that
> the patch adds fields named es_nonleaf_result_relations and
> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> would perhaps do, for example.

It seems better not to create any ResultRelInfos that we don't
actually need, so +1 for such a revision to the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-27 Thread Robert Haas
On Thu, Apr 27, 2017 at 3:15 PM, Sven R. Kunze  wrote:
> On 27.04.2017 15:07, Robert Haas wrote:
>> On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed 
>> wrote:
>>>
>>> +1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
>>> I think substituting DEFAULT for FOR VALUES is appropriate as
>>> both cases are mutually exclusive.
>
> Just to make it sound a little rounder:
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT
> CREATE TABLE ... PARTITION OF ... AS FALLBACK
>
> or
>
> CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
> CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION
>
> Could any of these be feasible?

FALLBACK wouldn't be a good choice because it's not an existing parser
keyword.  We could probably insert AS before DEFAULT and/or PARTITION
afterwards, but they sort of seem like noise words.  SQL seems to have
been invented by people who didn't have any trouble remembering really
long command strings, but brevity is not without some merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] frogmouth failures

2017-04-27 Thread Andrew Dunstan

I've been trying to track down the cause of recent failures at the "make
check" stage on frogmouth, a 32-bit Windows/Mingw instance running on XP.

I couldn't see any obvious reason for the failures, and a reboot didn't
cure the problem.

Then I tried running (offline mode) the serial schedule instead of the
parallel schedule, and it went through with no error. So then I tried
setting MAX_CONNECTIONS=10 and that also worked - see


I've reverted that setting, but if errors start to occur again we'll
have some slight notion of where to look.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] identity columns

2017-04-27 Thread Robert Haas
On Thu, Apr 27, 2017 at 3:42 PM, Peter Eisentraut
 wrote:
> On 4/27/17 10:03, Robert Haas wrote:
>>> So we could make up new syntax
>>>
>>> ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED ALWAYS AS IDENTITY;
>>>
>>> and let that be set-or-add, but then the argument for staying within the
>>> SQL standard goes out the window.
>>
>> What does the SQL standard actually mandate here?  It's not clear to
>> me which parts of this syntax are mandated by the standard and which
>> we just made up.  I'm gathering (perhaps incorrectly) that you made
>> ALTER TABLE ... DROP IDENTITY as per standard, but the reverse
>> operation of setting the identity non-standard syntax.  If that's so,
>> it seems like a questionable choice.
>
> Standard syntax:
>
> 1. Add new column with identity:
>
> ALTER TABLE t1 ADD COLUMN c1 int GENERATED ALWAYS AS IDENTITY;
>
> (or equivalent for CREATE TABLE)
>
> 2. Change ALWAYS to BY DEFAULT (or vice versa) of existing column:
>
> ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED BY DEFAULT;
>
> 3. Change sequence parameters of existing column:
>
> ALTER TABLE t1 ALTER COLUMN c1 SET (START 2)
>
> (the previous two can be combined)
>
> 4. Drop identity property of existing column:
>
> ALTER TABLE t1 ALTER COLUMN c1 DROP IDENTITY;
>
> (with IF EXISTS being our extension)
>
> There is no standard syntax for the inverse, that is, adding an identity
> property to an existing column.  (I have checked DB2 and Oracle.  They
> don't have anything either.  One even documents that explicitly.)
> Therefore ...
>
> Nonstandard syntax:
>
> 5. Add identity property to existing column:
>
> ALTER TABLE t1 ALTER COLUMN c1 ADD GENERATED ALWAYS AS IDENTITY;
>
> The competing proposal is that we should not have syntax #5 but that
> syntax #2 should (also) do that.
>
> My concerns about that, as previously explained, are
>
> - asymmetric idempotency behavior with DROP IDENTITY
>
> - ambiguous/inconsistent behavior when sequence options are specified in
> same command
>
> - syntax ambiguity/inconsistency with other SQL standard features
>
> The argument in favor is that syntax #5 is nonstandard.  But of course
> making #2 doing something nonstandard is also nonstandard.

OK, got it.  Given that explanation, I'm not really prepared to argue
this matter further.  That sounds basically reasonable, even if
somebody else might've chosen to do things differently.

I still think you should consider improving the psql output, though.
Vitaly's examples upthread indicate that for a serial sequence,
there's psql output showing the linkage between the table and sequence
in both directions, but not when GENERATED is used.  Can we get
something morally equivalent for the GENERATED case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-27 Thread Robert Haas
On Thu, Apr 27, 2017 at 3:41 AM, Ashutosh Bapat
 wrote:
> The third goal requires that the partition bounds be compared based on
> the partition keys present in the equi-join. While matching the
> partitions to be joined, the partition bounds corresponding to the base
> relation whose partition keys appear in the equi-join are used for
> comparison, using the support function corresponding to the data types of
> the partition keys. This requires us to associate the partitions of a join
> with the bounds of a base relation. E.g. A (A1, A2) with bounds (X1, X3)
> (notice missing X2), B (B1, B2, B3) bounds (X1, X2, X3), C (C1, C2, C3)
> bounds (X1, X2, X3), and the join is A LJ B on A.a = B.b LJ C on B.b = C.c;
> assuming strict operators this can be executed as (AB)C or A(BC). AB
> will have partitions A1B1, A2B3 since there is no matching bound of A
> for B2 and A is outer relation. A1B1 is associated with bound X1 of A
> and C both. A2B3 is associated with bound of X3, which happens to be
> 2nd bound of A but third of B. When we join (AB) with C, we should
> notice that C1 goes with A1B1, C2 doesn't have any matching partition
> in AB and C3 goes with A2B3. If we compare bounds of B with C without
> any transformation we will know C2 matches B2, but we need to look at
> the children of AB to realize that B2 isn't present in any of the
> children and thus C2 should not be joined with any partition of AB.

Sure.

> That usually looks like a quadratic-order operation on the number of
> partitions.

Now that I don't buy.  Certainly, for range partitions, given a list
of ranges of length M and another of length N, this can be done in
O(M+N) time by merge-joining the lists of bounds.  You pointed out
upthread that for list partitions, things are a bit complicated
because a single list partition can contain multiple values which are
not necessarily contiguous, but I think that this can still be done in
O(M+N) time.  Sort all of the bounds, associating each one to a
partition, and do a merge pass; whenever two bounds match, match the
two corresponding partitions, but if one of those partitions is
already matched to some other partition, then fail.

For example, consider A1 FOR VALUES IN (1,3,5), A2 FOR VALUES IN
(2,4,6), B1 FOR VALUES IN (1,6), B2 FOR VALUES IN (2,4).  The sorted
bounds for A are 1,2,3,4,5,6; for B, 1,2,4,6.  The first value in both
lists is a 1, so the corresponding partitions A1 and B1 are matched.
The second value in both lists is a 2, so the corresponding partitions
A2 and B2 are matched.  Then we hit a 3 on the A side that has no
match on the B side, but that's fine; we don't need to do anything.
If the partition on the A side never got a mapping at any point during
this merge pass, we'd eventually need to match it to a dummy partition
(unless this is an inner join) but it's already mapped to B1 so no
problem.  Then we hit a 4 which says that A2 must match B2, which is
consistent with what we already determined; no problem.  Then we hit
another value that only exists on the A side, which is fine just as
before.  Finally we hit a 6 on each side, which means that A2 must
match B1, which is inconsistent with the existing mappings so we give
up; no partitionwise join is possible here.

> The complexity can be reduced by maintaining as many
> partition bounds as the number of base relations participating in the
> join (an idea, I have floated earlier [1]) I don't elaborate it here
> to avoid digression. There's also the complexity of an N-way join with
> multiple partition keys and joins on partition keys from different
> relations as discussed in [1]. There may be more involved cases, that
> I haven't thought about. In short, implementation for 1st and 3rd
> optimization together looks fairly complex.

I spent some time thinking about this today and I think I see how we
could make it work: keep a single set of bounds for each join
relation, but record the type of each bound.  For example, suppose we
are full joining relation i2, with an int2 partition column, which has
partitions i2a from 0 to 10000 and i2b from 20000 to 30000, to
relation i4, with an int4 partition column, which has partitions i4a
from 5000 to 15000 and i4b from 25000 to 35000.   We end up with a
joinrel with 2 partitions.  The first goes from 0 (stored as an int2)
to 15000 (stored as an int4) and the second goes from 20000 (stored as
an int2) to 35000 (stored as an int4).  If we subsequently need to
merge these bounds with yet another relation at a higher join level,
we can use the opfamily (which is common) to dig out the right
cross-type operator for each comparison we may need to perform, based
on the precise types of the datums being compared.  Of course, we
might not find an appropriate cross-type operator in some cases,
because an opfamily isn't required to provide that, so then we'd have
to fail gracefully somehow, but that could be done.
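A toy version of the per-comparison operator lookup might look like this (a hypothetical sketch; the type names and the comparator table merely stand in for the opfamily machinery):

```python
# Comparator registry keyed by the exact (left, right) datum types,
# loosely analogous to digging a cross-type operator out of the opfamily.
CMP = {
    ("int2", "int2"): lambda a, b: (a > b) - (a < b),
    ("int2", "int4"): lambda a, b: (a > b) - (a < b),
    ("int4", "int2"): lambda a, b: (a > b) - (a < b),
    ("int4", "int4"): lambda a, b: (a > b) - (a < b),
}

def compare_bounds(a, b):
    """a and b are (value, typename) pairs; raises if no operator exists."""
    cmp = CMP.get((a[1], b[1]))
    if cmp is None:
        # An opfamily isn't required to provide every cross-type operator,
        # so a real implementation must be prepared to give up gracefully.
        raise LookupError(f"no cross-type comparator for {a[1]} vs {b[1]}")
    return cmp(a[0], b[0])

# Bounds of the first joinrel partition from the example above:
print(compare_bounds((0, "int2"), (15000, "int4")))  # -1
```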

Having said that I think we could make this work, I'm starting to
agree with 

Re: [HACKERS] identity columns

2017-04-27 Thread Peter Eisentraut
On 4/27/17 10:03, Robert Haas wrote:
>> So we could make up new syntax
>>
>> ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED ALWAYS AS IDENTITY;
>>
>> and let that be set-or-add, but then the argument for staying within the
>> SQL standard goes out the window.
> 
> What does the SQL standard actually mandate here?  It's not clear to
> me which parts of this syntax are mandated by the standard and which
> we just made up.  I'm gathering (perhaps incorrectly) that you made
> ALTER TABLE ... DROP IDENTITY as per standard, but gave the reverse
> operation of setting the identity non-standard syntax.  If that's so,
> it seems like a questionable choice.

Standard syntax:

1. Add new column with identity:

ALTER TABLE t1 ADD COLUMN c1 int GENERATED ALWAYS AS IDENTITY;

(or equivalent for CREATE TABLE)

2. Change ALWAYS to BY DEFAULT (or vice versa) of existing column:

ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED BY DEFAULT;

3. Change sequence parameters of existing column:

ALTER TABLE t1 ALTER COLUMN c1 SET (START 2);

(the previous two can be combined)

4. Drop identity property of existing column:

ALTER TABLE t1 ALTER COLUMN c1 DROP IDENTITY;

(with IF EXISTS being our extension)

There is no standard syntax for the inverse, that is, adding an identity
property to an existing column.  (I have checked DB2 and Oracle.  They
don't have anything either.  One even documents that explicitly.)
Therefore ...

Nonstandard syntax:

5. Add identity property to existing column:

ALTER TABLE t1 ALTER COLUMN c1 ADD GENERATED ALWAYS AS IDENTITY;

The competing proposal is that we should not have syntax #5 but that
syntax #2 should (also) do that.

My concerns about that, as previously explained, are

- asymmetric idempotency behavior with DROP IDENTITY

- ambiguous/inconsistent behavior when sequence options are specified in
same command

- syntax ambiguity/inconsistency with other SQL standard features

The argument in favor is that syntax #5 is nonstandard.  But of course
making #2 do something nonstandard is also nonstandard.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-27 Thread Rahila Syed
Hi,

On Apr 27, 2017 18:37, "Robert Haas"  wrote:

>
>
> Are you also working on extending this to work with range
> partitioning?  Because I think that would be good to do.
>
>
Currently I am working on review comments and bug fixes for the
default list partitioning patch. After that I can start with default
partition for range partitioning.

Thank you,
Rahila Syed


Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Petr Jelinek
On 27/04/17 21:00, Peter Eisentraut wrote:
> On 4/27/17 06:47, Petr Jelinek wrote:
>> One thing I am missing in your patch however is cleanup of entries for
>> relations that finished sync. I wonder if it would be enough to just
>> destroy the hash when we get to empty list.
> 
> I had omitted that because the amount of memory "leaked" is not much,
> but I guess it wouldn't hurt to clean it up.
> 
> How about the attached?
> 

Yes that's more or less what I meant. All good.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-27 Thread Sven R. Kunze

On 27.04.2017 15:07, Robert Haas wrote:

On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed  wrote:

+1 for CREATE TABLE..PARTITION OF...DEFAULT syntax.
I think substituting DEFAULT for FOR VALUES is appropriate as
both cases are mutually exclusive.


Just to make sound a little rounder:

CREATE TABLE ... PARTITION OF ... AS DEFAULT
CREATE TABLE ... PARTITION OF ... AS FALLBACK

or

CREATE TABLE ... PARTITION OF ... AS DEFAULT PARTITION
CREATE TABLE ... PARTITION OF ... AS FALLBACK PARTITION


Could any of these be feasible?


Sven




Re: [HACKERS] Logical replication in the same cluster

2017-04-27 Thread Peter Eisentraut
On 4/27/17 04:08, Petr Jelinek wrote:
> Note that the workaround for all of this is not all that complex, you do
> same thing (create slot manually) you'd do for physical replication with
> slots.

Maybe we should just document this issue for now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Peter Eisentraut
On 4/27/17 06:47, Petr Jelinek wrote:
> One thing I am missing in your patch however is cleanup of entries for
> relations that finished sync. I wonder if it would be enough to just
> destroy the hash when we get to empty list.

I had omitted that because the amount of memory "leaked" is not much,
but I guess it wouldn't hurt to clean it up.

How about the attached?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v2-0001-Wait-between-tablesync-worker-restarts.patch
Description: invalid/octet-stream



Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Dagfinn Ilmari Mannsåker
Bruce Momjian  writes:

> I have committed the first draft of the Postgres 10 release notes.  They
> are current as of two days ago, and I will keep them current.  Please
> give me any feedback you have.

I noticed a few niggles with the links in "my" entries, patch attached.

Cheers,

ilmari

-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen

From a32336f420a4fb25bdedeb0dc8ccf68503638e93 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 27 Apr 2017 15:20:59 +0100
Subject: [PATCH] Fix some release note links

- Make max_pred_locks_per_page a link too
- The ALTER TYPE link was pointing to the ALTER TABLE page
---
 doc/src/sgml/release-10.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/release-10.sgml b/doc/src/sgml/release-10.sgml
index de2129758c..6f49618d07 100644
--- a/doc/src/sgml/release-10.sgml
+++ b/doc/src/sgml/release-10.sgml
@@ -665,7 +665,7 @@

 The new settings are  and
-   max_pred_locks_per_page.
+.

   
 
@@ -1830,7 +1830,7 @@
   
 
   
-   This uses the syntax ALTER
+   This uses the syntax ALTER
TYPE ... RENAME VALUE.
   
  
-- 
2.11.0




Re: [HACKERS] [PATCH] Incremental sort

2017-04-27 Thread Alexander Korotkov
On Thu, Apr 27, 2017 at 5:06 PM, Robert Haas  wrote:

> On Wed, Apr 26, 2017 at 11:39 AM, Alexander Korotkov
>  wrote:
> > But I'd like to make incremental sort not slower than quicksort in the
> > case of presorted data.  A new idea comes to mind.  Since the cause of
> > incremental sort's slowness in this case is too-frequent resetting of
> > tuplesort, what if we artificially put the data into larger groups?
> > The attached revision of the patch implements this: it doesn't stop
> > accumulating tuples into tuplesort until we have MIN_GROUP_SIZE tuples.
> >
> > Now, incremental sort is not slower than quicksort.  And this seems to be
> > cool.
> > However, in the LIMIT case we will pay the price of fetching some extra
> > tuples from outer node.  But, that doesn't seem to hurt us too much.
> >
> > Any thoughts?
>
> Nice idea.



Cool.
Then I'm going to make a set of synthetic performance tests in order to
ensure that there is no regression.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
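The grouping idea above can be sketched like this (a hypothetical Python model of the executor behavior; `MIN_GROUP_SIZE` and the flushing rule come from the description in the thread, not from the patch itself):

```python
MIN_GROUP_SIZE = 32  # illustrative value, not the patch's constant

def incremental_sort(rows, prefix_key, full_key, min_group=MIN_GROUP_SIZE):
    """Sort rows already ordered by prefix_key into full full_key order,
    coalescing small presorted groups so the (modeled) tuplesort is not
    reset once per tiny group."""
    buf = []
    for row in rows:
        # Flush only at a group boundary, and only once the buffer has
        # accumulated at least min_group tuples.
        if len(buf) >= min_group and prefix_key(row) != prefix_key(buf[-1]):
            yield from sorted(buf, key=full_key)  # stands in for tuplesort
            buf = []
        buf.append(row)
    yield from sorted(buf, key=full_key)

rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "b")]
# Output is globally sorted by the full key, because flushes happen only
# at prefix boundaries and the prefix leads the full sort key.
print(list(incremental_sort(rows, lambda r: r[0], lambda r: r, min_group=2)))
```

In the LIMIT case this model shows the cost mentioned above: up to `min_group - 1` extra rows may be pulled from the outer node before the first flush.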


Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Marko Tiikkaja
On Thu, Apr 27, 2017 at 4:13 PM, Bruce Momjian  wrote:

> On Thu, Apr 27, 2017 at 08:00:28AM +0530, Amit Kapila wrote:
> > > Oh, so non-correlated subqueries can be run in parallel.  Yes, that is
> > > something we should have in the release notes.  How is this?
> > >
> > > Author: Robert Haas 
> > > 2017-02-14 [5e6d8d2bb] Allow parallel workers to execute
> subplans.
> > >
> > > Allow non-correlated subqueries to be run in parallel (Amit
> Kapila)
> > >
> >
> > Looks good to me.


It's not clear from this item what the previous behavior was.  How about
adding something like "Correlated subqueries can not yet be parallelized."?
(I'm guessing that's the change, anyway, please correct me if I'm mistaken.)


.m


Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-27 Thread Stephen Frost
Amit,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> >> So to summarize what the patches do (some of these were posted earlier)
> >>
> >> 0002: pg_dump: Do not emit WITH OPTIONS keywords with partition's columns
> > 
> > I'm trying to understand why this is also different.  At least on an
> > initial look, there seems to be a bunch more copy-and-paste from the
> > existing TypedTable to Partition in gram.y, which seems to all boil down
> > to removing the need for WITH OPTIONS to be specified.  If WITH OPTIONS
> > would be too much to have included for partitioning, and it isn't
> > actually necessary, why not just make it optional, and use the same
> > construct for both, removing all the duplication and the need for
> > pg_dump to output it?
> 
> OK, I think optionally allowing WITH OPTIONS keywords would be nice.
> 
> So in lieu of this patch, I propose a patch that modifies gram.y to allow
> WITH OPTIONS to be specified.

The point I was trying to get at was that if you make WITH OPTIONS
optional for the TypedTableElement case, you can remove all of the
PartitionElement-related productions and then both the OF type_name and
the PARTITION OF case will accept WITH OPTIONS as noise words and also
work without WITH OPTIONS being specified.

This also makes the code actually match the documentation since, at
least in the CREATE FOREIGN TABLE documentation, we claim that WITH
OPTIONS is required.

Please see a proposed patch attached to accomplish this.

> > Given that we're now setting numParents for partitions, I
> > would think we'd just move the if (tbinfo->partitionOf) branches up
> > under the if (numParents > 0) branches, which I think would also help us
> > catch additional issues, like the fact that a binary-upgrade with
> > partitions in a different namespace from the partitioned table would
> > fail because the ATTACH PARTITION code doesn't account for it:
> > 
> > appendPQExpBuffer(q, "ALTER TABLE ONLY %s ",
> >   fmtId(tbinfo->partitionOf->dobj.name));
> > appendPQExpBuffer(q, "ATTACH PARTITION %s %s;\n",
> >   fmtId(tbinfo->dobj.name),
> >   tbinfo->partitiondef);
> > 
> > unlike the existing inheritance code a few lines above, which does:
> > 
> > appendPQExpBuffer(q, "ALTER TABLE ONLY %s INHERIT ",
> >   fmtId(tbinfo->dobj.name));
> > if (parentRel->dobj.namespace != tbinfo->dobj.namespace)
> > appendPQExpBuffer(q, "%s.",
> > 
> > fmtId(parentRel->dobj.namespace->dobj.name));
> > appendPQExpBuffer(q, "%s;\n",
> >   fmtId(parentRel->dobj.name));
> 
> That's a good point.  I put both cases under the if (numParents > 0) block
> and appropriate text is emitted depending on whether the table is a
> partition or plain inheritance child.

Right, ok.

> >> 0004: Change the way pg_dump retrieves partitioning info (removes a lot
> >>   of unnecessary code and instead makes partitioning case use the old
> >>   inheritance code, etc.)
> > 
> > This looks pretty good, at least on first blush, probably in part
> > because it's mostly removing code.  The CASE statement in the
> > getTables() query isn't needed, once pg_get_partkeydef() has been
> > changed to not throw an ERROR when passed a non-partitioned table OID.
> 
> Oh yes, fixed.

Good.

> 0003: Change the way pg_dump retrieves partitioning info (removes a lot
>   of unnecessary code and instead makes partitioning case use the old
>   inheritance code, etc.)

This has conflicts due to my proposed patch as my patch changes pg_dump
to not output the now-noise-words WITH OPTIONS at all.

> 0004: Fix a bug in pg_dump's --binary-upgrade code that caused ALTER TABLE
>   INHERIT to be emitted to attach a partition in addition to ALTER
>   TABLE ATTACH PARTITION, and that no schema name was emitted where it
>   should have been

Given that it's touching the same places, this would really make sense
to merge into 0003 now.

I'm going to continue to mull over the attached for a bit and add some
tests to it, but if it looks good then I'll push it this afternoon,
after which you'll hopefully have time to rebase 0003 and merge in 0004
to that, which I can review and probably push tomorrow.

Thanks!

Stephen
From 1d997077fb4e5395fed0e8e0dd7dc3b629a11553 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 27 Apr 2017 09:19:34 -0400
Subject: [PATCH] Remove unnecessarily duplicated gram.y productions

Declarative partitioning duplicated the TypedTableElement productions,
evidently to remove the need to specify WITH OPTIONS when creating
partitions.  Instead, simply make WITH OPTIONS optional in the
TypedTableElement production and remove all of the duplicate
PartitionElement-related productions.  

Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Bruce Momjian
On Thu, Apr 27, 2017 at 08:14:34AM +0530, Amit Kapila wrote:
> 
> 
> 
>  Improve efficiency of hash index growth (Amit Kapila, Mithun Cy)
> 
>  
> 
> The first two commits b0f18cb77, 30df93f69 are done as preliminary
> work to "Add write-ahead logging support to hash indexes", so it seems
> inappropriate to add them here.  We can add it along with below item:
> 
>  
> 
>  Add write-ahead logging support to hash indexes (Amit Kapila)
> 

OK, commits moved.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Bruce Momjian
On Thu, Apr 27, 2017 at 08:00:28AM +0530, Amit Kapila wrote:
> > Oh, so non-correlated subqueries can be run in parallel.  Yes, that is
> > something we should have in the release notes.  How is this?
> >
> > Author: Robert Haas 
> > 2017-02-14 [5e6d8d2bb] Allow parallel workers to execute subplans.
> >
> > Allow non-correlated subqueries to be run in parallel (Amit Kapila)
> >
> 
> Looks good to me.

OK, added.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] [PATCH] Incremental sort

2017-04-27 Thread Robert Haas
On Wed, Apr 26, 2017 at 11:39 AM, Alexander Korotkov
 wrote:
> But I'd like to make incremental sort not slower than quicksort in the case
> of presorted data.  A new idea comes to mind.  Since the cause of
> incremental sort's slowness in this case is too-frequent resetting of
> tuplesort, what if we artificially put the data into larger groups?  The
> attached revision of the patch implements this: it doesn't stop
> accumulating tuples into tuplesort until we have MIN_GROUP_SIZE tuples.
>
> Now, incremental sort is not slower than quicksort.  And this seems to be
> cool.
> However, in the LIMIT case we will pay the price of fetching some extra
> tuples from outer node.  But, that doesn't seem to hurt us too much.
>
> Any thoughts?

Nice idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PostgreSQL 10] default of hot_standby should be "on"?

2017-04-27 Thread Huong Dangminh
> >>> I would refrain from doing that, having some parameters listed in the
> >>> tests makes the intention behind those perl routines clear.
> >
> > Hmm, you've got a point. But when we changed the default values
> > related to replication we dropped some explicit settings from the
> > regression test code.
> 
> Looking at the patch. This is fine:
> -  # Change a setting and restart
> -  $node->append_conf('postgresql.conf', 'hot_standby = on');
> -  $node->restart();
> 
> But not that:
>  print $conf "wal_log_hints = on\n";
> -print $conf "hot_standby = on\n";
>  print $conf "max_connections = 10\n";
> 
> This is a minor point though.

Thanks, I attached the update patch.

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


hot_standby.patch
Description: hot_standby.patch



Re: [HACKERS] identity columns

2017-04-27 Thread Robert Haas
On Wed, Apr 26, 2017 at 10:03 PM, Peter Eisentraut
 wrote:
> On 4/23/17 16:58, Robert Haas wrote:
>> I agree that ADD is a little odd here, but it doesn't seem terrible.
>> But why do we need it?  Instead of:
>>
>> ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
>> SET GENERATED { ALWAYS | BY DEFAULT }
>> DROP IDENTITY [ IF EXISTS ]
>>
>> Why not just:
>>
>> SET GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
>> DROP IDENTITY [ IF EXISTS ]
>>
>> Surely the ALTER TABLE command can tell whether the column is already
>> GENERATED, so the first form could make it generated if it's not and
>> adjust the ALWAYS/BY DEFAULT property if it is.
>
> Note that DROP IDENTITY is a non-idempotent command (i.e., it errors if
> the thing has already been dropped), per SQL standard, which is why we
> have IF EXISTS there.  So it would be weird if the corresponding
> creation command would be idempotent (i.e., it did not care whether the
> thing is already there).

Hmm, I guess that has some validity to it.

> Also, if we tried to merge the ADD and SET cases, the syntax would come
> out weird.  The creation syntax is
>
> CREATE TABLE t1 (c1 int GENERATED ALWAYS AS IDENTITY);
>
> The syntax to change an existing column is
>
> ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED ALWAYS;
>
> But we can't just make the "AS IDENTITY" optional, because that same
> syntax is also used by the "generated columns" feature.

I don't understand how that follows.

> So we could make up new syntax
>
> ALTER TABLE t1 ALTER COLUMN c1 SET GENERATED ALWAYS AS IDENTITY;
>
> and let that be set-or-add, but then the argument for staying within the
> SQL standard goes out the window.

What does the SQL standard actually mandate here?  It's not clear to
me which parts of this syntax are mandated by the standard and which
we just made up.  I'm gathering (perhaps incorrectly) that you made
ALTER TABLE ... DROP IDENTITY as per standard, but gave the reverse
operation of setting the identity non-standard syntax.  If that's so,
it seems like a questionable choice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-27 Thread Robert Haas
On Thu, Apr 27, 2017 at 8:49 AM, Rahila Syed  wrote:
>>I suspect it could be done as of now, but I'm a little worried that it
>>might create grammar conflicts in the future as we extend the syntax
>>further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>>word DEFAULT appears in the same position where we'd normally have FOR
>>VALUES, and so the parser will definitely be able to figure out what's
>>going on.  When it gets to that position, it will see FOR or it will
>>see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>>or not the word DEFAULT is present before PARTITION affects which
>>tokens are legal after the parent table name.  bison isn't always very
>>smart about that kind of thing.  No particular dangers come to mind at
>>the moment, but it makes me nervous anyway.
>
> +1 for CREATE TABLE..PARTITION OF...DEFAULT  syntax.
> I think substituting DEFAULT for FOR VALUES is appropriate as
> both cases are mutually exclusive.
>
> One more thing that needs consideration is whether default partitions
> should be partitioned further. Other databases allow default partitions to
> be partitioned further. I think it's normal for users to expect the data in
> default partitions to also be divided into sub partitions.  So
> it should be supported.
> My colleague Rajkumar Raghuwanshi brought to my notice the current patch
> does not handle this correctly.
> I will include this in the updated patch if there is no objection.
>
> On the other hand, if subpartitions of a default partition are to be
> prohibited, an error should be thrown if PARTITION BY is specified after
> DEFAULT.

I see no reason to prohibit it.  You can further partition any other
kind of partition, so there seems to be no reason to disallow it in
this one case.

Are you also working on extending this to work with range
partitioning?  Because I think that would be good to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Daniel Verite
Fabien COELHO wrote:

>Fix psql \p to always print what would be executed by \g or \w (Daniel
>Vérité)
> 
>Previously \p didn't properly print the reverted-to command after a
>buffer contents reset. CLARIFY?
> 
> The fix is linked to the change introduced by Tom when 
> refactoring/cleaning up in e984ef586 (\if) which changed psql's \p 
> behavior.

That's my recollection as well. The "Previously" does not refer to 9.6,
but to that commit.

> I'm not sure how this should appear in the release notes. Maybe not at 
> all, associated to the feature in which the behavioral change was 
> introduced...

There is a small change of behavior coming as a by-product of the
introduction of \if ... \endif blocks.

When doing in 9.x:

select '1st buffer' \g
followed by \e
and editing with select '2nd buffer' (without ending the query)
and then back in psql doing '\r' and '\p', the result is
select '2nd buffer'

The same with v10 leads instead to
select '1st buffer'

I'm not sure whether it's above the level of detail worth being mentioned
in the release notes.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-27 Thread Rahila Syed
>I suspect it could be done as of now, but I'm a little worried that it
>might create grammar conflicts in the future as we extend the syntax
>further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
>word DEFAULT appears in the same position where we'd normally have FOR
>VALUES, and so the parser will definitely be able to figure out what's
>going on.  When it gets to that position, it will see FOR or it will
>see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
>DEFAULT PARTITION OF ..., then we have action at a distance: whether
>or not the word DEFAULT is present before PARTITION affects which
>tokens are legal after the parent table name.  bison isn't always very
>smart about that kind of thing.  No particular dangers come to mind at
>the moment, but it makes me nervous anyway.

+1 for CREATE TABLE..PARTITION OF...DEFAULT  syntax.
I think substituting DEFAULT for FOR VALUES is appropriate as
both cases are mutually exclusive.

One more thing that needs consideration is whether default partitions
should be partitioned further. Other databases allow default partitions to
be partitioned further. I think it's normal for users to expect the data in
default partitions to also be divided into sub partitions.  So
it should be supported.
My colleague Rajkumar Raghuwanshi brought to my notice the current patch
does not handle this correctly.
I will include this in the updated patch if there is no objection.

On the other hand, if subpartitions of a default partition are to be
prohibited, an error should be thrown if PARTITION BY is specified after
DEFAULT.


Thank you,
Rahila Syed



On Tue, Apr 25, 2017 at 1:46 AM, Robert Haas  wrote:

> On Mon, Apr 24, 2017 at 8:14 AM, Ashutosh Bapat
>  wrote:
> > On Mon, Apr 24, 2017 at 4:24 PM, Robert Haas 
> wrote:
> >> On Mon, Apr 24, 2017 at 5:10 AM, Rahila Syed 
> wrote:
> >>> Following can also be considered as it specifies more clearly that the
> >>> partition holds default values.
> >>>
> >>> CREATE TABLE ...PARTITION OF...FOR VALUES DEFAULT;
> >>
> >> The partition doesn't contain default values; it is itself a default.
> >
> > Is CREATE TABLE ... DEFAULT PARTITION OF ... feasible? That sounds more
> natural.
>
> I suspect it could be done as of now, but I'm a little worried that it
> might create grammar conflicts in the future as we extend the syntax
> further.  If we use CREATE TABLE ... PARTITION OF .. DEFAULT, then the
> word DEFAULT appears in the same position where we'd normally have FOR
> VALUES, and so the parser will definitely be able to figure out what's
> going on.  When it gets to that position, it will see FOR or it will
> see DEFAULT, and all is clear.  OTOH, if we use CREATE TABLE ...
> DEFAULT PARTITION OF ..., then we have action at a distance: whether
> or not the word DEFAULT is present before PARTITION affects which
> tokens are legal after the parent table name.  bison isn't always very
> smart about that kind of thing.  No particular dangers come to mind at
> the moment, but it makes me nervous anyway.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Interval for launching the table sync worker

2017-04-27 Thread Petr Jelinek
On 25/04/17 19:54, Peter Eisentraut wrote:
> I feel it's getting a bit late for reworkings of this extent, also
> considering the marginal nature of the problem we are trying to fix.  My
> patch from April 18 is very localized and gets the job done.
> 
> I think this is still a good direction to investigate, but if we have to
> extend the hash table API to get it done, this might not be the best time.
> 

Yeah the hash API change needed is a bummer at this stage.

One thing I am missing in your patch however is cleanup of entries for
relations that finished sync. I wonder if it would be enough to just
destroy the hash when we get to empty list.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] subscription worker doesn't start immediately on enabled

2017-04-27 Thread Kyotaro HORIGUCHI
At Thu, 27 Apr 2017 00:51:03 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] PG 10 release notes

2017-04-27 Thread Fabien COELHO


Hello Bruce,


I have committed the first draft of the Postgres 10 release notes.  They
are current as of two days ago, and I will keep them current.  Please
give me any feedback you have.


About:

"""
  Fix psql \p to always print what would be executed by \g or \w (Daniel
  Vérité)

  Previously \p didn't properly print the reverted-to command after a
  buffer contents reset. CLARIFY?
"""

The fix is linked to the change introduced by Tom when 
refactoring/cleaning up in e984ef586 (\if) which changed psql's \p 
behavior.


This is not a user visible change version-to-version, it is more like a 
bug fix for a bug/behavioral change introduced within pg10 development 
process itself.


I'm not sure how this should appear in the release notes. Maybe not at 
all, associated to the feature in which the behavioral change was 
introduced...


--
Fabien.


Re: [HACKERS] some review comments on logical rep code

2017-04-27 Thread Petr Jelinek
On 26/04/17 18:36, Fujii Masao wrote:
> On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao  wrote:
>> On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada  
>>> wrote in 
>>> 
 On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
  wrote:
> On 26/04/17 01:01, Fujii Masao wrote:
 However this is overkill for small gain and false wakeup of the
 launcher is not so harmful (probably we can live with that), so
 we do nothing here for this issue.
>>>
>>> I agree with this as a whole. But I think that the main issue here is
>>> not false wakeups, but 'possible delay of launching new workers
>>> by 3 minutes at most' (this is certainly a kind of false wakeup,
>>> though). We can live with this failure when using two-phase
>>> commit, but I think it shouldn't happen silently.
>>>
>>>
>>> How about providing AtPrepare_ApplyLauncher(void) like the
>>> follows and calling it in PrepareTransaction?
>>
>> Or we should apply the attached patch and handle the 2PC case properly?
>> I was thinking that it's overkill more than necessary, but that seems 
>> not true
>> as far as I implement that.
>>
> Looks like it does not even increase size of the 2pc file, +1 for this.

 In my honest opinion, I wasn't strongly convinced that we should
 handle even the two-phase commit case, because this case is very rare (I
 could not imagine such a case) and it doesn't lead to a harmful result
 such as a server crash or an inconsistent result; it just delays the
 launching of the worker by at most 3 minutes. We can also deal with this,
 for example, by making the maximum nap time of the apply launcher
 user-configurable and documenting it.
 But if we can deal with it by minimum changes like attached your patch I 
 agree.
>>>
>>> This change looks reasonable to me, +1 from me to this.
>>>
>>> The patch reads on_commit_launcher_wakeup directly but updates
>>> it via ApplyLauncherWakeupAtCommit(); it's too much to add a
>>> function for the sake of this.
>>
>> OK, so what about the attached patch? I replaced all the calls to
>> ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = 
>> true".
> 
> BTW, while I was reading the code to implement the patch that
> I posted upthread, I found that the following condition doesn't
> work as expected. This is because "last_start_time" is always 0.
> 
> /* Limit the start retry to once a wal_retrieve_retry_interval */
> if (TimestampDifferenceExceeds(last_start_time, now,
>   wal_retrieve_retry_interval))
> 
> "last_start_time" is set to "now" when the launcher starts up new
> worker. But "last_start_time" is defined and always initialized with 0
> at the beginning of the main loop in ApplyLauncherMain(), so
> the above condition becomes always true. This is obviously a bug.
> Attached patch fixes this issue.
> 
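The scoping bug described above can be reproduced in miniature (hypothetical types and stand-in values, not the actual launcher code): a retry-interval check never limits anything when the "last" timestamp is re-initialized on every loop iteration.

```c
#include <assert.h>

typedef long TimestampTz;               /* stand-in for the real type */

/* stand-in for TimestampDifferenceExceeds() */
static int
difference_exceeds(TimestampTz last, TimestampTz now, long interval)
{
    return (now - last) > interval;
}

/* "last_start_time" declared inside the loop: the rate limit never applies */
static int
starts_with_bug(int iterations)
{
    int started = 0;

    for (int i = 0; i < iterations; i++)
    {
        TimestampTz last_start_time = 0;    /* bug: reset to 0 each iteration */
        TimestampTz now = 1000000 + i * 1000;

        if (difference_exceeds(last_start_time, now, 5000))
        {
            started++;
            last_start_time = now;          /* this update is lost next time */
        }
    }
    return started;
}

/* hoisting the variable out of the loop makes the rate limit effective */
static int
starts_fixed(int iterations)
{
    int started = 0;
    TimestampTz last_start_time = 0;

    for (int i = 0; i < iterations; i++)
    {
        TimestampTz now = 1000000 + i * 1000;

        if (difference_exceeds(last_start_time, now, 5000))
        {
            started++;
            last_start_time = now;
        }
    }
    return started;
}
```

With the bug, every iteration passes the check; with the fix, a start happens at most once per interval.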

Yes, please go ahead and commit this.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-27 Thread Michael Harris
Hi All,

I have a working prototype now, but there is one aspect I haven't been
able to find the best solution for.

The CLI interface so far has the following new added option:

-C, --compressprog=PRG use supplied external program for compression

An example usage would be:

pg_basebackup -D /home/harmic/tmp/ -C bzip2 -F t

The command string supplied to -C should be a compression command that
reads from stdin and outputs to stdout.

The problem is: when constructing the output filename(s), how can we
suffix them with the correct extension (.gz / .bz2 / .xz / ...)?

The options I can think of are:

 1. Add yet another command line option to specify a suffix
 2. Some kind of heuristic to figure it out from the supplied command
string (from known compression programs, but that will never be
complete)
 3. Don't worry about it, let the user rename them afterwards, in
which case they would be named .tar
 4. Make the compression command a template, eg. "bzip2 -c > %s.bz2",
so that the template itself will add the suffix

#4 might also be more flexible for tools that don't support output to
stdout, but it is a bit more complex to use.
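Option #4 could be sketched roughly like this (a hypothetical helper, not actual pg_basebackup code): treat the -C argument as a template and substitute the base output path for a "%s" placeholder, so the template itself chooses the suffix; a bare command with no placeholder falls back to option #3 behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of option #4: substitute the base output path for a
 * single "%s" placeholder in the user-supplied command template.  A real
 * implementation would validate that the template contains no other %
 * specifiers before using it as a format string. */
static char *
build_compress_command(const char *template, const char *basepath)
{
    size_t len = strlen(template) + strlen(basepath) + 16;
    char *cmd = malloc(len);

    if (strstr(template, "%s") != NULL)
        snprintf(cmd, len, template, basepath);   /* e.g. "bzip2 -c > %s.bz2" */
    else
        snprintf(cmd, len, "%s > %s", template, basepath);  /* option #3 */
    return cmd;
}
```

For example, a template of "bzip2 -c > %s.bz2" with base path "base.tar" yields "bzip2 -c > base.tar.bz2", so no separate suffix option is needed.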

Any other ideas?

Regards // Mike


On Wed, Apr 12, 2017 at 3:49 PM, Michael Harris  wrote:
> Hi,
>
> Thanks for the feedback!
>
>>> 2) The current logic either uses zlib if compiled in, or offers no
>>> compression at all, controlled by a series of #ifdef/#endif. I would
>>> prefer that the user can either use zlib or an external program
>>> without having to recompile, so I would remove the #ifdefs and replace
>>> them with run time branching.
>>
>>
>> Not sure how that would work or be needed. The reasonable thing would be:
>> if zlib is available when building, the choices would be "no compression",
>> "zlib compression" or "external compression". If there was no zlib available
>> when building, the choices would be "no compression" or "external 
>> compression".
>
> That's exactly how I intend it to work. I had thought that the current
> structure of the code would not allow that, but looking at it more
> closely I see that it does, so I don't have to re-organize the
> #ifdefs.
>
> Regards // Mike




Re: [HACKERS] Logical replication in the same cluster

2017-04-27 Thread Petr Jelinek
On 27/04/17 04:50, Tom Lane wrote:
> Peter Eisentraut  writes:
 If that's a predictable deadlock, I think a minimum expectation is that
 the system should notice it and throw an error, not just hang.
> 
>> We had some discussions early on about detecting connections to the same
>> server, but it's not entirely clear how to do that and it didn't seem
>> worth it at the time.
> 
> I wonder whether we actually need to detect connections to the same
> server per se.  I'm thinking about the one end taking some special
> heavyweight lock, and the other end taking the same lock, which would
> generally be free as long as the two ends aren't on the same server.
> Cascading replication might be a problem though ...
> 

Well, cascading might not be a problem. I mean, this issue exists only
during slot creation, which is a one-time operation; that is why the
workaround solves it. But I don't see what we could lock that's common
between publisher and subscriber unless we invent some database object
specifically for this purpose.

My idea in the original thread was to put the info about the xid and sysid
somewhere in shmem when creating a subscription, and to check on the
other side whether the sysid is the same as the local one and the xid is
active. It would serialize subscription creation, but I don't see that as
a big issue; it's not like it's common to create thousands of them in
parallel, nor is it something where we care about shaving microseconds of
runtime.

Back when writing the original patch set, I was also playing with the
idea of having CREATE SUBSCRIPTION do multiple committed steps in
similar fashion to CREATE INDEX CONCURRENTLY, but that leaves a mess behind
on failure, which also wasn't a very popular outcome. I wonder how bad it
would be if we created all the stuff for subscription but in disabled
form, then committed, then created slot outside of tx (slot creation is
not transactional anyway) and then switched the subscription to enabled
(if needed) in next tx. It would still leave subscription behind on
failure but a) user would see the failure, b) the subscription would be
inactive so no active harm from it. We also already prevent running
CREATE SUBSCRIPTION inside transaction block when automatic slot
creation is chosen so there is no difference from that perspective.

Just for info, in pglogical, we solve this by having the worker create
slot, not the user command, so then it just works. The reason why I
didn't do this in core is that, in practice, it does not seem very
user-friendly when there are errors (not enough free slots,
connecting not only to the same server but to the same db, etc.), because
users will only see the errors in the logs after the fact (and often they
don't look).
I am already unhappy about the fact that we have no facility for
bgworker to save the last error before dying into place that is
accessible via SQL and I'd rather not hide even more errors in the log.

Note that the workaround for all of this is not all that complex, you do
same thing (create slot manually) you'd do for physical replication with
slots.

Thoughts?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-27 Thread Amit Langote
Hi Stephen,

On 2017/04/26 23:31, Stephen Frost wrote:
>>> I looked through
>>> pg_get_partkeydef() and it didn't seem to be particularly expensive to
>>> run, though evidently it doesn't handle being passed an OID that it
>>> doesn't expect very cleanly:
>>>
>>> =# select pg_get_partkeydef(oid) from pg_class;
>>> ERROR:  cache lookup failed for partition key of 52333
>>>
>>> I don't believe that's really very appropriate of it to do though and
>>> instead it should work the same way pg_get_indexdef() and others do and
>>> return NULL in such cases, so people can use it sanely.
>>>
>>> I'm working through this but it's going to take a bit of time to clean
>>> it up and it's not going to get finished today, but I do think it needs
>>> to get done and so I'll work to make it happen this week.  I didn't
>>> anticipate finding this, obviously and am a bit disappointed by it.
>>
>> Yeah, that was sloppy. :-(
>>
>> Attached patch 0005 fixes that and adds a test to rules.sql.
> 
> Good, I'll commit that first, since it will make things simpler for
> pg_dump.

Thanks for committing this one.

>> So to summarize what the patches do (some of these were posted earlier)
>>
>> 0002: pg_dump: Do not emit WITH OPTIONS keywords with partition's columns
> 
> I'm trying to understand why this is also different.  At least on an
> initial look, there seems to be a bunch more copy-and-paste from the
> existing TypedTable to Partition in gram.y, which seems to all boil down
> to removing the need for WITH OPTIONS to be specified.  If WITH OPTIONS
> would be too much to have included for partitioning, and it isn't
> actually necessary, why not just make it optional, and use the same
> construct for both, removing all the duplication and the need for
> pg_dump to output it?

OK, I think optionally allowing WITH OPTIONS keywords would be nice.

So in lieu of this patch, I propose a patch that modifies gram.y to allow
WITH OPTIONS to be specified.

>> 0003: Fix a bug in pg_dump's --binary-upgrade code that caused ALTER TABLE
>>   INHERIT to be emitted to attach a partition instead of ALTER TABLE
>>   ATTACH PARTITION
> 
> Well, worse, it was dumping out both, the INHERITs and the ATTACH
> PARTITION.

Oops, you're right.  I actually meant to say that.

> Given that we're now setting numParents for partitions, I
> would think we'd just move the if (tbinfo->partitionOf) branches up
> under the if (numParents > 0) branches, which I think would also help us
> catch additional issues, like the fact that a binary-upgrade with
> partitions in a different namespace from the partitioned table would
> fail because the ATTACH PARTITION code doesn't account for it:
> 
> appendPQExpBuffer(q, "ALTER TABLE ONLY %s ",
>   fmtId(tbinfo->partitionOf->dobj.name));
> appendPQExpBuffer(q, "ATTACH PARTITION %s %s;\n",
>   fmtId(tbinfo->dobj.name),
>   tbinfo->partitiondef);
> 
> unlike the existing inheritance code a few lines above, which does:
> 
> appendPQExpBuffer(q, "ALTER TABLE ONLY %s INHERIT ",
>   fmtId(tbinfo->dobj.name));
> if (parentRel->dobj.namespace != tbinfo->dobj.namespace)
> appendPQExpBuffer(q, "%s.",
> fmtId(parentRel->dobj.namespace->dobj.name));
> appendPQExpBuffer(q, "%s;\n",
>   fmtId(parentRel->dobj.name));

That's a good point.  I put both cases under the if (numParents > 0) block
and appropriate text is emitted depending on whether the table is a
partition or plain inheritance child.
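The fix described here — qualifying the parent with its schema only when it differs from the partition's, as the inheritance branch already does — can be sketched with plain C strings in place of pg_dump's PQExpBuffer/fmtId machinery (the helper and its names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified sketch: emit ALTER TABLE ... ATTACH PARTITION, qualifying the
 * parent with its schema only when it differs from the partition's schema,
 * mirroring the inheritance branch quoted above. */
static void
emit_attach_partition(char *buf, size_t buflen,
                      const char *parent_schema, const char *parent_name,
                      const char *part_schema, const char *part_name)
{
    int n = snprintf(buf, buflen, "ALTER TABLE ONLY ");

    if (strcmp(parent_schema, part_schema) != 0)
        n += snprintf(buf + n, buflen - n, "%s.", parent_schema);
    n += snprintf(buf + n, buflen - n, "%s ", parent_name);
    snprintf(buf + n, buflen - n, "ATTACH PARTITION %s;\n", part_name);
}
```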

>> 0004: Change the way pg_dump retrieves partitioning info (removes a lot
>>   of unnecessary code and instead makes partitioning case use the old
>>   inheritance code, etc.)
> 
> This looks pretty good, at least on first blush, probably in part
> because it's mostly removing code.  The CASE statement in the
> getTables() query isn't needed, once pg_get_partkeydef() has been
> changed to not throw an ERROR when passed a non-partitioned table OID.

Oh yes, fixed.

I put the patches in different order this time so that the refactoring
patch appears before the --binary-upgrade bug-fix patch (0003 and 0004
have reversed their roles.)  Also, 0002 is no longer a pg_dump fix.

0001: Improve test coverage of partition constraints (non-inherited ones)

0002: Allow partition columns to optionally include WITH OPTIONS keywords

0003: Change the way pg_dump retrieves partitioning info (removes a lot
  of unnecessary code and instead makes partitioning case use the old
  inheritance code, etc.)

0004: Fix a bug in pg_dump's --binary-upgrade code that caused ALTER TABLE
  INHERIT to be emitted to attach a partition in addition to ALTER
  TABLE ATTACH PARTITION, and that no schema name was emitted where it
  should be

[HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-27 Thread Dmitriy Sarafannikov
Hi hackers,

I'd like to propose searching for the min and max value in an index with
SnapshotAny in the get_actual_variable_range function.

The current implementation scans the index with SnapshotDirty, which
accepts uncommitted rows and rejects dead rows. In the code there is a
comment about this:

  /*
   * In principle, we should scan the index with our current
   * active snapshot, which is the best approximation we've got
   * to what the query will see when executed.  But that won't
   * be exact if a new snap is taken before running the query,
   * and it can be very expensive if a lot of uncommitted rows
   * exist at the end of the index (because we'll laboriously
   * fetch each one and reject it).  What seems like a good
   * compromise is to use SnapshotDirty.  That will accept
   * uncommitted rows, and thus avoid fetching multiple heap
   * tuples in this scenario.  On the other hand, it will reject
   * known-dead rows, and thus not give a bogus answer when the
   * extreme value has been deleted; that case motivates not
   * using SnapshotAny here.
   */

But if we delete many rows from the beginning or the end of the index, it
would be very expensive too, because we will fetch each dead row and
reject it.

The following sequence can be used to reproduce this issue:

psql -c "DROP DATABASE test_polygon";
psql -c "CREATE DATABASE test_polygon";
psql test_polygon -c "CREATE EXTENSION postgis";
psql test_polygon -f /tmp/data.sql;
psql test_polygon -c "ANALYZE";

# \d polygon_table
                            Table "public.polygon_table"
  Column   |           Type           |                  Modifiers
-----------+--------------------------+------------------------------------------------------------
 id        | integer                  | not null default nextval('polygon_table_id_seq'::regclass)
 time      | timestamp with time zone | not null
 poly      | geometry(Polygon,4326)   | not null
 second_id | integer                  | not null
Indexes:
    "polygon_table_pkey" PRIMARY KEY, btree (id)
    "polygon_table_b179ed4a" btree (second_id)
    "polygon_table_poly_id" gist (poly)
    "polygon_table_time" btree ("time")
Foreign-key constraints:
    "second_table_id" FOREIGN KEY (second_id) REFERENCES second_table(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

1st session:
pgbench test_polygon -P 1 -R 6000 -c 24 -j 8 -T 1000 -n -f /tmp/bad_request

2nd session:
psql test_polygon -c "DELETE FROM polygon_table WHERE time <= '2017-03-30 16:00:00+03'"

After the delete we have many dead rows at the beginning of the index
"polygon_table_time" (time).

pgbench output:
progress: 1.0 s, 6023.8 tps, lat 1.170 ms stddev 1.022, lag 0.157 ms
progress: 2.0 s, 6023.8 tps, lat 1.045 ms stddev 0.182, lag 0.076 ms
progress: 3.0 s, 5957.0 tps, lat 1.046 ms stddev 0.176, lag 0.071 ms
progress: 4.0 s, 6066.9 tps, lat 1.061 ms stddev 0.184, lag 0.072 ms
progress: 5.0 s, 6178.1 tps, lat 1.060 ms stddev 0.189, lag 0.076 ms
progress: 6.0 s, 6079.0 tps, lat 1.075 ms stddev 0.195, lag 0.075 ms
progress: 7.0 s, 6246.0 tps, lat 1.069 ms stddev 0.194, lag 0.076 ms
progress: 8.0 s, 6046.0 tps, lat 1.050 ms stddev 0.181, lag 0.073 ms
progress: 9.0 s, 1255.0 tps, lat 79.114 ms stddev 189.686, lag 63.194 ms
progress: 10.0 s, 4696.0 tps, lat 1015.294 ms stddev 36.291, lag 1009.983 ms
progress: 11.0 s, 6031.0 tps, lat 1001.354 ms stddev 59.379, lag 997.375 ms
progress: 12.0 s, 6013.0 tps, lat 961.725 ms stddev 104.536, lag 957.736 ms
progress: 13.0 s, 6098.0 tps, lat 936.516 ms stddev 140.039, lag 932.580 ms
progress: 14.0 s, 6032.0 tps, lat 935.867 ms stddev 137.761, lag 931.892 ms
progress: 15.0 s, 5975.0 tps, lat 950.911 ms stddev 153.438, lag 946.895 ms
progress: 16.0 s, 6044.0 tps, lat 953.380 ms stddev 146.601, lag 949.413 ms
progress: 17.0 s, 6105.0 tps, lat 956.524 ms stddev 134.940, lag 952.593 ms
progress: 18.0 s, 6097.0 tps, lat 950.913 ms stddev 135.902, lag 946.980 ms
progress: 19.0 s, 6004.9 tps, lat 933.010 ms stddev 142.037, lag 929.014 ms
progress: 20.0 s, 6078.1 tps, lat 920.415 ms stddev 157.117, lag 916.469 ms
progress: 21.0 s, 5402.0 tps, lat 945.490 ms stddev 145.262, lag 941.048 ms
progress: 22.0 s, 5226.0 tps, lat 1082.013 ms stddev 141.718, lag 1077.423 ms
progress: 23.0 s, 12794.1 tps, lat 479.046 ms stddev 434.510, lag 478.106 ms
progress: 24.0 s, 5914.8 tps, lat 0.604 ms stddev 0.075, lag 0.067 ms
progress: 25.0 s, 5994.0 tps, lat 0.596 ms stddev 0.071, lag 0.066 ms
progress: 26.0 s, 6126.9 tps, lat 0.598 ms stddev 0.072, lag 0.067 ms
progress: 27.0 s, 6076.2 tps, lat 0.601 ms stddev 0.072, lag 0.068 ms
progress: 28.0 s, 6035.0 tps, lat 0.608 ms stddev 0.077, lag 0.068 ms

After the delete (9 s) latency increases significantly, up to 1000 ms,
until autovacuum comes and performs index cleanup (23 s).

From EXPLAIN ANALYZE we can see that planning time has increased
significantly:

# explain (analyze, verbose, timing, buffers) SELECT * FROM polygon_table polygon INNER JOIN second_table second ON
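A toy model (not PostgreSQL code) of the asymmetry described above: a SnapshotDirty-style scan must fetch and reject every dead tuple at the low end of the index before finding a live extreme value, while a SnapshotAny-style scan accepts the first index entry immediately.

```c
#include <assert.h>

/* Toy model: the index's low end holds ndead_at_start dead entries followed
 * by live ones.  Count heap fetches until the scan accepts a tuple. */
static long
fetches_until_first_accept(long ndead_at_start, int accept_dead)
{
    long fetches = 0;

    for (long i = 0;; i++)
    {
        fetches++;                  /* heap fetch for the visibility check */
        if (accept_dead || i >= ndead_at_start)
            return fetches;         /* this entry supplies the extreme value */
    }
}
```

After a bulk delete of a million rows at the start of the index, the reject-dead strategy pays a million extra heap fetches per planning-time probe until vacuum cleans the index.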

Re: [HACKERS] [PostgreSQL 10] default of hot_standby should be "on"?

2017-04-27 Thread Michael Paquier
On Thu, Apr 27, 2017 at 4:33 PM, Masahiko Sawada  wrote:
> On Thu, Apr 27, 2017 at 1:58 PM, Huong Dangminh
>  wrote:
>>> On Thu, Apr 27, 2017 at 11:48 AM, Masahiko Sawada 
>>> wrote:
>>> > Thank you for updating the patch. Also maybe we can update line in
>>> > PostgresNode.pm where hot_standby is set to on explicitly.
>>>
>>> I would refrain from doing that, having some parameters listed in the
>>> tests makes the intention behind those perl routines clear.
>
> Hmm, you've got a point. But when we changed the default values
> related to replication, we dropped some explicit settings from the
> regression test code.

Looking at the patch. This is fine:
-  # Change a setting and restart
-  $node->append_conf('postgresql.conf', 'hot_standby = on');
-  $node->restart();

But not that:
 print $conf "wal_log_hints = on\n";
-print $conf "hot_standby = on\n";
 print $conf "max_connections = 10\n";

This is a minor point though.
-- 
Michael




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-27 Thread Ashutosh Bapat
On Wed, Apr 26, 2017 at 9:30 PM, Robert Haas  wrote:
> On Mon, Apr 24, 2017 at 7:06 AM, Ashutosh Bapat
>  wrote:
>> This assumes that datums in partition bounds have one to one mapping
>> with the partitions, which isn't true for list partitions. For list
>> partitions we have multiple datums, corresponding to the list items
>> associated with a given partition. So, simply looping over the
>> partitions of outer relations doesn't work; in fact there are two
>> outer relations for a full outer join, so we have to loop over both of
>> them together in a merge join fashion.
>
> Maybe so, but my point is that it can be done with the original types,
> without converting anything to a different type.
>

Theoretically, I agree with this. But practically, the implementation
is a lot more complex than what you have described in the earlier mails.
I am afraid that the patch with those changes will be a lot harder to
review and commit. Later in this mail, I will try to explain some of
the complexities.

>
> I'm going to say this one more time: I really, really, really think
> you need to avoid trying to convert the partition bounds to a common
> type.  I said before that the infrastructure to do that is not present
> in our type system, and I'm pretty sure that statement is 100%
> correct.  The fact that you can find other cases where we do something
> sorta like that but in a different case with different requirements
> doesn't make that false.

Ok. Thanks for the explanation.

The current design and implementation is for a restricted case where
the partition bounds, partition key types and numbers match exactly.
We want to commit an implementation which is reasonably extensible and
doesn't require a lot of changes when we add more capabilities. Some
of the extensions we discussed are as follows:
1. Partition-wise join when the existing partitions have matching
bounds/lists but there can be extra partitions on either side of the
join (between base relations or join relations) without a matching
partition on the other side.
2. Partition-wise join when the partition bounds/lists do not match
exactly but there is 1:1 or 1:0 or 0:1 mapping between the partitions
which can contribute to the final result. E.g. A (0-100, 100 - 150,
200-300), B (0-50, 125-200, 300-400)
3. Partition-wise join when the partition key types do not match, but
there's a single opfamily being used for partitioning.
4. Partition-wise join where 1:m or m:n mapping exists between
partitions of the joining relations.


The first one is clearly something that we will need. We may add it in the
first commit or the next, but it will be needed pretty soon (v11?).
To me the 2nd is more important than the 3rd one. You may have a different
view. We will expect the 3rd optimization to work with all the prior
optimizations. I am restraining myself from thinking about the 4th one,
since that requires ganging together multiple RelOptInfos as a single
RelOptInfo while joining, something we don't have infrastructure for.
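The merge-fashion matching of bounds mentioned earlier, for the restricted current case where a FULL JOIN needs an exact bound match, can be sketched with integer stand-ins for partition bound datums (hypothetical, not the patch's actual data structures):

```c
#include <assert.h>

/* Walk two sorted bound lists in merge-join fashion.  Equal bounds pair the
 * corresponding partitions; an unmatched bound means a partition with no
 * counterpart on the other side.  Returns the number of matched pairs and
 * sets *exact when the lists are identical (what a FULL JOIN would need). */
static int
match_bounds(const int *a, int na, const int *b, int nb, int *exact)
{
    int i = 0, j = 0, matched = 0;

    while (i < na && j < nb)
    {
        if (a[i] == b[j])
        {
            matched++;
            i++;
            j++;
        }
        else if (a[i] < b[j])
            i++;                /* partition of A without a match in B */
        else
            j++;                /* partition of B without a match in A */
    }
    *exact = (matched == na && matched == nb);
    return matched;
}
```

For INNER joins only the matched pairs are joined; for a FULL JOIN the *exact flag must hold, and relaxing it is what the extensions above are about.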

In case of first goal, supporting INNER joins and OUTER joins where
the partitions are missing on the OUTER side but not inner side are
easier. In those cases we just drop those partitions and corresponding
bounds/lists from the join. For a FULL OUTER join, where both sides
act as OUTER as well as INNER, we will need exact mapping between the
partitions. For supporting OUTER joins where partitions on the INNER
sides can be missing, we need to create some "dummy" relations
representing the missing partitions so that we have OUTER rows with
NULL inner side. This requires giving those dummy relations some
relids and thus in case of base relations we may need to inject some
dummy children. This may mean that we have to expand simple_rel_array
as part of outer join, may or may not require adding new
AppendRelInfos and so on. We are basically breaking an assumption that
base relations can not be introduced while planning joins and that
might require some rework in the existing infrastructure. There might
be other ways to introduce dummy relations during join planning, but I
haven't really thought through the problem.

The third goal requires that the partition bounds be compared based on
the partition keys present in the equi-join. While matching the
partitions to be joined, the partition bounds corresponding to the base
relation whose partition keys appear in the equi-join are used for
comparison, using the support function corresponding to the data types of
the partition keys. This requires us to associate the partitions of a join
with the bounds of a base relation. E.g. A(A1, A2) with bounds (X1, X3)
(notice missing X2), B (B1, B2) bounds (X1, X2), C (C1, C2, C3) bounds
(X1, X2, X3) and the join is A LJ B on A.a = B.b LJ C on B.b = C.c
assuming strict operators this can be executed as (AB)C or A(BC). AB
will have partitions A1B1, A2B3 since there is no matching bound of A
for B2 and A is outer relation. A1B1 is associated 

Re: [HACKERS] [PostgreSQL 10] default of hot_standby should be "on"?

2017-04-27 Thread Masahiko Sawada
On Thu, Apr 27, 2017 at 1:58 PM, Huong Dangminh
 wrote:
>> On Thu, Apr 27, 2017 at 11:48 AM, Masahiko Sawada 
>> wrote:
>> > Thank you for updating the patch. Also maybe we can update line in
>> > PostgresNode.pm where hot_standby is set to on explicitly.
>>
>> I would refrain from doing that, having some parameters listed in the
>> tests makes the intention behind those perl routines clear.

Hmm, you've got a point. But when we changed the default values
related to replication, we dropped some explicit settings from the
regression test code.

>
> Thanks, attached patch update PostgresNode.pm file.
> I also did the regression test and found no problem.

Looks good to me.

>
> # sorry Sawada-san, Michael-san, because of security restriction
> # we could not mail to gmail address from our environment.

No problem!

> ---
> Thanks and best regards,
> Dang Minh Huong
> NEC Solution Innovators, Ltd.
> http://www.nec-solutioninnovators.co.jp/en/
>
>
>
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-27 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 6:24 AM, Claudio Freire  wrote:
> On Wed, Apr 12, 2017 at 4:35 PM, Robert Haas  wrote:
>> On Tue, Apr 11, 2017 at 4:38 PM, Claudio Freire  
>> wrote:
>>> In essence, the patch as it is proposed, doesn't *need* a binary
>>> search, because the segment list can only grow up to 15 segments at
>>> its biggest, and that's a size small enough that linear search will
>>> outperform (or at least perform as well as) binary search. Reducing
>>> the initial segment size wouldn't change that. If the 12GB limit is
>>> lifted, or the maximum segment size reduced (from 1GB to 128MB for
>>> example), however, that would change.
>>>
>>> I'd be more in favor of lifting the 12GB limit than of reducing the
>>> maximum segment size, for the reasons above. Raising the 12GB limit
>>> has concrete and readily apparent benefits, whereas using bigger (or
>>> smaller) segments is far more debatable. Yes, that will need a binary
>>> search. But, I was hoping that could be a second (or third) patch, to
>>> keep things simple, and benefits measurable.
>>
>> To me, it seems a bit short-sighted to say, OK, let's use a linear
>> search because there's this 12GB limit so we can limit ourselves to 15
>> segments.  Because somebody will want to remove that 12GB limit, and
>> then we'll have to revisit the whole thing anyway.  I think, anyway.
>
> Ok, attached an updated patch that implements the binary search
>
>> What's not clear to me is how sensitive the performance of vacuum is
>> to the number of cycles used here.  For a large index, the number of
>> searches will presumably be quite large, so it does seem worth
>> worrying about performance.  But if we just always used a binary
>> search, would that lose enough performance with small numbers of
>> segments that anyone would care?  If so, maybe we need to use linear
>> search for small numbers of segments and switch to binary search with
>> larger numbers of segments.
>
> I just went and tested.
>
> I implemented the hybrid binary search attached, and ran a few tests
> with and without the sequential code enabled, at small scales.
>
> The difference is statistically significant, but small (less than 3%).
> With proper optimization of the binary search, however, the difference
> flips:
>
> claudiofreire@klaumpp:~/src/postgresql.vacuum> fgrep shufp80
> fullbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.20 s, system: 1.42 s,
> elapsed: 18.34 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.44 s, system: 1.40 s,
> elapsed: 19.75 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.28 s, system: 1.41 s,
> elapsed: 18.48 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.39 s, system: 1.51 s,
> elapsed: 20.60 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.26 s, system: 1.42 s,
> elapsed: 19.16 s.
>
> claudiofreire@klaumpp:~/src/postgresql.vacuum> fgrep shufp80
> hybridbinary.s100.times
> vacuum_bench_s100.1.shufp80.log:CPU: user: 6.49 s, system: 1.39 s,
> elapsed: 19.15 s.
> vacuum_bench_s100.2.shufp80.log:CPU: user: 6.36 s, system: 1.33 s,
> elapsed: 18.40 s.
> vacuum_bench_s100.3.shufp80.log:CPU: user: 6.36 s, system: 1.31 s,
> elapsed: 18.87 s.
> vacuum_bench_s100.4.shufp80.log:CPU: user: 6.59 s, system: 1.35 s,
> elapsed: 26.43 s.
> vacuum_bench_s100.5.shufp80.log:CPU: user: 6.54 s, system: 1.28 s,
> elapsed: 20.02 s.
>
> That's after inlining the compare on both the linear and sequential
> code, and it seems it lets the compiler optimize the binary search to
> the point where it outperforms the sequential search.
>
> That's not the case when the compare isn't inlined.
>
> That seems in line with [1], that show the impact of various
> optimizations on both algorithms. It's clearly a close enough race
> that optimizations play a huge role.
>
> Since we're not likely to go and implement SSE2-optimized versions, I
> believe I'll leave the binary search only. That's the attached patch
> set.
>
> I'm running the full test suite, but that takes a very long while.
> I'll post the results when they're done.
>
> [1] https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/

Thank you for updating the patch.

I've read this patch again and here are some review comments.

+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity (2 log N to be
+ * precise).

IIUC we now do binary search even over the list of segments.
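The two-level lookup the patch comment describes can be sketched like this (simplified, hypothetical structures, with int keys standing in for ItemPointers):

```c
#include <assert.h>

/* Simplified sketch: a binary search over segment boundaries picks the
 * segment, then a binary search inside it finds the key.  With
 * exponentially growing segments this stays O(log N) overall
 * (about 2 log N comparisons). */
typedef struct
{
    const int *items;       /* sorted keys; int stands in for ItemPointer */
    int        num_items;
    int        last_item;   /* highest key stored in this segment */
} DeadTupleSegment;

static int
lookup_dead_tuple(const DeadTupleSegment *segs, int nsegs, int key)
{
    int lo = 0, hi = nsegs;

    /* find the first segment whose last_item >= key */
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (segs[mid].last_item < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == nsegs)
        return 0;           /* key is beyond every segment */

    /* standard lower-bound search within the chosen segment */
    {
        const DeadTupleSegment *seg = &segs[lo];
        int l = 0, h = seg->num_items;

        while (l < h)
        {
            int m = l + (h - l) / 2;

            if (seg->items[m] < key)
                l = m + 1;
            else
                h = m;
        }
        return l < seg->num_items && seg->items[l] == key;
    }
}
```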

-

We often fetch a particular dead tuple segment. How about providing a
macro for easier understanding?
For example,

 #define GetDeadTuplesSegment(lvrelstats, seg) \
  (&(lvrelstats)->dead_tuples.dt_segments[(seg)])

-

+   if (vacrelstats->dead_tuples.num_segs == 0)
+   return;
+

+   /* If uninitialized, we have no tuples to delete from the indexes */
+   if 

Re: [HACKERS] scram and \password

2017-04-27 Thread Heikki Linnakangas

On 04/26/2017 07:57 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, Apr 26, 2017 at 6:22 AM, Heikki Linnakangas wrote:
>>> * If algorithm is not given explicitly, PQencryptPasswordConn() queries
>>> "SHOW password_encryption", and uses that. That is documented, and it is
>>> also documented that it will *not* issue queries, and hence will not
>>> block, if the algorithm is given explicitly. That's an important property
>>> for some applications. If you want the default behavior without blocking,
>>> query "SHOW password_encryption" yourself, and pass the result as the
>>> 'algorithm'.
>>
>> TBH, I'd just require the user to specify the algorithm explicitly.
>> Having it run SHOW on the server seems wonky.  It introduces a bunch
>> of failure modes for ... no real benefit, I think.
>
> Yeah.  Blocking is the least of your worries --- what about being in
> a failed transaction, for instance?

Well, the "ALTER USER" command that you will surely run next, using the 
same connection, would fail anyway. I don't think running a query is a 
problem, as long as it's documented, and there's a documented way to 
avoid it.

You could argue that since we need to document how to avoid the query 
and the blocking, we might as well always require the application to run 
the "show password_encryption" query before calling 
PQencryptPasswordConn(). But I'd like to offer the convenience for the 
majority of applications that don't mind blocking.

> However, it's not entirely true that there's no real benefit.  If the
> client app has to specify the algorithm then client code will need
> extension every time we add another algorithm.  Maybe that's going to
> happen so seldom that it's not a big deal, but it would be nice to
> avoid that.

Yeah, I'd like to be prepared. Hopefully we don't need to add another 
algorithm any time soon, but maybe we will, or maybe we want to add an 
option for the SCRAM iteration count, for example.


- Heikki


