Re: [HACKERS] walsender termination error messages worse in v10

2017-06-07 Thread Noah Misch
On Fri, Jun 02, 2017 at 11:51:26AM -0700, Andres Freund wrote:
> commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
> Author: Peter Eisentraut 
> Date:   2017-03-23 08:36:36 -0400
> 
> Logical replication support for initial data copy
> 
> made walreceiver emit worse messages in v10 than before when the master
> gets shut down.  Before 10 you'll get something like:
> 
> 2017-06-02 11:46:07.173 PDT [16143][] LOG:  replication terminated by primary 
> server
> 2017-06-02 11:46:07.173 PDT [16143][] DETAIL:  End of WAL reached on timeline 
> 1 at 0/1B7FB8C8.
> 2017-06-02 11:46:07.173 PDT [16143][] FATAL:  could not send end-of-streaming 
> message to primary: no COPY in progress
> 
> the last bit is a bit superflous, but it still kinda makes sense.  Now
> you get:
> 2017-06-02 11:47:09.362 PDT [17061][] FATAL:  unexpected result after 
> CommandComplete: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender & parallelism

2017-06-07 Thread Noah Misch
On Fri, Jun 02, 2017 at 11:06:29PM -0700, Andres Freund wrote:
> On 2017-06-02 22:12:46 -0700, Noah Misch wrote:
> > On Fri, Jun 02, 2017 at 11:27:55PM -0400, Peter Eisentraut wrote:
> > > On 5/31/17 23:54, Peter Eisentraut wrote:
> > > > On 5/29/17 22:01, Noah Misch wrote:
> > > >> On Tue, May 23, 2017 at 01:45:59PM -0400, Andres Freund wrote:
> > > >>> On May 23, 2017 1:42:41 PM EDT, Petr Jelinek 
> > > >>>  wrote:
> > >  Hi,
> > > 
> > >  so this didn't really move anywhere AFAICS, do we think the approach
> > >  I've chosen is good or do we want to do something else here?
> > > >>>
> > > >>> Can you add it to the open items list?
> > > >>
> > > >> [Action required within three days.  This is a generic notification.]
> > > > 
> > > > I have posted an update.  The next update will be Friday.
> > > 
> > > Andres is now working on this.  Let's give him a bit of time. ;-)
> > 
> > If you intended this as your soon-due status update, it is missing a 
> > mandatory
> > bit.
> 
> I'm now owning this item.  I've posted patches, await review.  If none
> were to be forthcoming, I'll do another pass Monday, and commit.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Mengxing Liu


> From: "Kevin Grittner" 
>  wrote:
> 
> > "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> > cpu user time should be about 12*4 = 48.
> > There seems to be no process blocked by IO.
> >
> > procs ---memory-- ---swap-- -io -system-- 
> > --cpu-
> >  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
> > wa st
> > 28  0  0 981177024 315036 7084376000 0 900  1  
> > 0 99  0  0
> > 21  1  0 981178176 315036 7084378400 0 0 25482 329020 
> > 12  3 85  0  0
> > 18  1  0 981179200 315036 7084379200 0 0 26569 323596 
> > 12  3 85  0  0
> > 17  0  0 981175424 315036 7084380800 0 0 25374 322992 
> > 12  4 85  0  0
> > 12  0  0 981174208 315036 7084382400 0 0 24775 321577 
> > 12  3 85  0  0
> >  8  0  0 981179328 315036 7084533600 0 0 13115 199020  
> > 6  2 92  0  0
> > 13  0  0 981179200 315036 7084579200 0 0 22893 301373 
> > 11  3 87  0  0
> > 11  0  0 981179712 315036 7084580800 0 0 26933 325728 
> > 12  4 84  0  0
> > 30  0  0 981178304 315036 7084582400 0 0 23691 315821 
> > 11  4 85  0  0
> > 12  1  0 981177600 315036 7084583200 0 0 29485 320166 
> > 12  4 84  0  0
> > 32  0  0 981180032 315036 7084584800 0 0 25946 316724 
> > 12  4 84  0  0
> > 21  0  0 981176384 315036 7084586400 0 0 24227 321938 
> > 12  4 84  0  0
> > 21  0  0 981178880 315036 7084588000 0 0 25174 326943 
> > 13  4 83  0  0
> 
> This machine has 120 cores?  Is hyperthreading enabled?  If so, what
> you are showing might represent a total saturation of the 30 cores.
> Context switches of about 300,000 per second is pretty high.  I can't
> think of when I've seen that except when there is high spinlock
> contention.
> 

Yes, and the hyper-threading is closed. 

> Just to put the above in context, how did you limit the test to 30
> cores?  How many connections were open during the test?
> 

I used numactl to limit the test in the first two sockets (15 cores in each 
socket)
And there are 90 concurrent connections. 

> > The flame graph is attached. I use 'perf' to generate the flame graph. Only 
> > the CPUs running PG server are profiled.
> > I'm not familiar with other part of PG. Can you find anything unusual in 
> > the graph?
> 
> Two SSI functions stand out:
> 10.86% PredicateLockTuple
>  3.51% CheckForSerializableConflictIn
> 
> In both cases, most of that seems to go to lightweight locking.  Since
> you said this is a CPU graph, that again suggests spinlock contention
> issues.
> 
> -- 

Yes. Is there any other kind of locks besides spinlock? I'm reading locks in PG 
now. If all locks are spinlock, the CPU should be used 100%. But now only 50% 
CPU are used. 
I'm afraid there are extra time waiting for mutex or semaphore.
These SSI functions will cost more time than reported, because perf doesn't 
record the time sleeping & waiting for locks. 
CheckForSerializableConflictIn takes 10% of running time. (refer to my last 
email) 

--
Mengxing Liu










-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-07 Thread Michael Paquier
v

On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas  wrote:
> I think if you're going to fix it so that we take spinlocks on
> MyWalSnd in a bunch of places that we didn't take them before, it
> would make sense to fix all the places where we're accessing those
> fields without a spinlock instead of only some of them, unless there
> are good reasons why we only need it in some cases and not others, in
> which case I think the patch should add comments to each place where
> the lock was not taken explaining why it's OK.  It didn't take me and
> my copy of vi very long to find instances that you didn't change.

I take that you are referring to the two lookups in
WalSndWaitForWal(), one in exec_replication_command(), one in
WalSndLoop() and one in WalSndDone(). First let's remember that all
the fields protected by the spinlock are only updated in a WAL sender
process, so reading them directly is safe in this context: we know
that no other processes will write them. And all those functions are
called only in the context of a WAL sender process. So the copies of
MyWalSnd that you are referring to here don't need a spinlock. Is
there something I am missing?

Actually, perhaps it would make sense to add some Assert(am_walsender)
in this code?

> I also think that should provide some analysis regarding the question
> Thomas asked - are these actually bugs, or are they OK for some
> reason?  We shouldn't just plaster the code with spinlock acquisitions
> unless it's really necessary.  It seems at least possible that these
> changes could cause performance regressions, and it doesn't look like
> you've done any testing or provided any analysis indicating whether
> that's likely to happen or not.

Now taking a step back on the patch, v3 that I sent 3 weeks ago.
Additional spinlocks are taken in:
1) SyncRepReleaseWaiters(), which is called in a WAL sender. So indeed
no spinlocks are needed here. The last patch is incorrect here.
2) SyncRepGetSyncStandbysPriority() and SyncRepGetSyncStandbysQuorum()
as of HEAD via SyncRepGetSyncStandbys(), something that's not good if
done lock-less as this gets used for pg_stat_replication for
pg_stat_get_wal_senders(). This can return a garbage list of sync
standbys to the client as WAL senders update their flush, write and
apply positions in parallel to that, and don't care about SyncRepLock
as this gets calculated with the standby feedback messages. The
problem here is the gap between walsnd->sync_standby_priority and
walsnd->flush:
-- With a primary, and more or more sync standbys, take a checkpoint
in SyncRepGetSyncStandbysPriority() after the first check is passed.
-- Shutdown the standby and restart it.
-- The standby reconnects, initializes the WAL sender slot with
InitWalSenderSlot().
-- Let the process of SyncRepGetSyncStandbysPriority() continue, the
sync standby does not show up.
That would be really unlikely to happen, but the code is written in
such a way that it could happen. One could also say that this gives a
closer idea that the standby disconnected but it looks wrong to me to
do this value lookup without a required lock.
3) In walreceiver.c's pg_stat_get_wal_receiver's:
- Launch pg_stat_get_wal_receiver and take a checkpoint on it.
- Pass the lookups of pid and ready_to_display.
- Make the WAL receiver stop.
- The view reports inconsistent data. If the WAL receiver data was
just initialized, the caller would get back PID values or similar 0.
Particularly a WAL receiver marked with !ready_to_display could show
data to the caller. That's not cool.
4) KeepFileRestoredFromArchive() needs a lock, as this is called from
the startup process. That's harmless, the worst that can happen is
that needreload is switched to true for a process that has been marked
with a PID of 0 because of a WAL sender restart (slot taken again and
initialized). But that will just process an extra reload in a worst
case.

> Basically, I don't think this patch is ready to commit.  We have
> neither evidence that it is necessary nor evidence that it is
> complete.  I don't want to be unfair, here, but it seems to me you
> ought to do a little more legwork before throwing this over the wall
> to the pool of committers.

Fair enough. Is the analysis written above more convincing?
-- 
Michael


walsnd-pid-races-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Does pg_upgrade really support "make installcheck"?

2017-06-07 Thread Tom Lane
src/bin/pg_upgrade/TESTING claims (much further down in the file
than I'd like):

The shell script test.sh in this directory performs more or less this
procedure.  You can invoke it by running
make check
or by running
make installcheck
if "make install" (or "make install-world") were done beforehand.

However, the second alternative doesn't really work:

$ make installcheck
make: Nothing to be done for `installcheck'.
$

Did this ever work, or could it easily be made to work?
If not, we need to fix that documentation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-07 Thread Masahiko Sawada
On Thu, Jun 8, 2017 at 2:17 AM, Andres Freund  wrote:
> On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
>> Simon Riggs  writes:
>> > So rearranged code a little to keep it lean.
>>
>> Didn't you break it with that?  As it now stands, the memcpy will
>> copy the nonzero value.
>
> I've not seen a fix and/or alleviating comment about this so far.  Did I
> miss something?
>

I think we don't have the fix for the comment from Tom so far, too.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-07 Thread Tom Lane
"Regina Obe"  writes:
> I'm not a fan of either solution, but I think what Tom proposes of throwing
> an error sounds like least invasive and confusing.

> I'd much prefer an error thrown than silent behavior change. Given that we
> ran into this in 3 places in PostGIS code, I'm not convinced the issue is
> all that rare.

> Make sure to point out the breaking change in the release notes though and
> syntax to remedy it.

As far as that goes, the best fix I could think of after a few minutes is
to integrate your conditional logic into a custom set-returning function.
For example,

select x, case when y > 0 then generate_series(1, z) else 5 end from tt;

could become

create function mysrf(cond bool, start int, fin int, els int)
  returns setof int as $$
begin
  if cond then
return query select generate_series(start, fin);
  else
return query select els;
  end if;
end$$ language plpgsql;

select x, mysrf(y > 0, 1, z, 5) from tt;

(adjust the amount of parameterization to taste, of course)

Now, the fact that a fairly mechanical conversion like this is possible
suggests that we *could* solve the problem if we had to, at least for
simple cases like this one.  But it'd be a lot of work, not least because
we'd presumably not want core-defined syntax to depend on an extension
like plpgsql --- and I don't see a way to do this with straight SQL
functions.  So my feeling is that we should not expend that effort.
If it turns out that a lot more people are affected than I currently
think will be the case, maybe we'll have to revisit that choice.

But this line of thinking does strengthen my feeling that throwing an
error is the right thing to do for the moment.  If we allow v10 to accept
such cases but do something different from what we used to, that will
greatly complicate any future attempt to try to restore the old behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:50, Josh Berkus wrote:
> On 06/07/2017 06:25 PM, Petr Jelinek wrote:
>> On 08/06/17 03:19, Josh Berkus wrote:
>>>
>>> Peter and Petr:
>>>
>>> On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
 On 6/7/17 01:01, Josh Berkus wrote:
> * Having defaults on the various _workers all devolve from max_workers
> is also great.

 I'm not aware of anything like that happening.

> P1. On the publishing node, logical replication relies on the *implied*
> correspondence of the application_name and the replication_slot both
> being named the same as the publication in order to associate a
> particular publication with a particular replication connection.
> However, there's absolutely nothing preventing me from also creating a
> binary replication connection by the same name  It really seems like we
> need a field in pg_stat_replication or pg_replication_slots which lists
> the publication.

 I'm not quite sure what you are getting at here.  The application_name
 seen on the publisher side is the subscription name.  You can create a
 binary replication connection using the same application_name, but
 that's already been possible before.  But the publications don't care
 about any of this.
>>>
>>> My point is that there is no system view where I can see, on the origin
>>> node, what subscribers are subscribing to which publications.  You can
>>> kinda guess that from pg_stat_replication etc., but it's not dependable
>>> information.
>>>
>>
>> That's like wanting the foreign server to show you which foreign tables
>> exist on the local server. This is not a tightly coupled system and you
>> are able to setup both sides without them being connected to each other
>> at the time of setup, so there is no way publisher can know anything.
> 
> Why wouldn't the publisher know who's connected once the replication
> connection as been made and the subscription has started?  Or is it just
> a log position, and the publisher really has no idea how many
> publications are being consumed?
> 

Plugin knows while the connection exists, but that's the thing, it goes
through pluggable interface (that can be used by other plugins, without
publications) so there would have to be some abstracted way for plugins
to give some extra information for the pg_stat_replication or similar
view. I am afraid it's bit too late to design something like that in
PG10 cycle.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Josh Berkus
On 06/07/2017 06:25 PM, Petr Jelinek wrote:
> On 08/06/17 03:19, Josh Berkus wrote:
>>
>> Peter and Petr:
>>
>> On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
>>> On 6/7/17 01:01, Josh Berkus wrote:
 * Having defaults on the various _workers all devolve from max_workers
 is also great.
>>>
>>> I'm not aware of anything like that happening.
>>>
 P1. On the publishing node, logical replication relies on the *implied*
 correspondence of the application_name and the replication_slot both
 being named the same as the publication in order to associate a
 particular publication with a particular replication connection.
 However, there's absolutely nothing preventing me from also creating a
 binary replication connection by the same name  It really seems like we
 need a field in pg_stat_replication or pg_replication_slots which lists
 the publication.
>>>
>>> I'm not quite sure what you are getting at here.  The application_name
>>> seen on the publisher side is the subscription name.  You can create a
>>> binary replication connection using the same application_name, but
>>> that's already been possible before.  But the publications don't care
>>> about any of this.
>>
>> My point is that there is no system view where I can see, on the origin
>> node, what subscribers are subscribing to which publications.  You can
>> kinda guess that from pg_stat_replication etc., but it's not dependable
>> information.
>>
> 
> That's like wanting the foreign server to show you which foreign tables
> exist on the local server. This is not a tightly coupled system and you
> are able to setup both sides without them being connected to each other
> at the time of setup, so there is no way publisher can know anything.

Why wouldn't the publisher know who's connected once the replication
connection as been made and the subscription has started?  Or is it just
a log position, and the publisher really has no idea how many
publications are being consumed?


-- 
Josh Berkus
Containers & Databases Oh My!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-07 Thread Chapman Flack
On 06/02/17 15:51, Chapman Flack wrote:
> But what it buys you is then if your MyExtraPGNode has PostgresNode
> as a base, the familiar idiom
> 
>   MyExtraPGNode->get_new_node('foo');
> 
> works, as it inserts the class as the first argument.
> 
> As a bonus, you then don't need to complicate get_new_node
> with a test for (not ($node->isa("PostgresNode"))) because
> if it weren't, it wouldn't have inherited get_new_node

Any takers if I propose this amendment in the form of a patch?

Relying on the perl idiom instead of a $node->isa() test shortens
the patch; does that ameliorate at all the concern about complicating
core for the benefit of modules?

I'm not fully persuaded that just re-blessing a PostgresNode suffices
as a workaround ... if the subclass expects to have additional state
set up by its own constructor, the deception seems likely to be exposed.
So I think there are more than just cosmetic grounds for allowing this.

-Chap
--- src/test/perl/PostgresNode.pm	2017-06-07 20:15:52.827639829 -0400
+++ src/test/perl/PostgresNode.pm	2017-06-07 20:57:49.205145761 -0400
@@ -853,20 +853,27 @@
 
 =pod
 
-=item get_new_node(node_name)
+=item get_new_node(node_name) I PostgresNode->get_new_node(node_name)
 
-Build a new PostgresNode object, assigning a free port number. Standalone
-function that's automatically imported.
+Build a new PostgresNode object, assigning a free port number. This can be
+called either as a standalone function that's automatically imported, or as
+a class method on PostgresNode or any subclass.
 
 Remembers the node, to prevent its port number from being reused for another
 node, and to ensure that it gets shut down when the test script exits.
 
 You should generally use this instead of PostgresNode::new(...).
 
+If you have a subclass of PostgresNode you want created, use the class-method
+form of the call, as in Cget_new_node(node_name)>.
+The standalone-function form creates an instance of PostgresNode.
+
 =cut
 
 sub get_new_node
 {
+	my $class = 'PostgresNode';
+	$class = shift if 1 < scalar @_;
 	my $name  = shift;
 	my $found = 0;
 	my $port  = $last_port_assigned;
@@ -911,7 +918,7 @@
 	print "# Found free port $port\n";
 
 	# Lock port number found by creating a new node
-	my $node = new PostgresNode($name, $test_pghost, $port);
+	my $node = $class->new($name, $test_pghost, $port);
 
 	# Add node to list of nodes
 	push(@all_nodes, $node);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Peter Eisentraut
On 6/7/17 21:19, Josh Berkus wrote:
> The user's first thought is going to be a network issue, or a bug, or
> some other problem, not a missing PK.  Yeah, they can find that
> information in the logs, but only if they think to look for it in the
> first place, and in some environments (AWS, containers, etc.) logs can
> be very hard to access.

You're not going to get very far with using this feature if you are not
looking in the logs for errors.  These are asynchronously operating
background workers, so the only way they can communicate problems is
through the log.

I don't disagree with your general premise.  We have done a fair amount
of fiddling already to show some errors as early as possible.  But we
can't know all of them, and we shouldn't give the impression that we do.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:19, Josh Berkus wrote:
> 
> Peter and Petr:
> 
> On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
>> On 6/7/17 01:01, Josh Berkus wrote:
>>> * Having defaults on the various _workers all devolve from max_workers
>>> is also great.
>>
>> I'm not aware of anything like that happening.
>>
>>> P1. On the publishing node, logical replication relies on the *implied*
>>> correspondence of the application_name and the replication_slot both
>>> being named the same as the publication in order to associate a
>>> particular publication with a particular replication connection.
>>> However, there's absolutely nothing preventing me from also creating a
>>> binary replication connection by the same name  It really seems like we
>>> need a field in pg_stat_replication or pg_replication_slots which lists
>>> the publication.
>>
>> I'm not quite sure what you are getting at here.  The application_name
>> seen on the publisher side is the subscription name.  You can create a
>> binary replication connection using the same application_name, but
>> that's already been possible before.  But the publications don't care
>> about any of this.
> 
> My point is that there is no system view where I can see, on the origin
> node, what subscribers are subscribing to which publications.  You can
> kinda guess that from pg_stat_replication etc., but it's not dependable
> information.
> 

That's like wanting the foreign server to show you which foreign tables
exist on the local server. This is not a tightly coupled system and you
are able to setup both sides without them being connected to each other
at the time of setup, so there is no way publisher can know anything.

> 
>>> P2: If I create a subscription on a table with no primary key, I do not
>>> recieve a warning.  There should be a warning, since in most cases such
>>> a subscription will not work.  I suggest the text:
>>>
>>> "logical replication target relation "public.fines" has no primary key.
>>> Either create one, or set REPLICA IDENTITY index and set the published
>>> relation to REPLICA IDENTITY FULL."
>>
>> At that point, we don't know what is being published.  If only inserts
>> are being published or REPLICA IDENTITY FULL is set, then it will work.
>> We don't want to give warnings about things that might not be true.
>>
>> More guidance on some of the potential failure cases would be good, but
>> it would need more refinement.
> 
> Hmmm, yah, I see.  Let me explain why this is a UX issue as-is though:
> 
> 1. User forgets to create a PK on the subscriber node.
> 
> 2. User starts a subscription to the tables.
> 
> 3. Subscription is successful.
> 
> 4. First update hits the publisher node.
> 
> 5. Subscription fails and disconnects.
> 
> The user's first thought is going to be a network issue, or a bug, or
> some other problem, not a missing PK.  Yeah, they can find that
> information in the logs, but only if they think to look for it in the
> first place, and in some environments (AWS, containers, etc.) logs can
> be very hard to access.
> 
> We really need the subscription to fail at step (2), not wait for the
> first update to fail.  And if it doesn't fail at step 2, then we should
> at least give a warning.
> 

Yes, I actually mentioned somewhere at some point that we should call
the checks we call during the replication also from the appropriate DDL
commands when possible (the information might not be available when the
DDL is executed), but never got to actually implementing it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Josh Berkus

Peter and Petr:

On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
> On 6/7/17 01:01, Josh Berkus wrote:
>> * Having defaults on the various _workers all devolve from max_workers
>> is also great.
> 
> I'm not aware of anything like that happening.
> 
>> P1. On the publishing node, logical replication relies on the *implied*
>> correspondence of the application_name and the replication_slot both
>> being named the same as the publication in order to associate a
>> particular publication with a particular replication connection.
>> However, there's absolutely nothing preventing me from also creating a
>> binary replication connection by the same name  It really seems like we
>> need a field in pg_stat_replication or pg_replication_slots which lists
>> the publication.
> 
> I'm not quite sure what you are getting at here.  The application_name
> seen on the publisher side is the subscription name.  You can create a
> binary replication connection using the same application_name, but
> that's already been possible before.  But the publications don't care
> about any of this.

My point is that there is no system view where I can see, on the origin
node, what subscribers are subscribing to which publications.  You can
kinda guess that from pg_stat_replication etc., but it's not dependable
information.


>> P2: If I create a subscription on a table with no primary key, I do not
>> recieve a warning.  There should be a warning, since in most cases such
>> a subscription will not work.  I suggest the text:
>>
>> "logical replication target relation "public.fines" has no primary key.
>> Either create one, or set REPLICA IDENTITY index and set the published
>> relation to REPLICA IDENTITY FULL."
> 
> At that point, we don't know what is being published.  If only inserts
> are being published or REPLICA IDENTITY FULL is set, then it will work.
> We don't want to give warnings about things that might not be true.
> 
> More guidance on some of the potential failure cases would be good, but
> it would need more refinement.

Hmmm, yah, I see.  Let me explain why this is a UX issue as-is though:

1. User forgets to create a PK on the subscriber node.

2. User starts a subscription to the tables.

3. Subscription is successful.

4. First update hits the publisher node.

5. Subscription fails and disconnects.

The user's first thought is going to be a network issue, or a bug, or
some other problem, not a missing PK.  Yeah, they can find that
information in the logs, but only if they think to look for it in the
first place, and in some environments (AWS, containers, etc.) logs can
be very hard to access.

We really need the subscription to fail at step (2), not wait for the
first update to fail.  And if it doesn't fail at step 2, then we should
at least give a warning.

-- 
Josh Berkus
Containers & Databases Oh My!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of snapshot in logical replication

2017-06-07 Thread Andres Freund
On 2017-06-08 03:14:55 +0200, Petr Jelinek wrote:
> On 08/06/17 03:08, Craig Ringer wrote:
> > On 7 June 2017 at 18:16, sanyam jain  wrote:
> >> Hi,
> >>
> >> Can someone explain the usage of exporting snapshot when a logical
> >> replication slot is created?
> > 
> > It's used to pg_dump the schema at a consistent point in history where
> > all xacts are known to be in the snapshot (and thus dumped) or known
> > not to be (so they'll be streamed out on the slot).
> > 
> 
> Not just schema, it can also be used to get existing data at consistent
> point in history so that changes will follow without gaps or duplicates.
> 
> That being said, the built-in logical replication isn't using the
> exported snapshots at all.

Just because it uses them in the same session doesn't really change the
picture, that's more or less just an optimization.  It's essentially
just a cheaper version of importing a snapshot that'd have to be
exported in same session.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of snapshot in logical replication

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:08, Craig Ringer wrote:
> On 7 June 2017 at 18:16, sanyam jain  wrote:
>> Hi,
>>
>> Can someone explain the usage of exporting snapshot when a logical
>> replication slot is created?
> 
> It's used to pg_dump the schema at a consistent point in history where
> all xacts are known to be in the snapshot (and thus dumped) or known
> not to be (so they'll be streamed out on the slot).
> 

Not just schema, it can also be used to get existing data at consistent
point in history so that changes will follow without gaps or duplicates.

That being said, the built-in logical replication isn't using the
exported snapshots at all.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
Hi,

On 07/06/17 07:01, Josh Berkus wrote:
> Folks,
> 
> I've put together some demos on PostgreSQL 10beta1.  Here's a few
> feedback notes based on my experience with it.
> [...snip...]
> 
> Problems
> 
> 
> P1. On the publishing node, logical replication relies on the *implied*
> correspondence of the application_name and the replication_slot both
> being named the same as the publication in order to associate a
> particular publication with a particular replication connection.
> However, there's absolutely nothing preventing me from also creating a
> binary replication connection by the same name  It really seems like we
> need a field in pg_stat_replication or pg_replication_slots which lists
> the publication.
> 

What do you mean implied correspondence of application_name and the
replication_slot? We only use subscription_name as default value for
those when user does not specify something else, all three of those can
have different value if user sets it up that way. And there is no
correspondence whatsoever to names of publications. The upstream only
knows which publications to replicate because subscription gives list of
requested publications as option to START_REPLICATION walsender command.
The list of publications associated with a subscription are only stored
on the subscriber and publisher has no idea what those are.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-07 Thread Craig Ringer
On 7 June 2017 at 13:39, Michael Paquier  wrote:
> On Thu, Jun 1, 2017 at 10:48 PM, Tom Lane  wrote:
>> Andres Freund  writes:
>>> when using
>>> $ cat ~/.proverc
>>> -j9
>>> some tests fail for me in 9.4 and 9.5.
>>
>> Weren't there fixes specifically intended to make that safe, awhile ago?
>
> 60f826c has not been back-patched. While this would fix parallel runs
> with make's --jobs, PROVE_FLAGS="-j X" would still fail.

Ah, that's why I didn't find it.

I think applying Michael's patch makes sense now, and if we decide to
backpatch PostgresNode (and I get the time to do it) we can clobber
that fix quite happily with the full backport. Thanks Michael for the
workaround.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of snapshot in logical replication

2017-06-07 Thread Craig Ringer
On 7 June 2017 at 18:16, sanyam jain  wrote:
> Hi,
>
> Can someone explain the usage of exporting snapshot when a logical
> replication slot is created?

It's used to pg_dump the schema at a consistent point in history where
all xacts are known to be in the snapshot (and thus dumped) or known
not to be (so they'll be streamed out on the slot).

See snapbuild.c etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Petr Jelinek
Hi,

On 07/06/17 22:49, Erik Rijkers wrote:
> I am not sure whether what I found here amounts to a bug, I might be
> doing something dumb.
> 
> During the last few months I did tests by running pgbench over logical
> replication.  Earlier emails have details.
> 
> The basic form of that now works well (and the fix has been comitted)
> but as I looked over my testing program I noticed one change I made to
> it, already many weeks ago:
> 
> In the cleanup during startup (pre-flight check you might say) and also
> before the end, instead of
> 
>   echo "delete from pg_subscription;" | psql -qXp $port2 -- (1)
> 
> I changed that (as I say, many weeks ago) to:
> 
>   echo "delete from pg_subscription;
> delete from pg_subscription_rel;
> delete from pg_replication_origin; " | psql -qXp $port2   -- (2)
> 
> This occurs (2x) inside the bash function clean_pubsub(), in main test
> script pgbench_detail2.sh
> 
> This change was an effort to ensure to arrive at a 'clean' start (and
> end-) state which would always be the same.
> 
> All my more recent testing (and that of Mark, I have to assume) was thus
> done with (2).
> 
> Now, looking at the script again I am thinking that it would be
> reasonable to expect that after issuing
>delete from pg_subscription;
> 
> the other 2 tables are /also/ cleaned, automatically, as a consequence. 
> (Is this reasonable? this is really the main question of this email).
> 

Hmm, they are not cleaned automatically, deleting from system catalogs
manually like this never propagates to related tables, we don't use FKs
there.

> So I removed the latter two delete statements again, and ran the tests
> again with the form in  (1)
> 
> I have established that (after a number of successful cycles) the test
> stops succeeding with in the replica log repetitions of:
> 
> 2017-06-07 22:10:29.057 CEST [2421] LOG:  logical replication apply
> worker for subscription "sub1" has started
> 2017-06-07 22:10:29.057 CEST [2421] ERROR:  could not find free
> replication state slot for replication origin with OID 11
> 2017-06-07 22:10:29.057 CEST [2421] HINT:  Increase
> max_replication_slots and try again.
> 2017-06-07 22:10:29.058 CEST [2061] LOG:  worker process: logical
> replication worker for subscription 29235 (PID 2421) exited with exit
> code 1
> 
> when I manually 'clean up' by doing:
>delete from pg_replication_origin;
> 

Yeah because you consumed all the origins (I am still not huge fan of
how that limit works, but that's separate discussion).

> then, and only then, does the session finish and succeed ('replica ok').
> 
> So to me it looks as if there is an omission of
> pg_replication_origin-cleanup when pg_description is deleted.
> 

There is no omission, origin is not supposed to be deleted automatically
unless you use DROP SUBSCRIPTION.


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Peter Eisentraut
On 6/7/17 01:01, Josh Berkus wrote:
> * Having defaults on the various _workers all devolve from max_workers
> is also great.

I'm not aware of anything like that happening.

> P1. On the publishing node, logical replication relies on the *implied*
> correspondence of the application_name and the replication_slot both
> being named the same as the publication in order to associate a
> particular publication with a particular replication connection.
> However, there's absolutely nothing preventing me from also creating a
> binary replication connection by the same name  It really seems like we
> need a field in pg_stat_replication or pg_replication_slots which lists
> the publication.

I'm not quite sure what you are getting at here.  The application_name
seen on the publisher side is the subscription name.  You can create a
binary replication connection using the same application_name, but
that's already been possible before.  But the publications don't care
about any of this.

> P2: If I create a subscription on a table with no primary key, I do not
> recieve a warning.  There should be a warning, since in most cases such
> a subscription will not work.  I suggest the text:
> 
> "logical replication target relation "public.fines" has no primary key.
> Either create one, or set REPLICA IDENTITY index and set the published
> relation to REPLICA IDENTITY FULL."

At that point, we don't know what is being published.  If only inserts
are being published or REPLICA IDENTITY FULL is set, then it will work.
We don't want to give warnings about things that might not be true.

More guidance on some of the potential failure cases would be good, but
it would need more refinement.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is ECPG's SET CONNECTION really not thread-aware?

2017-06-07 Thread Tsunakawa, Takayuki
Dear Meskes,

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Meskes
> Done.

Thanks, I confirmed the commit messages.


> My standard workflow is to wait a couple days to see if everything works
> nicely before backporting. Obviously this doesn't make any sense for a doc
> patch, sigh.

I see.  That's a good idea.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-07 Thread Amit Langote
On 2017/06/08 2:07, Robert Haas wrote:
> On Wed, Jun 7, 2017 at 1:23 AM, Amit Langote
>  wrote:
>> On 2017/06/07 11:57, Amit Langote wrote:
>>> How about we export ExecPartitionCheck() out of execMain.c and call it
>>> just before ExecFindPartition() using the root table's ResultRelInfo?
>>
>> Turns out there wasn't a need to export ExecPartitionCheck after all.
>> Instead of calling it from execModifyTable.c and copy.c, it's better to
>> call it at the beginning of ExecFindPartition() itself.  That way, there
>> is no need to add the same code both in CopyFrom() and ExecInsert(), nor
>> is there need to make ExecPartitionCheck() public.  That's how the patch
>> attached with the previous email does it anyway.
> 
> Cool.  I think this is a sensible approach, and have committed the patch.

Thanks a lot.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
 wrote:
> 1.  Keep the current behaviour. [...]
>
> 2.  Make a code change that would split the 'new table' tuplestore in
> two: an insert tuplestore and an update tuplestore (for new images;
> old images could remain in the old tuplestore that is also used for
> deletes) as Peter suggests.  That raises two questions for me: (1)
> Does it really make sense for the 'when' clause, which sounds like it
> only controls when we fire a trigger, also to affect which transition
> tuples it sees?  There is something slightly fishy about that.  (2)
> Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
> of the the two tuplestores?  Trigger authors would need to be aware a
> single statement can produce a mixture of updates and inserts, but
> only if they explicitly invite such problems into their lives with
> that incantation.

A third option would be for an AFTER INSERT OR UPDATE trigger to be
invoked twice, once for the inserts and once again for the updates.
No union required, but also surprising.

Any other ideas?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Thu, Jun 8, 2017 at 10:21 AM, Kevin Grittner  wrote:
> On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan  wrote:
>
>> My assumption about how transition tables ought to behave here is
>> based on the simple fact that we already fire both AFTER
>> statement-level triggers, plus my sense of aesthetics, or bias. I
>> admit that I might be missing the point, but if I am it would be
>> useful to know how.
>
> Well, either will work.  My inclination is that a single statement
> should cause one execution of the FOR EACH STATEMENT trigger, but if
> a good case can be made that we should have a FOR EACH STATEMENT
> trigger fire for each clause within a statement -- well, it won't be
> a problem for matview maintenance.  The biggest hurt there would be
> to *my* sense of aesthetics.  ;-)

Ok, I think we have two options here:

1.  Keep the current behaviour.  ON CONFLICT statements cause inserted
and updated tuples to be mixed together in the same 'new table'
tuplestore.  Trigger authors will need to be aware that AFTER INSERT
triggers might see some rows that were actually updated, and vice
versa, which seems moderately surprising.  If you defined one AFTER
INSERT trigger and one AFTER UPDATE trigger, you'll see the same rows
in both if someone runs such a statement (as I showed upthread), which
also seems to violate the PoLA.

2.  Make a code change that would split the 'new table' tuplestore in
two: an insert tuplestore and an update tuplestore (for new images;
old images could remain in the old tuplestore that is also used for
deletes) as Peter suggests.  That raises two questions for me: (1)
Does it really make sense for the 'when' clause, which sounds like it
only controls when we fire a trigger, also to affect which transition
tuples it sees?  There is something slightly fishy about that.  (2)
Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
of the the two tuplestores?  Trigger authors would need to be aware a
single statement can produce a mixture of updates and inserts, but
only if they explicitly invite such problems into their lives with
that incantation.

I think the code change for 2 is not enormous.  Split the tuplestore
in two, route tuples appropriately, and teach NamedTuplestoreScan to
scan both if a union is needed.  It does sound like the less
user-hostile of the options.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-07 Thread Sokolov Yura

Good day Robert, Jim, and everyone.

On 2017-06-08 00:06, Jim Van Fleet wrote:

Robert Haas  wrote on 06/07/2017 12:12:02 PM:


> OK -- would love the feedback and any suggestions on how to

mitigate the low

> end problems.

Did you intend to attach a patch?

Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...


> Sokolov Yura has a patch which, to me, looks good for pgbench rw
> performance.  Does not do so well with hammerdb (about the same as

base) on

> single socket and two socket.

Any idea why?  I think we will have to understand *why* certain

things

help in some situations and not others, not just *that* they do, in
order to come up with a good solution to this problem.


My patch improves acquiring contended/blocking LWLock on NUMA cause:
a. patched procedure generates much lesser writes, especially because
  taking WaitListLock is unified with acquiring the lock itself.
  Access to modified memory is very expensive on NUMA, so less writes
  leads to less wasted time.
b. it spins several time on lock->state in attempts to acquire lock
  before starting attempts to queue self to wait list. It is really the
  cause of some speedup. Without spinning patch just removes
  degradation on contention.
  I don't know why spinning doesn't improves single socket performance
  though :-) Probably still because all algorithmic overhead (system
  calls, sleeping and awakening process) is not too expensive until
  NUMA is involved.


Looking at the data now -- LWLockAquire philosophy is different -- at
first glance I would have guessed "about the same" as the base, but I
can not yet explain why we have super pgbench rw performance and "the
same" hammerdb performance


My patch improves only blocking contention, ie when a lot of EXCLUSIVE
locks are involved. pgbench rw generates a lot of write traffic, so
there is a lot of contention and waiting on WALInsertLocks (in
XLogInsertRecord, and waiting in XLogFlush), WalWriteLock (in
XLogFlush), CLogControlLock (in TransactionIdSetTreeStatus).

The case when SHARED lock is much more common than EXCLUSIVE is not
affected by patch, because SHARED is acquired then on the fast path
in both original and patched version.

So, looks like hammerdb doesn't produce much EXCLUSIVE contention on
LWLocks, so it is not improved with the patch.

Splitting ProcArrayLock helps with acquiring SHARED lock on NUMA in
absence of EXCLUSIVE lock because of the same reason why my patch
improves acquiring of blocking lock: less writes to same memory.
Since every process writes to some one part of ProcArrayLock, there
is a lot less writes to each part of ProcArrayLock, so acquiring
SHARED lock pays lesser for accessing to modified memory on NUMA.

Probably I'm mistaken somewhere.





--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Peter Geoghegan
On Wed, Jun 7, 2017 at 3:00 PM, Peter Geoghegan  wrote:
> My assumption would be that since you have as many as two
> statement-level triggers firing that could reference transition tables
> when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
> trigger, and another AFTER INSERT statement level trigger), there'd be
> separation at that level. You'd see updated tuples with one, and
> inserted within the other. While INSERT ON CONFLICT DO UPDATE is
> essentially one statement, it behaves as two statements in certain
> limited ways.

BTW, as you probably already know INSERT ON CONFLICT DO UPDATE is
unable to affect the same row version a second time -- if we go on to
update a row that the same upsert statement itself actually originally
inserted, then a cardinality violation error is raised (recall that
upsert does not actually use MVCC semantics to detect conflicting
tuples). This differs from the UPDATE situation (I should really say
the UPDATE ... FROM situation), where the implementation has always
silently not updated the row a second time (to avoid the halloween
problem).

This cardinality violation restriction is a useful one for us to have
imposed, because there is now no question about the same row appearing
more than once. We know that a given row (any tuple from a single
update chain) cannot appear once for an INSERT statement level trigger
firing (appearing in the transition table there), while also making a
second appearance for the UPDATE statement level trigger.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Kevin Grittner
On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan  wrote:

> My assumption about how transition tables ought to behave here is
> based on the simple fact that we already fire both AFTER
> statement-level triggers, plus my sense of aesthetics, or bias. I
> admit that I might be missing the point, but if I am it would be
> useful to know how.

Well, either will work.  My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance.  The biggest hurt there would be
to *my* sense of aesthetics.  ;-)

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Kevin Grittner
On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro
 wrote:

> Is there anything about that semantics that is incompatible with the
> incremental matview use case?

Nothing incompatible at all.  If we had separate "new" tables for
UPDATE and DELETE we would logically need to do a "counting"-style
UNION of them just like we will want to do with the OLD (with a
count of -1) and NEW (with a count of 1) to get to a delta now.  So
keeping them separate would be marginally more work for incremental
matview, but not a big deal.

> For example, are the "counting" and
> "DRed" algorithms you've proposed (for non-recursive and recursive
> views) driven entirely by deletions and insertions, so that updates
> look like deletes followed by inserts anyway?

Counting is best done from a "delta" relation which has old and new
combined with opposite counts.  I'm sure that will be fine with
DRed, too.

> If so, I think our
> current treatment of ON CONFLICT DO UPDATE should be fine: you can't
> tell whether the tuples in the new table result from insert or update
> without also consulting the old table, but if the algorithm treats all
> tuples in the old table as deletes (even though in this case they
> result from updates) and all tuples in the new table as inserts (even
> though in this case *some* of them result from updates) anyway then I
> don't think it matters.

I don't think so, either.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Erik Rijkers

On 2017-06-07 23:18, Alvaro Herrera wrote:

Erik Rijkers wrote:

Now, looking at the script again I am thinking that it would be 
reasonable

to expect that after issuing
   delete from pg_subscription;

the other 2 tables are /also/ cleaned, automatically, as a 
consequence.  (Is

this reasonable? this is really the main question of this email).


I don't think it's reasonable to expect that the system recovers
automatically from what amounts to catalog corruption.  You should be
using the DDL that removes subscriptions instead.


You're right, that makes sense.
Thanks.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Peter Geoghegan
On Wed, Jun 7, 2017 at 2:19 PM, Kevin Grittner  wrote:
> The idea of transition tables is that you see all changes to the
> target of a single statement in the form of delta relations -- with
> and "old" table for any rows affected by a delete or update and a
> "new" table for any rows affected by an update or delete.  If we
> have a single statement that combines INSERT and UPDATE behaviors,
> it might make sense to have an "old" table for updates and a single
> "new" table for both.

My assumption would be that since you have as many as two
statement-level triggers firing that could reference transition tables
when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
trigger, and another AFTER INSERT statement level trigger), there'd be
separation at that level. You'd see updated tuples with one, and
inserted within the other. While INSERT ON CONFLICT DO UPDATE is
essentially one statement, it behaves as two statements in certain
limited ways. IIRC MERGE as implemented in other systems has severe
restrictions on things like row level triggers work (i.e. they simply
don't work), and so they don't provide much in the way of guidance.

My assumption about how transition tables ought to behave here is
based on the simple fact that we already fire both AFTER
statement-level triggers, plus my sense of aesthetics, or bias. I
admit that I might be missing the point, but if I am it would be
useful to know how.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Thu, Jun 8, 2017 at 9:19 AM, Kevin Grittner  wrote:
> On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
>  wrote:
>> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
>>  wrote:
>>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan  wrote:
 I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
 case -- one for updated tuples, and the other for inserted tuples.
>>>
>>> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>>> both kinds see all the tuples -- those that were inserted and those
>>> that were updated.  That's not right.
>>
>> Or maybe it is right.
>
> The idea of transition tables is that you see all changes to the
> target of a single statement in the form of delta relations -- with
> and "old" table for any rows affected by a delete or update and a
> "new" table for any rows affected by an update or delete.  If we
> have a single statement that combines INSERT and UPDATE behaviors,
> it might make sense to have an "old" table for updates and a single
> "new" table for both.

That would be cool because that's the behaviour we have.

Is there anything about that semantics that is incompatible with the
incremental matview use case?  For example, are the "counting" and
"DRed" algorithms you've proposed (for non-recursive and recursive
views) driven entirely by deletions and insertions, so that updates
look like deletes followed by inserts anyway?  If so, I think our
current treatment of ON CONFLICT DO UPDATE should be fine: you can't
tell whether the tuples in the new table result from insert or update
without also consulting the old table, but if the algorithm treats all
tuples in the old table as deletes (even though in this case they
result from updates) and all tuples in the new table as inserts (even
though in this case *some* of them result from updates) anyway then I
don't think it matters.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Kevin Grittner
On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas  wrote:

> So, are you willing and able to put any effort into this, like say
> reviewing the patch Thomas posted, and if so when and how much?  If
> you're just done and you aren't going to put any more work into
> maintaining it (for whatever reasons), then I think you should say so
> straight out.  People shouldn't have to guess whether you're going to
> maintain your patch or not.

It has become clear that the scope of problems being found now
exceed what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in.  I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the
release before that.  At least landing it in the final CF drew the
level of review and testing needed to polish it, but it's far from
ideal timing (or procedure).  I can revert from v10 and deal with
all of this for the first CF of some future release, but if someone
feels they can deal with it in v10 I'll stand back and offer what
help I can.

You mentioned blame earlier.  That seems pointless to me.  I'm
looking to see what works and what doesn't.  When I went to develop
SSI it was because my employer needed it, and they asked what it
would take to get that done; I said one year of my time without
being drawn off for other tasks to get it to a point where they
would be better off using it than not, and then two years half time
work to address community concerns and get it committed, and follow
up on problems.  They gave me that and it worked, and worked well.
In this case, resources to carry it through were not reserved when I
started, and when I became full-up with other things, then problems
surfaced.  That doesn't work.  I don't want to start something big
again until I have resources set up and reserved, as a priority, to
see it through.  It's a matter of what works for both me and the
community and what doesn't.

In the meantime I'll try to keep to small enough issues that the
resources required to support what I do is not beyond what I can
deliver.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Kevin Grittner
On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
 wrote:
> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
>  wrote:
>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan  wrote:
>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>>> case -- one for updated tuples, and the other for inserted tuples.
>>
>> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>> both kinds see all the tuples -- those that were inserted and those
>> that were updated.  That's not right.
>
> Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
and "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an update or delete.  If we
have a single statement that combines INSERT and UPDATE behaviors,
it might make sense to have an "old" table for updates and a single
"new" table for both.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Alvaro Herrera
Erik Rijkers wrote:

> Now, looking at the script again I am thinking that it would be reasonable
> to expect that after issuing
>delete from pg_subscription;
> 
> the other 2 tables are /also/ cleaned, automatically, as a consequence.  (Is
> this reasonable? this is really the main question of this email).

I don't think it's reasonable to expect that the system recovers
automatically from what amounts to catalog corruption.  You should be
using the DDL that removes subscriptions instead.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-07 Thread Jim Van Fleet
Robert Haas  wrote on 06/07/2017 12:12:02 PM:


> > OK -- would love the feedback and any suggestions on how to mitigate 
the low
> > end problems.
> 
> Did you intend to attach a patch?
Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...

> > Sokolov Yura has a patch which, to me, looks good for pgbench rw
> > performance.  Does not do so well with hammerdb (about the same as 
base) on
> > single socket and two socket.
> 
> Any idea why?  I think we will have to understand *why* certain things
> help in some situations and not others, not just *that* they do, in
> order to come up with a good solution to this problem.
Looking at the data now -- LWLockAquire philosophy is different -- at 
first glance I would have guessed "about the same" as the base, but I can 
not yet explain why we have super pgbench rw performance and "the same" 
hammerdb performance. 

> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 




Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-07 Thread Regina Obe

> After chewing on this for awhile, I'm starting to come to the conclusion
that we'd be best off to throw an error for SRF-inside-CASE (or COALESCE).
Mark is correct that the simplest case of

>   SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
>   FROM table_with_columns_x_and_y_and_z;

> behaves just intuitively enough that people might be using it.  The new
implementation method cannot reasonably duplicate the old semantics for
that, which means that if we let it stand as-is we will be 

> silently breaking queries, even if we fix up some of the weirder corner
cases like what happens when the CASE can be const-simplified.  So I think
we'd be better off to make this throw an error, and force any 
> affected users to rewrite in a way that will work in both v10 and older
releases.

> As to *how* to throw an error, I think it should be possible to teach
parse analysis to detect such cases, with something like the ParseExprKind
mechanism that could be checked to see if we're inside a 
> subexpression that restricts what's allowed.  There are some other checks
like no-nested-aggregates that perhaps could be folded in as well.  Checking
at parse analysis ought to be sufficient because 
> rule rewriting could not introduce such a case where it wasn't before, and
planner subquery flattening won't introduce one either because we don't
flatten subqueries with SRFs in their tlists.

> If people are on board with throwing an error, I'll go see about writing a
patch.

>   regards, tom lane

+1

I'm not a fan of either solution, but I think what Tom proposes of throwing
an error sounds like least invasive and confusing.

I'd much prefer an error thrown than silent behavior change. Given that we
ran into this in 3 places in PostGIS code, I'm not convinced the issue is
all that rare.

Make sure to point out the breaking change in the release notes though and
syntax to remedy it.

Thanks,
Regina 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-07 Thread Erik Rijkers

On 2017-06-07 20:31, Robert Haas wrote:


[...]

[ Side note: Erik's report on this thread initially seemed to suggest
that we needed this patch to make logical decoding stable.  But my
impression is that this is belied by subsequent developments on other
threads, so my theory is that this patch was never really related to
the problem, but rather than by the time Erik got around to testing
this patch, other fixes had made the problems relatively rare, and the
apparently-improved results with this patch were just chance.  If that
theory is wrong, it would be good to hear about it. ]


Yes, agreed; I was probably mistaken.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Erik Rijkers
I am not sure whether what I found here amounts to a bug, I might be 
doing something dumb.


During the last few months I did tests by running pgbench over logical 
replication.  Earlier emails have details.


The basic form of that now works well (and the fix has been comitted) 
but as I looked over my testing program I noticed one change I made to 
it, already many weeks ago:


In the cleanup during startup (pre-flight check you might say) and also 
before the end, instead of


  echo "delete from pg_subscription;" | psql -qXp $port2 -- (1)

I changed that (as I say, many weeks ago) to:

  echo "delete from pg_subscription;
delete from pg_subscription_rel;
delete from pg_replication_origin; " | psql -qXp $port2   -- (2)

This occurs (2x) inside the bash function clean_pubsub(), in main test 
script pgbench_detail2.sh


This change was an effort to ensure to arrive at a 'clean' start (and 
end-) state which would always be the same.


All my more recent testing (and that of Mark, I have to assume) was thus 
done with (2).


Now, looking at the script again I am thinking that it would be 
reasonable to expect that after issuing

   delete from pg_subscription;

the other 2 tables are /also/ cleaned, automatically, as a consequence.  
(Is this reasonable? this is really the main question of this email).


So I removed the latter two delete statements again, and ran the tests 
again with the form in  (1)


I have established that (after a number of successful cycles) the test 
stops succeeding with in the replica log repetitions of:


2017-06-07 22:10:29.057 CEST [2421] LOG:  logical replication apply 
worker for subscription "sub1" has started
2017-06-07 22:10:29.057 CEST [2421] ERROR:  could not find free 
replication state slot for replication origin with OID 11
2017-06-07 22:10:29.057 CEST [2421] HINT:  Increase 
max_replication_slots and try again.
2017-06-07 22:10:29.058 CEST [2061] LOG:  worker process: logical 
replication worker for subscription 29235 (PID 2421) exited with exit 
code 1


when I manually 'clean up' by doing:
   delete from pg_replication_origin;

then, and only then, does the session finish and succeed ('replica ok').

So to me it looks as if there is an omission of 
pg_replication_origin-cleanup when pg_description is deleted.


Does that make sense?  All this is probably vague and I am only posting 
in the hope that Petr (or someone else) perhaps immediately understands 
what goes wrong, with even his limited amount of info.


In the meantime I will try to dig up more detailed info...


thanks,


Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Alter subscription..SET - NOTICE message is coming for table which is already removed

2017-06-07 Thread Peter Eisentraut
On 5/30/17 13:25, Masahiko Sawada wrote:
> I think this cause is that the relation status entry could be deleted
> by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
> starting. Attached patch fixes issues reported on this thread so far.

I have committed the part of the patch that changes the
SetSubscriptionRelState() calls.

I think there was a mistake in your patch, in that the calls in
LogicalRepSyncTableStart() used true once and false once.  I think all
the calls in tablesync.c should be the same.

(If you look at the patch again, notice that I have changed the
insert_ok argument to update_only, so true and false are flipped.)

I'm not convinced about the change to the GetSubscriptionRelState()
argument.  In the examples given, no tables are removed from any
publications, so I don't see how the claimed situation can happen.  I
would like to see more reproducible examples.

Right now, if the subscription rel state disappears before the sync
worker starts, the error kills the sync worker, so things should
continue working correctly.  Perhaps the error message isn't the best.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 5:46 AM, Amit Kapila  wrote:
> As far as I understand, it is to ensure that for deleted rows, nothing
> more needs to be done.  For example, see the below check in
> ExecUpdate/ExecDelete.
> if (!ItemPointerEquals(tupleid, ))
> {
> ..
> }
> ..
>
> Also a similar check in ExecLockRows.  Now for deleted rows, if the
> t_ctid wouldn't point to itself, then in the mentioned functions, we
> were not in a position to conclude that the row is deleted.

Right, so we would have to find all such checks and change them to use
some other method to conclude that the row is deleted.  What method
would we use?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 11:20 AM, Amit Kapila  wrote:
> On Sat, Jun 3, 2017 at 1:03 AM, Robert Haas  wrote:
>> On Fri, Jun 2, 2017 at 3:48 AM, Rafia Sabih
>>  wrote:
>> I don't see how to do that.  It could possibly be done with the TAP
>> framework, but that exceeds my abilities.
>>
>> Here's an updated patch with a bunch of cosmetic fixes, plus I
>> adjusted things so that do_sql_command() is more interruptible.  I
>> tested it manually and it seems to work OK.  I'll commit and
>> back-patch this version, barring objections or further suggestions for
>> improvement.
>
> No objections from me, updated patch looks good.

OK, done.  Further testing nevertheless appreciated, if anyone feels the urge.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-07 Thread Alex K
Hi pgsql-hackers,

Thank you again for all these replies. I have started working under this
project
and learnt a lot of new stuff last month, so here are some new thoughts
about
ERRORS handling in COPY. I decided to stick to the same thread, since it
has a neutral subject.

(1) One of my mentors--Alvaro Herrera--suggested me to have a look on the
UPSERT. It may be a good point to be able to achieve the same functionality
as during the ON CONFLICT DO NOTHING, when COPY actually inserts tuples
and errors handling is turned on. It could additionally reduce number of
failed
subtransactions and reduce XIDs consumption, while still ignoring some
common
errors like unique index violation.

Adding a full support of ON CONFLICT DO NOTHING/UPDATE to COPY seems
to be a large separated task and is out of the current project scope, but
maybe there is
a relatively simple way to somehow perform internally tuples insert with
ON CONFLICT DO NOTHING? I have added Peter Geoghegan to cc, as
I understand he is the major contributor of UPSERT in PostgreSQL. It would
be great
if he will answer this question.

(2) Otherwise, I am still going to use subtransactions via
BeginInternalSubTransaction
and PG_TRY / PG_CATCH with
ReleaseCurrentSubTransaction / RollbackAndReleaseCurrentSubTransaction.
To minimize XIDs consumption I will try to insert tuples in batches and
pre-validate
them as much as possible (as was suggested in the thread before).



Alexey


Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-07 Thread David G. Johnston
On Wed, Jun 7, 2017 at 11:57 AM, Tom Lane  wrote:

> If people are on board with throwing an error, I'll go see about
> writing a patch.
>

+1 from me.

David J.​


Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-07 Thread Tom Lane
Mark Dilger  writes:
>> On Jun 4, 2017, at 2:19 PM, Andres Freund  wrote:
>> Seems very unlikely that we'd ever want to do that.  The right way to do
>> this is to simply move the SRF into the from list.  Having the executor
>> support arbitrary sources of tuples would just complicate and slow down
>> already complicated and slow code...

> In my example, the aggregate function is taking a column from the table as
> an argument, so the output of the aggregate function needs to be computed per 
> row,
> not just once.  And if the function is expensive, or has side-effects, you 
> might
> only want it to execute for those rows where the CASE statement is true, 
> rather
> than for all of them.  You may get that same behavior using lateral or some 
> such,
> I'm uncertain, but in a complicated CASE statement, it be more straightforward
> to write something like:

> SELECT
>   CASE
>   WHEN t.x = 'foo' THEN expensive_aggfunc1(srf1(t.y,t.z))
>   WHEN t.x = 'bar' THEN expensive_aggfunc2(srf2(t.y,t.z))
>   WHEN t.x = 'baz' THEN expensive_aggfunc3(srf3(t.y,t.z))
>   
>   WHEN t.x = 'zzz' THEN expensive_aggfuncN(srfN(t.y,t.z))
>   ELSE 5
>   END
> FROM mytable t;

I think the correct way to do that already exists, namely to use a
correlated sub-select to wrap each SRF+aggregate:

...
WHEN t.x = 'foo' THEN (SELECT expensive_aggfunc1(s) FROM srf1(t.y,t.z) s)
...

I don't really feel a need to invent some other notation for that.

After chewing on this for awhile, I'm starting to come to the conclusion
that we'd be best off to throw an error for SRF-inside-CASE (or
COALESCE).  Mark is correct that the simplest case of

SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
FROM table_with_columns_x_and_y_and_z;

behaves just intuitively enough that people might be using it.  The new
implementation method cannot reasonably duplicate the old semantics for
that, which means that if we let it stand as-is we will be silently
breaking queries, even if we fix up some of the weirder corner cases like
what happens when the CASE can be const-simplified.  So I think we'd be
better off to make this throw an error, and force any affected users to
rewrite in a way that will work in both v10 and older releases.

As to *how* to throw an error, I think it should be possible to teach
parse analysis to detect such cases, with something like the
ParseExprKind mechanism that could be checked to see if we're inside
a subexpression that restricts what's allowed.  There are some other
checks like no-nested-aggregates that perhaps could be folded in as
well.  Checking at parse analysis ought to be sufficient because
rule rewriting could not introduce such a case where it wasn't before,
and planner subquery flattening won't introduce one either because we
don't flatten subqueries with SRFs in their tlists.

If people are on board with throwing an error, I'll go see about
writing a patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Directory pg_replslot is not properly cleaned

2017-06-07 Thread Fabrízio de Royes Mello
On Wed, Jun 7, 2017 at 3:30 PM, Andres Freund  wrote:
>
>
>
> On June 7, 2017 11:29:28 AM PDT, "Fabrízio de Royes Mello" <
fabriziome...@gmail.com> wrote:
> >On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <
> >fabriziome...@gmail.com> wrote:
> >>
> >>
> >> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <
> >fabriziome...@gmail.com> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > This week I faced a out of disk space trouble in 8TB production
> >cluster. During investigation we notice that pg_replslot was the
> >culprit
> >growing more than 1TB in less than 1 (one) hour.
> >> >
> >> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2 replicating to a
> >new
> >9.6 instance and planning the upgrade soon.
> >> >
> >> > What I did? I freed some disk space just to startup PostgreSQL and
> >begin the investigation. During the 'startup recovery' simply the files
> >inside the pg_replslot was tottaly removed. So our trouble with 'out of
> >disk space' disappear. Then the server went up and physical slaves
> >attached
> >normally to master but logical slaves doesn't, staying stalled in
> >'catchup'
> >state.
> >> >
> >> > At this moment the "pg_replslot" directory started growing fast
> >again
> >and forced us to drop the logical replication slot and we lost the
> >logical
> >slave.
> >> >
> >> > Googling awhile I found this thread [1] about a similar issue
> >reported
> >by Dmitriy Sarafannikov and replied by Andres and Álvaro.
> >> >
> >> > I ran the test case provided by Dmitriy [1] against branches:
> >> > - REL9_4_STABLE
> >> > - REL9_5_STABLE
> >> > - REL9_6_STABLE
> >> > - master
> >> >
> >> > After all test the issue remains... and also using the new Logical
> >Replication stuff (CREATE PUB/CREATE SUB). Just after a restart the
> >"pg_replslot" was properly cleaned. The typo in
> >ReorderBufferIterTXNInit
> >complained by Dimitriy was fixed but the issue remains.
> >> >
> >> > Seems no one complain again about this issue and the thread was
> >lost.
> >> >
> >> > The attached is a reworked version of Dimitriy's patch that seems
> >solve
> >the issue. I confess I don't know enough about replication slots code
> >to
> >really know if it's the best solution.
> >> >
> >> > Regards,
> >> >
> >> > [1]
> >
https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
> >> >
> >>
> >> Just adding Dimitriy to conversation... previous email I provided was
> >wrong.
> >>
> >
> >Does anyone have some thought about this critical issue?
> >
>
> I plan to look into it over the next few days.
>

Thanks...


--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-07 Thread Robert Haas
On Sat, May 20, 2017 at 8:40 AM, Michael Paquier
 wrote:
> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada  
> wrote:
>> Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
>> similar fix.
>
> Actually, now that I look at it, ready_to_display should as well be
> protected by the lock of the WAL receiver, so it is incorrectly placed
> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
> is lazy as well, and that's new in 10, so we have an open item here
> for both of them. And I am the author for both things. No issues
> spotted in walreceiverfuncs.c after review.
>
> I am adding an open item so as both issues are fixed in PG10. With the
> WAL sender part, I think that this should be a group shot.
>
> So what do you think about the attached?

I think if you're going to fix it so that we take spinlocks on
MyWalSnd in a bunch of places that we didn't take them before, it
would make sense to fix all the places where we're accessing those
fields without a spinlock instead of only some of them, unless there
are good reasons why we only need it in some cases and not others, in
which case I think the patch should add comments to each place where
the lock was not taken explaining why it's OK.  It didn't take me and
my copy of vi very long to find instances that you didn't change.

I also think that should provide some analysis regarding the question
Thomas asked - are these actually bugs, or are they OK for some
reason?  We shouldn't just plaster the code with spinlock acquisitions
unless it's really necessary.  It seems at least possible that these
changes could cause performance regressions, and it doesn't look like
you've done any testing or provided any analysis indicating whether
that's likely to happen or not.

Basically, I don't think this patch is ready to commit.  We have
neither evidence that it is necessary nor evidence that it is
complete.  I don't want to be unfair, here, but it seems to me you
ought to do a little more legwork before throwing this over the wall
to the pool of committers.

[ Side note: Erik's report on this thread initially seemed to suggest
that we needed this patch to make logical decoding stable.  But my
impression is that this is belied by subsequent developments on other
threads, so my theory is that this patch was never really related to
the problem, but rather than by the time Erik got around to testing
this patch, other fixes had made the problems relatively rare, and the
apparently-improved results with this patch were just chance.  If that
theory is wrong, it would be good to hear about it. ]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Directory pg_replslot is not properly cleaned

2017-06-07 Thread Andres Freund


On June 7, 2017 11:29:28 AM PDT, "Fabrízio de Royes Mello" 
 wrote:
>On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <
>fabriziome...@gmail.com> wrote:
>>
>>
>> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <
>fabriziome...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > This week I faced a out of disk space trouble in 8TB production
>cluster. During investigation we notice that pg_replslot was the
>culprit
>growing more than 1TB in less than 1 (one) hour.
>> >
>> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2 replicating to a
>new
>9.6 instance and planning the upgrade soon.
>> >
>> > What I did? I freed some disk space just to startup PostgreSQL and
>begin the investigation. During the 'startup recovery' simply the files
>inside the pg_replslot was tottaly removed. So our trouble with 'out of
>disk space' disappear. Then the server went up and physical slaves
>attached
>normally to master but logical slaves doesn't, staying stalled in
>'catchup'
>state.
>> >
>> > At this moment the "pg_replslot" directory started growing fast
>again
>and forced us to drop the logical replication slot and we lost the
>logical
>slave.
>> >
>> > Googling awhile I found this thread [1] about a similar issue
>reported
>by Dmitriy Sarafannikov and replied by Andres and Álvaro.
>> >
>> > I ran the test case provided by Dmitriy [1] against branches:
>> > - REL9_4_STABLE
>> > - REL9_5_STABLE
>> > - REL9_6_STABLE
>> > - master
>> >
>> > After all test the issue remains... and also using the new Logical
>Replication stuff (CREATE PUB/CREATE SUB). Just after a restart the
>"pg_replslot" was properly cleaned. The typo in
>ReorderBufferIterTXNInit
>complained by Dimitriy was fixed but the issue remains.
>> >
>> > Seems no one complain again about this issue and the thread was
>lost.
>> >
>> > The attached is a reworked version of Dimitriy's patch that seems
>solve
>the issue. I confess I don't know enough about replication slots code
>to
>really know if it's the best solution.
>> >
>> > Regards,
>> >
>> > [1]
>https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
>> >
>>
>> Just adding Dimitriy to conversation... previous email I provided was
>wrong.
>>
>
>Does anyone have some thought about this critical issue?
>

I plan to look into it over the next few days.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Directory pg_replslot is not properly cleaned

2017-06-07 Thread Fabrízio de Royes Mello
On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
> >
> > Hi all,
> >
> > This week I faced a out of disk space trouble in 8TB production
cluster. During investigation we notice that pg_replslot was the culprit
growing more than 1TB in less than 1 (one) hour.
> >
> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2 replicating to a new
9.6 instance and planning the upgrade soon.
> >
> > What I did? I freed some disk space just to startup PostgreSQL and
begin the investigation. During the 'startup recovery' simply the files
inside the pg_replslot was tottaly removed. So our trouble with 'out of
disk space' disappear. Then the server went up and physical slaves attached
normally to master but logical slaves doesn't, staying stalled in 'catchup'
state.
> >
> > At this moment the "pg_replslot" directory started growing fast again
and forced us to drop the logical replication slot and we lost the logical
slave.
> >
> > Googling awhile I found this thread [1] about a similar issue reported
by Dmitriy Sarafannikov and replied by Andres and Álvaro.
> >
> > I ran the test case provided by Dmitriy [1] against branches:
> > - REL9_4_STABLE
> > - REL9_5_STABLE
> > - REL9_6_STABLE
> > - master
> >
> > After all test the issue remains... and also using the new Logical
Replication stuff (CREATE PUB/CREATE SUB). Just after a restart the
"pg_replslot" was properly cleaned. The typo in ReorderBufferIterTXNInit
complained by Dimitriy was fixed but the issue remains.
> >
> > Seems no one complain again about this issue and the thread was lost.
> >
> > The attached is a reworked version of Dimitriy's patch that seems solve
the issue. I confess I don't know enough about replication slots code to
really know if it's the best solution.
> >
> > Regards,
> >
> > [1]
https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
> >
>
> Just adding Dimitriy to conversation... previous email I provided was
wrong.
>

Does anyone have some thought about this critical issue?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] [BUGS] BUG #14682: row level security not work with partitioned table

2017-06-07 Thread Mike Palmiotto
On Wed, Jun 7, 2017 at 9:49 AM, Mike Palmiotto
 wrote:
> One thing that concerns me is the first EXPLAIN plan from regress_rls_dave:
> +EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
> + QUERY PLAN
> +
> + Append
> +   InitPlan 1 (returns $0)
> + ->  Index Scan using uaccount_pkey on uaccount
> +   Index Cond: (pguser = CURRENT_USER)
> +   ->  Seq Scan on part_document_fiction
> + Filter: ((cid <> 55) AND (cid <> 55) AND (cid < 99) AND
> (dlevel <= $0) AND f_leak(dtitle))
> +   ->  Seq Scan on part_document_satire
> + Filter: ((cid <> 55) AND (cid <> 55) AND (cid < 99) AND
> (dlevel <= $0) AND f_leak(dtitle))
> +(8 rows)
>
> I would expect that both part_document_satire (cid == 55) and
> part_document_nonfiction (cid == 99) would be excluded from the
> explain, but only cid < 99 seems to work. Interestingly, when I change
> policy pp1r to cid < 55, I see the following:
>
> +EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
> +QUERY PLAN
> +---
> + Append
> +   InitPlan 1 (returns $0)
> + ->  Index Scan using uaccount_pkey on uaccount
> +   Index Cond: (pguser = CURRENT_USER)
> +   ->  Seq Scan on part_document_fiction
> + Filter: ((cid < 55) AND (cid <> 55) AND (cid < 99) AND
> (dlevel <= $0) AND f_leak(dtitle))
> +(6 rows)
>
> Is this a demonstration of a non-immutable function backing the
> operator and thus not being able to filter it from the planner, or is
> it a bug?

Assuming my digging is correct, there's some other explanation for
this not working as expected...
postgres=> select po.oprname, pp.proname, pp.provolatile from pg_proc
pp join pg_operator po on pp.proname::text = po.oprcode::text where
po.oprname = '<>' and pp.proname like 'int%ne';
 oprname |   proname   | provolatile
-+-+-
 <>  | int4ne  | i
 <>  | int2ne  | i
 <>  | int24ne | i
 <>  | int42ne | i
 <>  | int8ne  | i
 <>  | int84ne | i
 <>  | int48ne | i
 <>  | interval_ne | i
 <>  | int28ne | i
 <>  | int82ne | i
(10 rows)

Thoughts?

Thanks,
-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Kevin Grittner
On Sun, Jun 4, 2017 at 11:27 AM, Mengxing Liu
 wrote:

> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> cpu user time should be about 12*4 = 48.
> There seems to be no process blocked by IO.
>
> procs ---memory-- ---swap-- -io -system-- 
> --cpu-
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa 
> st
> 28  0  0 981177024 315036 7084376000 0 900  1  0 
> 99  0  0
> 21  1  0 981178176 315036 7084378400 0 0 25482 329020 12  
> 3 85  0  0
> 18  1  0 981179200 315036 7084379200 0 0 26569 323596 12  
> 3 85  0  0
> 17  0  0 981175424 315036 7084380800 0 0 25374 322992 12  
> 4 85  0  0
> 12  0  0 981174208 315036 7084382400 0 0 24775 321577 12  
> 3 85  0  0
>  8  0  0 981179328 315036 7084533600 0 0 13115 199020  6  
> 2 92  0  0
> 13  0  0 981179200 315036 7084579200 0 0 22893 301373 11  
> 3 87  0  0
> 11  0  0 981179712 315036 7084580800 0 0 26933 325728 12  
> 4 84  0  0
> 30  0  0 981178304 315036 7084582400 0 0 23691 315821 11  
> 4 85  0  0
> 12  1  0 981177600 315036 7084583200 0 0 29485 320166 12  
> 4 84  0  0
> 32  0  0 981180032 315036 7084584800 0 0 25946 316724 12  
> 4 84  0  0
> 21  0  0 981176384 315036 7084586400 0 0 24227 321938 12  
> 4 84  0  0
> 21  0  0 981178880 315036 7084588000 0 0 25174 326943 13  
> 4 83  0  0

This machine has 120 cores?  Is hyperthreading enabled?  If so, what
you are showing might represent a total saturation of the 30 cores.
Context switches of about 300,000 per second is pretty high.  I can't
think of when I've seen that except when there is high spinlock
contention.

Just to put the above in context, how did you limit the test to 30
cores?  How many connections were open during the test?

> The flame graph is attached. I use 'perf' to generate the flame graph. Only 
> the CPUs running PG server are profiled.
> I'm not familiar with other part of PG. Can you find anything unusual in the 
> graph?

Two SSI functions stand out:
10.86% PredicateLockTuple
 3.51% CheckForSerializableConflictIn

In both cases, most of that seems to go to lightweight locking.  Since
you said this is a CPU graph, that again suggests spinlock contention
issues.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-06-07 Thread Robert Haas
On Fri, Jun 2, 2017 at 9:15 AM, Amit Kapila  wrote:
> On Fri, Jun 2, 2017 at 6:38 PM, Robert Haas  wrote:
>> On Fri, Jun 2, 2017 at 9:01 AM, Amit Kapila  wrote:
>>> Your reasoning sounds sensible to me.  I think the other way to attack
>>> this problem is that we can maintain some local queue in each of the
>>> workers when the shared memory queue becomes full.  Basically, we can
>>> extend your "Faster processing at Gather node" patch [1] such that
>>> instead of fixed sized local queue, we can extend it when the shm
>>> queue become full.  I think that way we can handle both the problems
>>> (worker won't stall if shm queues are full and workers can do batched
>>> writes in shm queue to avoid the shm queue communication overhead) in
>>> a similar way.
>>
>> We still have to bound the amount of memory that we use for queueing
>> data in some way.
>
> Yeah, probably till work_mem (or some percentage of work_mem).  If we
> want to have some extendable solution then we might want to back it up
> with some file, however, we might not need to go that far.  I think we
> can do some experiments to see how much additional memory is
> sufficient to give us maximum benefit.

Yes, I think that's important.  Also, I think we still need a better
understanding of in which cases the benefit is there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 12:16 PM, Mengxing Liu
 wrote:
> I think disk I/O is not the bottleneck in our experiment, but the global lock 
> is.

A handy way to figure this kind of thing out is to run a query like
this repeatedly during the benchmark:

SELECT wait_event_type, wait_event FROM pg_stat_activity;

I often do this by using psql's \watch command, often \watch 0.5 to
run it every half-second.  I save all the results collected during the
benchmark using 'script' and then analyze them to see which wait
events are most frequent.  If your theory is right, you ought to see
that SerializableXactHashLock occurs as a wait event very frequently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 3:23 PM, David Fetter  wrote:
> I'd bet on lack of tuits.

I expect that was part of it.  Another thing to consider is that, for
numeric aggregates, the transition values don't generally get larger
as you aggregate, but for something like string_agg(), they will.
It's not clear how much work we'll really save by parallelizing that
sort of thing.  Maybe it will be great?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix a typo in snapmgr.c

2017-06-07 Thread Andres Freund
On 2017-05-08 09:12:13 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > So rearranged code a little to keep it lean.
> 
> Didn't you break it with that?  As it now stands, the memcpy will
> copy the nonzero value.

I've not seen a fix and/or alleviating comment about this so far.  Did I
miss something?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 12:29 PM, Jim Van Fleet  wrote:
>> The basic idea is clear from your description, but it will be better
>> if you share the patch as well.  It will not only help people to
>> review and provide you feedback but also allow them to test and see if
>> they can reproduce the numbers you have mentioned in the mail.
>
> OK -- would love the feedback and any suggestions on how to mitigate the low
> end problems.

Did you intend to attach a patch?

> Sokolov Yura has a patch which, to me, looks good for pgbench rw
> performance.  Does not do so well with hammerdb (about the same as base) on
> single socket and two socket.

Any idea why?  I think we will have to understand *why* certain things
help in some situations and not others, not just *that* they do, in
order to come up with a good solution to this problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 12:49 PM, Andres Freund  wrote:
> On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:
>> Instead of adding a second 64 bit counter for multixacts, how about
>> first implementing something like TED which gets rid of multixacts (and
>> freezing thereof) altogether?
>
> -1 - that seems like a too high barrier. We've punted on improvements on
> this because of CSN, xid-lsn ranges, and at some point we're going to
> have to make pragmatic choices, rather than strive for something more ideal.

What is the problem that we are trying to solve with this change?  Is
there a practical use case for setting autovacuum_freeze_max_age >
20, or is this just so that when autovacuum fails to vacuum
things in time, we can bloat clog instead of performing an emergency
shutdown?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 1:23 AM, Amit Langote
 wrote:
> On 2017/06/07 11:57, Amit Langote wrote:
>> How about we export ExecPartitionCheck() out of execMain.c and call it
>> just before ExecFindPartition() using the root table's ResultRelInfo?
>
> Turns out there wasn't a need to export ExecPartitionCheck after all.
> Instead of calling it from execModifyTable.c and copy.c, it's better to
> call it at the beginning of ExecFindPartition() itself.  That way, there
> is no need to add the same code both in CopyFrom() and ExecInsert(), nor
> is there need to make ExecPartitionCheck() public.  That's how the patch
> attached with the previous email does it anyway.

Cool.  I think this is a sensible approach, and have committed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-07 Thread Nikolay Shaplov
В письме от 30 мая 2017 17:24:26 Вы написали:


> > I still have three more questions. A new one:
> > 
> > 
> > 
> >my_command->line = expr_scanner_get_substring(sstate,
> >
> >  start_offset,
> > 
> > - end_offset);
> > + end_offset + 1);
> > 
> > 
> > I do not quite understand what are you fixing here,
> 
> I fix a truncation which appears in an error message later on.
> 
> > I did not find any mention of it in the patch introduction letter.
> 
> Indeed. Just a minor bug fix to avoid an error message to be truncated. If
> you remove it, then you can get:
> 
>   missing argument in command "sleep"
>   \slee
> 
> Note the missing "p"...
> 
> > And more, expr_scanner_get_substring is called twice in very similar
> > code, and it is fixed only once. Can you tell more about this fix.
> 
> I fixed the one which was generating truncated messages. I did not notice
> other truncated messages while testing, so I assume that other calls are
> correct, but I did not investigate further, so I may be wrong. Maybe in
> other instances the truncation removes a "\n" which is intended?

I did some investigation: The code there really suppose that there is always 
\n at the end of the line, and truncates the line. It is done in 

/* Get location of the ending newline */
end_offset = expr_scanner_offset(sstate) - 1;

just two lines above the code we are discussing.

When you have one line code /sleep 2ms with no "end of line" symbol at the 
end, it will cut off "s" instead of "\n"

You've fix it, but now it will leave \n, in all sleeps in multiline scripts.

So this should be fixed in both expr_scanner_get_substring cases, and keep last 
symbol only if it is not "\n".

> Here is a v6.
> 
>   - it uses "progress == 0"
> 
>   - it does not change "nxacts <= 0" because of possible wrapping
> 
>   - it fixes two typos in a perl comment about the checks_all function
> 
>   - I kept "checks all" because this talks more to me than "like more"
> if a native English speaker or someone else has an opinion, it is
> welcome.
Ok, let's leave this for commiter to make final decision.

> Also, if someone could run a test on windows, it would be great.
I'll try to ask a friend of mine to run this on windows...

-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Andres Freund
On 2017-06-07 07:49:00 -0300, Alvaro Herrera wrote:
> Instead of adding a second 64 bit counter for multixacts, how about
> first implementing something like TED which gets rid of multixacts (and
> freezing thereof) altogether?

-1 - that seems like a too high barrier. We've punted on improvements on
this because of CSN, xid-lsn ranges, and at some point we're going to
have to make pragmatic choices, rather than strive for something more ideal.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat
 wrote:
> In ATExecAttachPartition() there's following code
>
> 13715 partnatts = get_partition_natts(key);
> 13716 for (i = 0; i < partnatts; i++)
> 13717 {
> 13718 AttrNumber  partattno;
> 13719
> 13720 partattno = get_partition_col_attnum(key, i);
> 13721
> 13722 /* If partition key is an expression, must not skip
> validation */
> 13723 if (!partition_accepts_null &&
> 13724 (partattno == 0 ||
> 13725  !bms_is_member(partattno, not_null_attrs)))
> 13726 skip_validate = false;
> 13727 }
>
> partattno obtained from the partition key is the attribute number of
> the partitioned table but not_null_attrs contains the attribute
> numbers of attributes of the table being attached which have NOT NULL
> constraint on them. But the attribute numbers of partitioned table and
> the table being attached may not agree i.e. partition key attribute in
> partitioned table may have a different position in the table being
> attached. So, this code looks buggy. Probably we don't have a test
> which tests this code with different attribute order between
> partitioned table and the table being attached. I didn't get time to
> actually construct a testcase and test it.

I think this code could be removed entirely in light of commit
3ec76ff1f2cf52e9b900349957b42d28128b7bc7.

Amit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] strange error message from slave when connection to master cannot be established

2017-06-07 Thread Pavel Stehule
Hi

I got strange error message - false message - max connection is less on
slave than on master, although these numbers was same. The issue was in
wrong connection string in recovery conf and slave cannot to check master
and used some defaults.

Regards

Pavel


Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-07 Thread Jim Van Fleet
Amit Kapila  wrote on 06/07/2017 07:34:06 AM:

...

> > The down side is that on smaller configurations (single socket) where 
there
> > is less "lock thrashing" in the storage subsystem and there are 
multiple
> > Lwlocks to take for an exclusive acquire, there is a decided downturn 
in
> > performance. On  hammerdb, the prototype was 6% worse than the base on 
a
> > single socket power configuration.
> >
> 
> I think any patch having 6% regression on one machine configuration
> and 16% improvement on another machine configuration is not a net win.
> However, if there is a way to address the regression, then it will
> look much attractive.

I have to agree.
> 
> > If there is interest in this approach, I will submit a patch.
> >
> 
> The basic idea is clear from your description, but it will be better
> if you share the patch as well.  It will not only help people to
> review and provide you feedback but also allow them to test and see if
> they can reproduce the numbers you have mentioned in the mail.

OK -- would love the feedback and any suggestions on how to mitigate the 
low end problems.
> 
> There is some related work which was previously proposed in this area
> ("Cache the snapshot") [1] and it claims to reduce contention around
> ProcArrayLock.  I am not sure if that patch still applies, however, if
> you find it relevant and you are interested in evaluating the same,
> then we can request the author to post a rebased version if it doesn't
> apply.

Sokolov Yura has a patch which, to me, looks good for pgbench rw 
performance.  Does not do so well with hammerdb (about the same as base) 
on single socket and two socket.


> 
> [1] - https://www.postgresql.org/message-id/
> CAD__OuiwEi5sHe2wwQCK36Ac9QMhvJuqG3CfPN%2BOFCMb7rdruQ%40mail.gmail.com
> 
> -- 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
> 




Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Masahiko Sawada
On Wed, Jun 7, 2017 at 4:47 PM, Heikki Linnakangas  wrote:
> On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
>>
>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer 
>> wrote:
>>>
>>> On 6 June 2017 at 12:13, Ashutosh Bapat 
>>> wrote:
>>>
 What happens when the epoch is so low that the rest of the XID does
 not fit in 32bits of tuple header? Or such a case should never arise?
>>>
>>>
>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>> more than one epoch. So if you're updating/deleting an extremely old
>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>> isn't already, so you can set a new epoch and xmax.
>>
>>
>> If the page has multiple such tuples, updating one tuple will mean
>> updating headers of other tuples as well? This means that those tuples
>> need to be locked for concurrent scans? May be not, since such tuples
>> will be anyway visible to any concurrent scans and updating xmin/xmax
>> doesn't change the visibility. But we might have to prevent multiple
>> updates to the xmin/xmax because of concurrent updates on the same
>> page.
>
>
> "Store the epoch in the page header" is actually a slightly
> simpler-to-visualize, but incorrect, version of what we actually need to do.
> If you only store the epoch, then all the XIDs on a page need to belong to
> the same epoch, which causes trouble when the current epoch changes. Just
> after the epoch changes, you cannot necessarily freeze all the tuples from
> the previous epoch, because they would not yet be visible to everyone.
>
> The full picture is that we need to store one 64-bit XID "base" value in the
> page header, and all the xmin/xmax values in the tuple headers are offsets
> relative to that base. With that, you effectively have 64-bit XIDs, as long
> as the *difference* between any two XIDs on a page is not greater than 2^32.
> That can be guaranteed, as long as we don't allow a transaction to be
> in-progress for more than 2^32 XIDs. That seems like a reasonable
> limitation.
>
> But yes, when the "current XID - base XID in page header" becomes greater
> than 2^32, and you need to update a tuple on that page, you need to first
> freeze the page, update the base XID on the page header to a more recent
> value, and update the XID offsets on every tuple on the page accordingly.
> And to do that, you need to hold a lock on the page. If you don't move any
> tuples around at the same time, but just update the XID fields, and
> exclusive lock on the page is enough, i.e. you don't need to take a
> super-exclusive or vacuum lock. In any case, it happens so infrequently that
> it should not become a serious burden.
>

Freezing a page is required when modifying a tuple on the page by a
transaction with greater than 2^32 XID. Is that right?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #14682: row level security not work with partitioned table

2017-06-07 Thread Joe Conway
On 06/07/2017 06:49 AM, Mike Palmiotto wrote:
> I ended up narrowing it down to 4 tables (one parent and 3 partitions)
> in order to demonstrate policy sorting and order of RLS/partition
> constraint checking. It should be much more straight-forward now, but
> let me know if there are any further recommended changes.

Thanks, will take a look towards the end of the day.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-06-07 Thread Amit Kapila
On Sat, Jun 3, 2017 at 1:03 AM, Robert Haas  wrote:
> On Fri, Jun 2, 2017 at 3:48 AM, Rafia Sabih
>  wrote:
>
> I don't see how to do that.  It could possibly be done with the TAP
> framework, but that exceeds my abilities.
>
> Here's an updated patch with a bunch of cosmetic fixes, plus I
> adjusted things so that do_sql_command() is more interruptible.  I
> tested it manually and it seems to work OK.  I'll commit and
> back-patch this version, barring objections or further suggestions for
> improvement.
>

No objections from me, updated patch looks good.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw cost estimation defaults and documentation

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 9:24 PM, Jim Finnerty  wrote:
> In some MPP systems, networking costs are modeled separately from I/O costs,
> processor costs, or memory access costs.  I think this is what Ashutosh may
> have been getting at with /per-packet/ costs:  in a more sophisticated fdw
> cost model there could be a  network cost per /packet/ that would be
> independent of the cost of reading the next page or a random page from local
> storage.

I agree.  I think the question is how much we'd gain in practice if we
modeled the cost more accurately.  IMHO, the bigger problem with the
FDW stuff today is that we still lack partition-wise join,
partition-wise aggregate, and asynchronous query, which means that
only relatively simple queries involving foreign tables have a chance
of getting the plan you'd really like to have.  Until that's fixed, I
don't personally think it's worth spending a lot of time trying to
tweak the costing model.  Of course, if somebody wants to take a run
at it and can show that the benefit is there, cool beans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-07 Thread Amit Khandekar
On 7 June 2017 at 16:42, Amit Khandekar  wrote:
> The column bitmap set returned by GetUpdatedColumns() refer to
> attribute numbers w.r.t. to the root partition. And the
> mstate->resultRelInfo[] have attnos w.r.t. to the leaf partitions. So
> we need to do something similar to map_partition_varattnos() to change
> the updated columns attnos to the leaf partitions

I was wrong about this. Each of the mtstate->resultRelInfo[] has its
own corresponding RangeTblEntry with its own updatedCols having attnos
accordingly adjusted to refer its own table attributes. So we don't
have to do the mapping; we need to get modifedCols separately for each
of the ResultRelInfo, rather than the root relinfo.

> and walk down the
> partition constraint expressions to find if the attnos are present
> there.

But this we will need to do.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of non-restart-safe storage by temp_tablespaces

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 3:39 AM, Alvaro Herrera  wrote:
> FWIW I don't think calling these tablespaces "temporary" is the right
> word.  It's not the tablespaces that are temporary.  Maybe "evanescent".

While I would personally find it pretty hilarious to see the
EVANESCENT in kwlist.h, I think it's probably a good idea to avoid
choosing keywords that will cause even proficient speakers of English
to need a dictionary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is ECPG's SET CONNECTION really not thread-aware?

2017-06-07 Thread Michael Meskes
On Wed, Jun 07, 2017 at 03:45:23AM +, Tsunakawa, Takayuki wrote:
> Could you also apply it to past versions if you don't mind?  The oldest 
> supported version 9.2 is already thread-aware.

Done.

My standard workflow is to wait a couple days to see if everything works nicely 
before backporting. Obviously this doesn't make any sense for a doc patch, sigh.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-07 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Jun 6, 2017 at 10:14 PM, Tom Lane  wrote:
>> By definition, the address range we're trying to reuse worked successfully
>> in the postmaster process.  I don't see how forcing a specific address
>> could do anything but create an additional risk of postmaster startup
>> failure.

> I think it won't create an additional risk, because the idea is that
> if we fail to map the shm segment at a predefined address, then we
> will allow the system to choose the initial address as we are doing
> now.  So, it can reduce chances of doing retries.

[ shrug... ]  That would just make the patch even more complicated and
hard to test.  And it doesn't do anything to fix the ASLR issue.
Could we get on with trying to test something that *does* fix the
ASLR issue, like the draft patch I posted upthread?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14682: row level security not work with partitioned table

2017-06-07 Thread Mike Palmiotto
On Tue, Jun 6, 2017 at 9:12 PM, Michael Paquier
 wrote:
> On Wed, Jun 7, 2017 at 9:52 AM, Joe Conway  wrote:
>> Thanks Mike. I'll take a close look to verify output correctnes, but I
>> am concerned that the new tests are unnecessarily complex. Any other
>> opinions on that?
>
> Some tests would be good to have. Now, if I read those regression
> tests correctly, this is using 10 relations where two would be enough,
> one as the parent relation and one as a partition. Then policies apply
> on the parent relation. The same kind of policy is defined 4 times,
> and there is bloat with GRANT and ALTER TABLE commands.

I ended up narrowing it down to 4 tables (one parent and 3 partitions)
in order to demonstrate policy sorting and order of RLS/partition
constraint checking. It should be much more straight-forward now, but
let me know if there are any further recommended changes.

One thing that concerns me is the first EXPLAIN plan from regress_rls_dave:
+EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
+ QUERY PLAN
+
+ Append
+   InitPlan 1 (returns $0)
+ ->  Index Scan using uaccount_pkey on uaccount
+   Index Cond: (pguser = CURRENT_USER)
+   ->  Seq Scan on part_document_fiction
+ Filter: ((cid <> 55) AND (cid <> 55) AND (cid < 99) AND
(dlevel <= $0) AND f_leak(dtitle))
+   ->  Seq Scan on part_document_satire
+ Filter: ((cid <> 55) AND (cid <> 55) AND (cid < 99) AND
(dlevel <= $0) AND f_leak(dtitle))
+(8 rows)

I would expect that both part_document_satire (cid == 55) and
part_document_nonfiction (cid == 99) would be excluded from the
explain, but only cid < 99 seems to work. Interestingly, when I change
policy pp1r to cid < 55, I see the following:

+EXPLAIN (COSTS OFF) SELECT * FROM part_document WHERE f_leak(dtitle);
+QUERY PLAN
+---
+ Append
+   InitPlan 1 (returns $0)
+ ->  Index Scan using uaccount_pkey on uaccount
+   Index Cond: (pguser = CURRENT_USER)
+   ->  Seq Scan on part_document_fiction
+ Filter: ((cid < 55) AND (cid <> 55) AND (cid < 99) AND
(dlevel <= $0) AND f_leak(dtitle))
+(6 rows)

Is this a demonstration of a non-immutable function backing the
operator and thus not being able to filter it from the planner, or is
it a bug?

>
> +SELECT * FROM part_document;
> + did | cid | dlevel |  dauthor  | dtitle
> +-+-++---+-
> +   1 |  11 |  1 | regress_rls_bob   | my first novel
> Adding an "ORDER BY did" as well here would make the test output more
> predictable.

Done.

Thanks,
-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
From 8c55045bd2d856fe4707582de0270c26d3a4c285 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Wed, 24 May 2017 16:54:49 +
Subject: [PATCH] Add RLS support to partitioned tables

This is needed to get RLS policies to apply to the parent partitioned table.
Without this change partitioned tables are skipped.
---
 src/backend/rewrite/rewriteHandler.c  |   3 +-
 src/test/regress/expected/rowsecurity.out | 425 ++
 src/test/regress/sql/rowsecurity.sql  | 148 +++
 3 files changed, 575 insertions(+), 1 deletion(-)

diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 35ff8bb..6cd73c1 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1835,7 +1835,8 @@ fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
 
 		/* Only normal relations can have RLS policies */
 		if (rte->rtekind != RTE_RELATION ||
-			rte->relkind != RELKIND_RELATION)
+			(rte->relkind != RELKIND_RELATION &&
+			rte->relkind != RELKIND_PARTITIONED_TABLE))
 			continue;
 
 		rel = heap_open(rte->relid, NoLock);
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 7bf2936..792d24e 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -899,6 +899,431 @@ EXPLAIN (COSTS OFF) SELECT * FROM t1 WHERE f_leak(b);
  Filter: f_leak(b)
 (7 rows)
 
+--
+-- Partitioned Tables
+--
+SET SESSION AUTHORIZATION regress_rls_alice;
+CREATE TABLE part_document (
+did int,
+cid int,
+dlevel  int not null,
+dauthor name,
+dtitle  text
+) PARTITION BY RANGE (cid);
+GRANT ALL ON part_document TO public;
+-- Create partitions for document categories
+CREATE TABLE part_document_fiction PARTITION OF part_document FOR VALUES FROM ('11') to ('12');
+CREATE TABLE part_document_satire PARTITION OF 

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-07 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 7, 2017 at 6:36 AM, Amit Kapila  wrote:
>> I don't think so because this problem has been reported previously as
>> well [1][2] even before the commit in question.
>> 
>> [1] - 
>> https://www.postgresql.org/message-id/1ce5a19f-3b1d-bb1c-4561-0158176f65f1%40dunslane.net
>> [2] - https://www.postgresql.org/message-id/25861.1472215822%40sss.pgh.pa.us

> Oh, good point.  So this is a longstanding bug that has possibly just
> gotten easier to hit.

> I still think figuring out what's going on with the
> ParallelWorkerNumbers is a good plan.

Also, your suggestion in

https://www.postgresql.org/message-id/CA%2BTgmob29v0zASBNfgO1Mq9yJ7_TRoAjL%3DO%2B2rXS0gBZezv%2BrQ%40mail.gmail.com

for a quicker path to reproducing it might still be valid.  Although at
this point, it seems like something we've changed recently has made it
occur often enough in the buildfarm that repeating the standard regression
tests should be good enough.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 10:58 PM, Peter Eisentraut
 wrote:
> The decision was made to add background workers to pg_stat_activity, but
> no facility was provided to tell the background workers apart.  Is it
> now the job of every background worker to invent a hack to populate some
> other pg_stat_activity field with some ad hoc information?  What about
> other logical replication worker types, parallel workers, external
> background workers, auto-prewarm?

To be fair, some background workers were already shown in
pg_stat_activity; others were not.  Parallel workers, for example,
always showed up in pg_stat_activity, but before v10 they didn't show
the query string, so you had to match them up with the process that
started them using the other fields.  I think it's only workers not
connected to a database that weren't previously displayed.  You might
be right that not enough that was given to how those could identify
themselves, but you would have had the problem anyway with logical
replication workers that connect to a database.

> I think the bgw_type addition that I proposed nearby would solve this
> quite well, but it needs a bit of work.  And arguably, it's too late for
> PG10, but one could also argue that this is a design fault in the
> pg_stat_activity extension that is valid to fix in PG10.

I don't mind inventing a way for a background worker to display its
own text in place of the generic "background worker", but like others,
I didn't like splitting up the name field into two parts.  I think you
could do the former without doing the latter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-07 Thread Amit Kapila
On Tue, Jun 6, 2017 at 1:00 AM, Jim Van Fleet  wrote:
> Hi,
>
> I have been experimenting with splitting  the ProcArrayLock into parts.
> That is, to Acquire the ProcArrayLock in shared mode, it is only necessary
> to acquire one of the parts in shared mode; to acquire the lock in exclusive
> mode, all of the parts must be acquired in exclusive mode. For those
> interested, I have attached a design description of the change.
>
> This approach has been quite successful on large systems with the hammerdb
> benchmark.With a prototype based on 10 master source and running on power8
> (model 8335-GCA with 2sockets, 20 core)
>  hammerdb  improved by 16%; On intel (Intel(R) Xeon(R) CPU E5-2699 v4 @
> 2.20GHz, 2 socket, 44 core) with 9.6 base and prototype hammerdb improved by
> 4%. (attached is a set of spreadsheets for power8.
>
> The down side is that on smaller configurations (single socket) where there
> is less "lock thrashing" in the storage subsystem and there are multiple
> Lwlocks to take for an exclusive acquire, there is a decided downturn in
> performance. On  hammerdb, the prototype was 6% worse than the base on a
> single socket power configuration.
>

I think any patch having 6% regression on one machine configuration
and 16% improvement on another machine configuration is not a net win.
However, if there is a way to address the regression, then it will
look much attractive.

> If there is interest in this approach, I will submit a patch.
>

The basic idea is clear from your description, but it will be better
if you share the patch as well.  It will not only help people to
review and provide you feedback but also allow them to test and see if
they can reproduce the numbers you have mentioned in the mail.

There is some related work which was previously proposed in this area
("Cache the snapshot") [1] and it claims to reduce contention around
ProcArrayLock.  I am not sure if that patch still applies, however, if
you find it relevant and you are interested in evaluating the same,
then we can request the author to post a rebased version if it doesn't
apply.

[1] - 
https://www.postgresql.org/message-id/CAD__OuiwEi5sHe2wwQCK36Ac9QMhvJuqG3CfPN%2BOFCMb7rdruQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-07 Thread Magnus Hagander
On Wed, Jun 7, 2017 at 4:58 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 6/6/17 15:58, Robert Haas wrote:
> > The problem with the status quo (after Peter's commit) is that there's
> > now nothing at all to identify the logical replication launcher, apart
> > from the wait_event field, which is likely to be LogicalLauncherMain
> > fairly often if you've got the launcher.  I don't personally see why
> > we can't simply adopt Tom's original proposal of setting the query
> > string to something like "" which, while
> > maybe not as elegant as providing some way to override the
> > backend_type field, would be almost no work and substantially better
> > for v10 than what we've got now.
>
> The decision was made to add background workers to pg_stat_activity, but
> no facility was provided to tell the background workers apart.  Is it
> now the job of every background worker to invent a hack to populate some
> other pg_stat_activity field with some ad hoc information?  What about
> other logical replication worker types, parallel workers, external
> background workers, auto-prewarm?
>
> I think the bgw_type addition that I proposed nearby would solve this
> quite well, but it needs a bit of work.  And arguably, it's too late for
> PG10, but one could also argue that this is a design fault in the
> pg_stat_activity extension that is valid to fix in PG10.
>

+1. I definitely think it would be a bad idea to put in what basically
looks like a workaround into 10, since the new feature was added there. I'd
rather have the fix for pg_stat_activity.

We used to keep our query state as a text field and that was a bad idea for
many reasons. So we moved it to a separate field. Let's not repeat that
mistake here.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-07 Thread Robert Haas
On Tue, Jun 6, 2017 at 6:29 AM, Rafia Sabih
 wrote:
> On Mon, Jun 5, 2017 at 8:06 PM, Robert Haas  wrote:
>> Many of these seem worse, like these ones:
>>
>> - * Quit if we've reached records for another database. Unless the
>> + * Quit if we've reached records of another database. Unless the
>>
> Why is it worse? I've always encountered using "records of database"
> and not "records for database". Anyhow, I tried reframing the sentence
> altogether.

My experience is the opposite.  If I Google "from another database",
including the quotes, I get 516,000 hits; if I do the same using "of
another database", I get 110,000 hits.  I think that means the former
usage is more popular, although obviously to some degree it's a matter
of taste.

+ *database, tablespace, filenode, forknum, blocknum in

The file doesn't contain the spaces you added here, which is probably
why Mithun did it as he did, but actually we don't really need this
level of detail in the file header comment anyway.  I think you could
drop the specific mention of how blocks are identified.

+ * GUC variable to decide if autoprewarm worker has to be started when

if->whether? has to be->should be?

Actually this patch uses "if" in various places where I would use
"whether", but that's probably a matter of taste to some extent.

- * Start of prewarm per-database worker. This will try to load blocks of one
- * database starting from block info position state->prewarm_start_idx to
- * state->prewarm_stop_idx.
+ * Connect to the database and load the blocks of that database starting from
+ * the position state->prewarm_start_idx to state->prewarm_stop_idx.

That's a really good rephrasing.  It's more compact and more
imperative.  The only thing that seems a little off is that you say
"starting from" and then mention both the start and stop indexes.
Maybe say "between" instead.

- * Quit if we've reached records for another database. Unless the
- * previous blocks were of global objects which were combined with
- * next database's block infos.
+ * Quit if we've reached records for another database. If previous
+ * blocks are of some global objects, then continue pre-warming.

Good.

- * When we reach a new relation, close the old one.  Note, however,
- * that the previous try_relation_open may have failed, in which case
- * rel will be NULL.
+ * As soon as we encounter a block of a new relation, close the old
+ * relation. Note, that rel will be NULL if try_relation_open failed
+ * previously, in that case there is nothing to close.

I wrote the original comment here, so you may not be too surprised to
here that I liked it as it was.  Actually, your rewrite of the first
sentence seems like it might be better, but I think the second one is
not.  By deleting "however", you've made the comma after "Note"
redundant, and by changing "in which case" to "in that case", you've
made a dependent clause into a comma splice.  I hate comma splices.

https://en.wikipedia.org/wiki/Comma_splice

+ * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, these BlockInfoRecords are

I would probably capitalize DSM here.  There's no data structure
called dsm (lower-case) and we have other examples where it is
capitalized in documentation and comment text (see, e.g.
custom-scan.sgml).

+ * Since there could be at most one worker for prewarm, locking is not

could -> can?

- * For block info of a global object whose database will be 0
- * try to combine them with next non-zero database's block
- * infos to load.
+ * For the BlockRecordInfos of a global object, combine them
+ * with BlockRecordInfos of the next non-global object.

Good.  Or even "Combine BlockRecordInfos for a global object with the
next non-global object", which I think is even more compact.

+ * If we are here with database having InvalidOid, then only
+ * BlockRecordInfos belonging to global objects exist. Since, we can
+ * not connect with InvalidOid skip prewarming for these objects.

If we reach this point with current_db == InvalidOid, ...

+ *Sub-routine to periodically call dump_now().

Every existing use of the word subroutine in our code based spells it
with no hyphen.

[rhaas pgsql]$ git grep -- Subroutine | wc -l
  47
[rhaas pgsql]$ git grep -- Sub-routine | wc -l
   0

Personally, I find this change worse, because calling something a
subroutine is totally generic, like saying that you met a "person" or
that something was "nice".  Calling it a loop gives at least a little
bit of specific information.

+ * Call dum_now() at regular intervals defined by GUC variable dump_interval.

This change introduces an obvious typographical error.

-/* One last block info dump while postmaster shutdown. */
+/* It's time for 

Re: [HACKERS] Server ignores contents of SASLInitialResponse

2017-06-07 Thread Michael Paquier
On Wed, Jun 7, 2017 at 8:27 PM, Heikki Linnakangas  wrote:
> Ok, I committed your patch, with some minor changes.

Thanks for the commit.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-07 Thread Robert Haas
On Wed, Jun 7, 2017 at 6:36 AM, Amit Kapila  wrote:
> I don't think so because this problem has been reported previously as
> well [1][2] even before the commit in question.
>
> [1] - 
> https://www.postgresql.org/message-id/1ce5a19f-3b1d-bb1c-4561-0158176f65f1%40dunslane.net
> [2] - https://www.postgresql.org/message-id/25861.1472215822%40sss.pgh.pa.us

Oh, good point.  So this is a longstanding bug that has possibly just
gotten easier to hit.

I still think figuring out what's going on with the
ParallelWorkerNumbers is a good plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG 10 release notes

2017-06-07 Thread Bruce Momjian
On Wed, Jun  7, 2017 at 03:18:49PM +1000, Neha Khatri wrote:
> 
> On Mon, May 15, 2017 at 12:45 PM, Bruce Momjian  wrote:
> 
> On Thu, May 11, 2017 at 11:50:03PM -0400, Tom Lane wrote:
> > Michael Paquier  writes:
> > > Bruce, the release notes do not mention yet that support for cleartext
> > > passwords is removed. This has been done recently by Heikki in
> > > eb61136d. This has as consequence that CREATE ROLE PASSWORD
> > > UNENCRYPTED returns an error, and that password_encryption loses its
> > > value 'off' compared to last releases.
> >
> > The release notes only say that they're current through 4-22.  The
> > usual process is to update them in batches every so often.  It'd be
> > great if Bruce has time to do another round before Monday, but the
> > current situation is not really broken.
> 
> Done.  Applied patch attached.
> 
> 
> 
> The patch introduced one release note item twice in 
> https://www.postgresql.org/
> docs/10/static/release-10.html :
> 
> 
>   • Rename WAL-related functions and views to use lsn instead of location 
> (David Rowley)
> 
>   • RenameWAL-related functions and views to use lsn instead of location 
> (David
> Rowley)
> 
> Perhaps just one entry is good.

Yes, this has been fixed already, see:

https://www.postgresql.org/docs/devel/static/release-10.html

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-07 Thread Ashutosh Bapat
In ATExecAttachPartition() there's following code

13715 partnatts = get_partition_natts(key);
13716 for (i = 0; i < partnatts; i++)
13717 {
13718 AttrNumber  partattno;
13719
13720 partattno = get_partition_col_attnum(key, i);
13721
13722 /* If partition key is an expression, must not skip
validation */
13723 if (!partition_accepts_null &&
13724 (partattno == 0 ||
13725  !bms_is_member(partattno, not_null_attrs)))
13726 skip_validate = false;
13727 }

partattno obtained from the partition key is the attribute number of
the partitioned table but not_null_attrs contains the attribute
numbers of attributes of the table being attached which have NOT NULL
constraint on them. But the attribute numbers of partitioned table and
the table being attached may not agree i.e. partition key attribute in
partitioned table may have a different position in the table being
attached. So, this code looks buggy. Probably we don't have a test
which tests this code with different attribute order between
partitioned table and the table being attached. I didn't get time to
actually construct a testcase and test it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-07 Thread Sergey Burladyan
Vladimir Borodin  writes:

> > 6 июня 2017 г., в 23:30, Sergey Burladyan  написал(а):
> > 
> > Dmitriy Sarafannikov  writes:
> > 
> >> Starting and stopping master after running pg_upgrade but before rsync to 
> >> collect statistics
> >> was a bad idea.
> > 
> > But, starting and stopping master after running pg_upgrade is *required*
> > by documentation:
> > https://www.postgresql.org/docs/9.6/static/pgupgrade.html
> >> f. Start and stop the new master cluster
> >> In the new master cluster, change wal_level to replica in the 
> >> postgresql.conf file and then start and stop the cluster.
> > 
> > and there is no any suggestion to disable autovacuum for it.

> Yep. This should probably be fixed in the documentation?

I think so. There is some problem in pg_upgrade documentation, nothing about:
1. preventing heap change by vacuum, analyze, something also when master
   restarted after pg_upgrade but before rsync
2. log-shipping only standby cannot shutdown at the same checkpoint with
   master

I try to start discuss about this: 
https://www.postgresql.org/message-id/87y3ta49zp.fsf%40seb.koffice.internal
but without luck :-) 

PS: I CC'd Bruce here.

-- 
Sergey Burladyan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server ignores contents of SASLInitialResponse

2017-06-07 Thread Heikki Linnakangas

On 06/06/2017 06:09 AM, Michael Paquier wrote:

On Thu, Jun 1, 2017 at 4:58 AM, Heikki Linnakangas  wrote:

To fix, I suppose we can do what you did for SASL in your patch, and move
the cleanup of conn->gctx from closePGconn to pgDropConnection. And I
presume we need to do the same for the SSPI state too, but I don't have a
Windows set up to test that at the moment.


SSPI does not complain with sslmode=prefer as each time
pg_SSPI_startup() is called conn->sspictx is enforced to NULL. This
looks wrong to me by the way as pg_SSPI_startup() is invoked only once
per authentication, and it leaks memory this way. That's also
inconsistent with SASL and GSS. At the same time this inconsistency is
not causing actual problems except a leak with SSPI in libpq, so not
doing anything except on HEAD looks fine to me.


Ok, I committed your patch, with some minor changes. I added a line to 
also clear "conn->usesspi". Without that, if the server on first attempt 
asked for SSPI authentication, but GSS on the second attempt, we would 
incorrectly try to continue SSPI authentication during the second 
attempt. Also, I kept the existing code to discard the input and output 
data together, and added the new code after that, instead of in the 
middle. And added some newlines to pqDropConnection for beauty.


BTW, multiple connection attempts if "host" is a list of hostnames, 
which is now possible in version 10, also had the same issue. On master, 
that was the easiest way to reproduce this.


I decided to backpatch this down to 9.3, after all. It is clearly a bug, 
although unlikely to be hit in typical configurations. One configuration 
where this can be reproduced, is if you have separate "hostnossl" and 
"hostssl" lines in pg_hba.conf, for Kerberos authentication, but with 
different options. If the options are such that the first 
authentication, with SSL, fails, but the second one should succeed, 
before this fix the second attempt would nevertheless fail with the 
"duplicate authentication request".


The code in 9.2 was sufficiently different that I didn't backport it 
there, out of conservatism (ok, laziness).


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-07 Thread Amit Khandekar
On 6 June 2017 at 23:52, Robert Haas  wrote:
> On Fri, Jun 2, 2017 at 7:07 AM, Amit Khandekar  wrote:
>> So, according to that, below would be the logic :
>>
>> Run partition constraint check on the original NEW row.
>> If it succeeds :
>> {
>> Fire BR UPDATE trigger on the original partition.
>> Run partition constraint check again with the modified NEW row
>> (may be do this only if the trigger modified the partition key)
>> If it fails,
>> abort.
>> Else
>> proceed with the usual local update.
>> }
>> else
>> {
>> Fire BR UPDATE trigger on original partition.
>> Find the right partition for the modified NEW row.
>> If it is the same partition,
>> proceed with the usual local update.
>> else
>> do the row movement.
>> }
>
> Sure, that sounds about right, although the "Fire BR UPDATE trigger on
> the original partition." is the same in both branches, so I'm not
> quite sure why you have that in the "if" block.

Actually after coding this logic, it looks a bit different. See
ExecUpdate() in the attached file  trigger_related_changes.patch



Now that we are making sure trigger won't change the partition of the
tuple, next thing we need to do is, make sure the tuple routing setup
is done *only* if the UPDATE is modifying partition keys. Otherwise,
this will degrade normal update performance.

Below is the logic I am implementing for determining whether the
UPDATE is modifying partition keys.

In ExecInitModifyTable() ...
Call GetUpdatedColumns(mtstate->rootResultRelInfo, estate) to get
updated_columns.
For each of the updated_columns :
{
Check if the column is part of partition key quals of any of
the relations in mtstate->resultRelInfo[] array.
/*
 * mtstate->resultRelInfo[] contains exactly those leaf partitions
 * which qualify the update quals.
 */

If (it is part of partition key quals of at least one of the relations)
{
   Do ExecSetupPartitionTupleRouting() for the root partition.
   break;
}
}

Few things need to be considered :

Use Relation->rd_partcheck to get partition check quals of each of the
relations in mtstate->resultRelInfo[].

The Relation->rd_partcheck of the leaf partitions would include the
ancestors' partition quals as well. So we are good: we don't have to
explicitly get the upper partition constraints. Note that an UPDATE
can modify a column which is not used in a partition constraint
expressions of any of the partitions or partitioned tables in the
subtree, but that column may have been used in partition constraint of
a partitioned table belonging to upper subtree.

All of the relations in mtstate->resultRelInfo are already open. So we
don't need to re-open any more relations to get the partition quals.

The column bitmap set returned by GetUpdatedColumns() refer to
attribute numbers w.r.t. to the root partition. And the
mstate->resultRelInfo[] have attnos w.r.t. to the leaf partitions. So
we need to do something similar to map_partition_varattnos() to change
the updated columns attnos to the leaf partitions and walk down the
partition constraint expressions to find if the attnos are present
there.


Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


trigger_related_changes.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Alvaro Herrera
Alexander Korotkov wrote:

> Right.  I used the term "64-bit epoch" during developer unconference, but
> that was ambiguous.  It would be more correct to call it a "64-bit base".
> BTW, we will have to store two 64-bit bases: for xids and for multixacts,
> because they are completely independent counters.

So this takes us from 4 additional bytes per page, to 16 additional
bytes per page.  With the proposal to require 4 free bytes it seemed
quite unlikely that many pages would fail to comply (so whatever
fallback mechanism was needed during page upgrade would be seldom used),
but now that they are 16, the likelihood of needing to run that page
upgrade seems a tad high.

Instead of adding a second 64 bit counter for multixacts, how about
first implementing something like TED which gets rid of multixacts (and
freezing thereof) altogether?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Alexander Korotkov
On Wed, Jun 7, 2017 at 10:47 AM, Heikki Linnakangas  wrote:

> On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:
>
>> On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer 
>> wrote:
>>
>>> On 6 June 2017 at 12:13, Ashutosh Bapat 
>>> wrote:
>>>
>>> What happens when the epoch is so low that the rest of the XID does
 not fit in 32bits of tuple header? Or such a case should never arise?

>>>
>>> Storing an epoch implies that rows can't have (xmin,xmax) different by
>>> more than one epoch. So if you're updating/deleting an extremely old
>>> tuple you'll presumably have to set xmin to FrozenTransactionId if it
>>> isn't already, so you can set a new epoch and xmax.
>>>
>>
>> If the page has multiple such tuples, updating one tuple will mean
>> updating headers of other tuples as well? This means that those tuples
>> need to be locked for concurrent scans? May be not, since such tuples
>> will be anyway visible to any concurrent scans and updating xmin/xmax
>> doesn't change the visibility. But we might have to prevent multiple
>> updates to the xmin/xmax because of concurrent updates on the same
>> page.
>>
>
> "Store the epoch in the page header" is actually a slightly
> simpler-to-visualize, but incorrect, version of what we actually need to
> do. If you only store the epoch, then all the XIDs on a page need to belong
> to the same epoch, which causes trouble when the current epoch changes.
> Just after the epoch changes, you cannot necessarily freeze all the tuples
> from the previous epoch, because they would not yet be visible to everyone.
>
> The full picture is that we need to store one 64-bit XID "base" value in
> the page header, and all the xmin/xmax values in the tuple headers are
> offsets relative to that base. With that, you effectively have 64-bit XIDs,
> as long as the *difference* between any two XIDs on a page is not greater
> than 2^32. That can be guaranteed, as long as we don't allow a transaction
> to be in-progress for more than 2^32 XIDs. That seems like a reasonable
> limitation.
>

Right.  I used the term "64-bit epoch" during developer unconference, but
that was ambiguous.  It would be more correct to call it a "64-bit base".
BTW, we will have to store two 64-bit bases: for xids and for multixacts,
because they are completely independent counters.

But yes, when the "current XID - base XID in page header" becomes greater
> than 2^32, and you need to update a tuple on that page, you need to first
> freeze the page, update the base XID on the page header to a more recent
> value, and update the XID offsets on every tuple on the page accordingly.
> And to do that, you need to hold a lock on the page. If you don't move any
> tuples around at the same time, but just update the XID fields, and
> exclusive lock on the page is enough, i.e. you don't need to take a
> super-exclusive or vacuum lock. In any case, it happens so infrequently
> that it should not become a serious burden.


Yes, exclusive lock seems to be enough for single page freeze.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-07 Thread Amit Kapila
On Wed, Jun 7, 2017 at 12:37 AM, Robert Haas  wrote:
> On Tue, Jun 6, 2017 at 2:21 PM, Tom Lane  wrote:
>>> One thought is that the only places where shm_mq_set_sender() should
>>> be getting invoked during the main regression tests are
>>> ParallelWorkerMain() and ExecParallelGetReceiver, and both of those
>>> places using ParallelWorkerNumber to figure out what address to pass.
>>> So if ParallelWorkerNumber were getting set to the same value in two
>>> different parallel workers - e.g. because the postmaster went nuts and
>>> launched two processes instead of only one - or if
>>> ParallelWorkerNumber were not getting initialized at all or were
>>> getting initialized to some completely bogus value, it could cause
>>> this symptom.
>>
>> Hmm.  With some generous assumptions it'd be possible to think that
>> aa1351f1eec4adae39be59ce9a21410f9dd42118 triggered this.  That commit was
>> present in 20 successful lorikeet runs before the first of these failures,
>> which is a bit more than the MTBF after that, but not a huge amount more.
>>
>> That commit in itself looks innocent enough, but could it have exposed
>> some latent bug in bgworker launching?
>
> Hmm, that's a really interesting idea, but I can't quite put together
> a plausible theory around it.  I mean, it seems like that commit could
> make launching bgworkers faster, which could conceivably tickle some
> heretofore-latent timing-related bug.  But it wouldn't, IIUC, make the
> first worker start any faster than before - it would just make them
> more closely-spaced thereafter, and it's not very obvious how that
> would cause a problem.
>
> Another idea is that the commit in question is managing to corrupt
> BackgroundWorkerList somehow.
>

I don't think so because this problem has been reported previously as
well [1][2] even before the commit in question.


[1] - 
https://www.postgresql.org/message-id/1ce5a19f-3b1d-bb1c-4561-0158176f65f1%40dunslane.net
[2] - https://www.postgresql.org/message-id/25861.1472215822%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Use of snapshot in logical replication

2017-06-07 Thread sanyam jain
Hi,

Can someone explain the usage of exporting snapshot when a logical replication 
slot is created?


Thanks,

Sanyam Jain


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-07 Thread Amit Kapila
On Tue, Jun 6, 2017 at 10:14 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think the idea of retrying process creation (and I definitely agree
>> with Tom and Magnus that we have to retry process creation, not just
>> individual mappings) is a good place to start.  Now if we find that we
>> are having to retry frequently, then I think we might need to try
>> something along the lines of what Andres proposed and what nginx
>> apparently did.  However, any fixed address will be prone to
>> occasional failures (or maybe, on some systems, regular failures) if
>> that particular address happens to get claimed by something.  I don't
>> think we can say that there is any address where that definitely won't
>> happen.  So I would say let's do this retry thing first, and then if
>> that proves inadequate, we can also try moving the mappings to a range
>> where conflicts are less likely.
>
> By definition, the address range we're trying to reuse worked successfully
> in the postmaster process.  I don't see how forcing a specific address
> could do anything but create an additional risk of postmaster startup
> failure.
>

I think it won't create an additional risk, because the idea is that
if we fail to map the shm segment at a predefined address, then we
will allow the system to choose the initial address as we are doing
now.  So, it can reduce chances of doing retries.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-07 Thread Petr Jelinek
On 07/06/17 03:00, Andres Freund wrote:
> On 2017-06-06 19:36:13 +0200, Petr Jelinek wrote:
> 
>> As a side note, we are starting to have several IsSomeTypeOfProcess
>> functions for these kind of things. I wonder if bgworker infrastructure
>> should somehow provide this type of stuff (the proposed bgw_type might
>> help there) as well as maybe being able to register interrupt handler
>> for the worker (it's really hard to do it via custom SIGTERM handler as
>> visible on worker, launcher and walsender issues we are fixing).
>> Obviously this is PG11 related thought.
> 
> Maybe it's also a sign that the bgworker infrastructure isn't really the
> best thing to use for internal processes like parallel workers and lrep
> processes - it's really built so core code *doesn't* know anything
> specific about them, which isn't really what we want in some of these
> cases.  That's not to say that bgworkers and parallelism/lrep shouldn't
> share infrastructure, don't get me wrong.
>

Well the nice thing about bgworkers is that it provides the basic
infrastructure. Main reason lrep stuff is using it is that I didn't want
to add another special backend kind to 20 different places, but turns
out it still needs to be in some. So either we need to add more support
for these kind of things to bgworkers or write something for internal
workers that shares the infrastructure with bgworkers as you say.

> 
>>  
>> -LogicalRepCtx->launcher_pid = 0;
>> -
>> -/* ... and if it returns, we're done */
>> -ereport(DEBUG1,
>> -(errmsg("logical replication launcher shutting down")));
>> +/* Not reachable */
>> +}
> 
> Maybe put a pg_unreachable() there?
> 

Ah didn't know it existed.

> 
>> @@ -2848,6 +2849,14 @@ ProcessInterrupts(void)
>>  ereport(FATAL,
>>  (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>   errmsg("terminating logical 
>> replication worker due to administrator command")));
>> +else if (IsLogicalLauncher())
>> +{
>> +ereport(DEBUG1,
>> +(errmsg("logical replication launcher 
>> shutting down")));
>> +
>> +/* The logical replication launcher can be stopped at 
>> any time. */
>> +proc_exit(0);
>> +}
> 
> We could use PgBackendStatus->st_backendType for these, merging these
> different paths.
> 

Hmm, that's not that easy, st_backendType will be generic type for
bgworker as the bgw_type patch didn't land yet (which is discussed in
yet another thread [1]). It seems like an argument for going forward
with it (the bgw_type patch) in PG10.

> I can take care of this one, if you/Peter want.
> 

I don't mind, it has some overlap with your proposed fixes for latching
so if you are willing go ahead.

[1]
https://www.postgresql.org/message-id/flat/0d795703-a885-2193-2331-f00d7a3a4...@2ndquadrant.com#0d795703-a885-2193-2331-f00d7a3a4...@2ndquadrant.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-07 Thread Ashutosh Bapat
On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas  wrote:
>
> + errmsg("default partition contains row(s)
> that would overlap with partition being created")));
>
> It doesn't really sound right to talk about rows overlapping with a
> partition.  Partitions can overlap with each other, but not rows.
> Also, it's not really project style to use ambiguously plural forms
> like "row(s)" in error messages.  Maybe something like:
>
> new partition constraint for default partition \"%s\" would be
> violated by some row
>

Partition constraint is implementation detail here. We enforce
partition bounds through constraints and we call such constraints as
partition constraints. But a user may not necessarily understand this
term or may interpret it different. Adding "new" adds to the confusion
as the default partition is not new. My suggestion in an earlier mail
was ""default partition contains rows that conflict with the partition
bounds of "part_xyz"", with a note that we should use a better word
than "conflict". So, Jeevan seems to have used overlap, which again is
not correct. How about "default partition contains row/s which would
fit the partition "part_xyz" being created or attached." with a hint
to move those rows to the new partition's table in case of attach. I
don't think hint would be so straight forward i.e. to create the table
with SELECT INTO and then ATTACH.

What do you think?

Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity
constraint violation" code, seems misleading. We aren't violating any
integrity here. In fact I am not able to understand, how could adding
an object violate integrity constraint. The nearest errorcode seems to
be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for
partitions with overlapping bounds.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-07 Thread Amit Kapila
On Tue, Jun 6, 2017 at 11:54 PM, Robert Haas  wrote:
> On Mon, Jun 5, 2017 at 2:51 AM, Amit Kapila  wrote:
>>> Greg/Amit's idea of using the CTID field rather than an infomask bit
>>> seems like a possibly promising approach.  Not everything that needs
>>> bit-space can use the CTID field, so using it is a little less likely
>>> to conflict with something else we want to do in the future than using
>>> a precious infomask bit.  However, I'm worried about this:
>>>
>>> /* Make sure there is no forward chain link in t_ctid */
>>> tp.t_data->t_ctid = tp.t_self;
>>>
>>> The comment does not say *why* we need to make sure that there is no
>>> forward chain link, but it implies that some code somewhere in the
>>> system does or at one time did depend on no forward link existing.
>>
>> I think it is to ensure that EvalPlanQual mechanism gets invoked in
>> the right case.   The visibility routine will return HeapTupleUpdated
>> both when the tuple is deleted or updated (updated - has a newer
>> version of the tuple), so we use ctid to decide if we need to follow
>> the tuple chain for a newer version of the tuple.
>
> That would explain why need to make sure that there *is* a forward
> chain link in t_ctid for an update, but it doesn't explain why we need
> to make sure that there *isn't* a forward link for delete.
>

As far as I understand, it is to ensure that for deleted rows, nothing
more needs to be done.  For example, see the below check in
ExecUpdate/ExecDelete.
if (!ItemPointerEquals(tupleid, ))
{
..
}
..

Also a similar check in ExecLockRows.  Now for deleted rows, if the
t_ctid wouldn't point to itself, then in the mentioned functions, we
were not in a position to conclude that the row is deleted.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
 wrote:
> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan  wrote:
>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>> case -- one for updated tuples, and the other for inserted tuples.
>
> Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
> both kinds see all the tuples -- those that were inserted and those
> that were updated.  That's not right.

Or maybe it is right.  We're out of spec here, so we'd basically have
to decide (though MERGE's behaviour would be interesting to compare
with).  What we have seems reasonable for an AFTER INSERT OR UPDATE
statement-level trigger, I think.  It's just a bit questionable if you
asked for just one of those things.  The question is whether the
trigger's 'when' clause is supposed to refer to the type of statement
you ran or the way individual rows turned out to be processed in our
out-of-standard ON CONFLICT case.  If you take the former view then we
could simply decree that such a statement is both an INSERT and an
UPDATE for trigger selection purposes.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Alexander Korotkov
On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 6/6/17 08:29, Bruce Momjian wrote:
> > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
> >> Tom's point is, I think, that we'll want to stay pg_upgrade
> >> compatible. So when we see a pg10 tuple and want to add a new page
> >> with a new page header that has an epoch, but the whole page is full
> >> so there isn't 32 bits left to move tuples "down" the page, what do we
> >> do?
> >
> > I guess I am missing something.  If you see an old page version number,
> > you know none of the tuples are from running transactions so you can
> > just freeze them all, after consulting the pg_clog.  What am I missing?
> > If the page is full, why are you trying to add to the page?
>
> The problem is if you want to delete from such a page.  Then you need to
> update the tuple's xmax and stick the new xid epoch somewhere.
>
> We had an unconference session at PGCon about this.  These issues were
> all discussed and some ideas were thrown around.  We can expect a patch
> to appear soon, I think.
>

Right.  I'm now working on splitting my large patch for 64-bit xids into
patchset.
I'm planning to post patchset in the beginning of next week.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Fixed malformed error message on malformed SCRAM message.

2017-06-07 Thread Noah Misch
On Fri, Jun 02, 2017 at 05:58:59AM +, Noah Misch wrote:
> On Mon, May 29, 2017 at 01:43:26PM -0700, Michael Paquier wrote:
> > On Mon, May 29, 2017 at 1:38 PM, Daniele Varrazzo
> >  wrote:
> > > Patch attached
> > 
> > Right. I am adding that to the list of open items, and Heikki in CC
> > will likely take care of it.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-06-07 Thread Noah Misch
On Wed, May 31, 2017 at 09:14:17AM -0500, Kevin Grittner wrote:
> On Wed, May 31, 2017 at 1:44 AM, Noah Misch  wrote:
> 
> > IMMEDIATE ATTENTION REQUIRED.
> 
> I should be able to complete review and testing by Friday.  If there
> are problems I might not take action until Monday; otherwise I
> should be able to do so on Friday.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is again long past
due for your status update.  Please reacquaint yourself with the policy on
open item ownership[1] and then reply immediately.  If I do not hear from you
by 2017-06-08 08:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-06-07 Thread Heikki Linnakangas

On 06/06/2017 07:24 AM, Ashutosh Bapat wrote:

On Tue, Jun 6, 2017 at 9:48 AM, Craig Ringer  wrote:

On 6 June 2017 at 12:13, Ashutosh Bapat  wrote:


What happens when the epoch is so low that the rest of the XID does
not fit in 32bits of tuple header? Or such a case should never arise?


Storing an epoch implies that rows can't have (xmin,xmax) different by
more than one epoch. So if you're updating/deleting an extremely old
tuple you'll presumably have to set xmin to FrozenTransactionId if it
isn't already, so you can set a new epoch and xmax.


If the page has multiple such tuples, updating one tuple will mean
updating headers of other tuples as well? This means that those tuples
need to be locked for concurrent scans? May be not, since such tuples
will be anyway visible to any concurrent scans and updating xmin/xmax
doesn't change the visibility. But we might have to prevent multiple
updates to the xmin/xmax because of concurrent updates on the same
page.


"Store the epoch in the page header" is actually a slightly 
simpler-to-visualize, but incorrect, version of what we actually need to 
do. If you only store the epoch, then all the XIDs on a page need to 
belong to the same epoch, which causes trouble when the current epoch 
changes. Just after the epoch changes, you cannot necessarily freeze all 
the tuples from the previous epoch, because they would not yet be 
visible to everyone.


The full picture is that we need to store one 64-bit XID "base" value in 
the page header, and all the xmin/xmax values in the tuple headers are 
offsets relative to that base. With that, you effectively have 64-bit 
XIDs, as long as the *difference* between any two XIDs on a page is not 
greater than 2^32. That can be guaranteed, as long as we don't allow a 
transaction to be in-progress for more than 2^32 XIDs. That seems like a 
reasonable limitation.


But yes, when the "current XID - base XID in page header" becomes 
greater than 2^32, and you need to update a tuple on that page, you need 
to first freeze the page, update the base XID on the page header to a 
more recent value, and update the XID offsets on every tuple on the page 
accordingly. And to do that, you need to hold a lock on the page. If you 
don't move any tuples around at the same time, but just update the XID 
fields, and exclusive lock on the page is enough, i.e. you don't need to 
take a super-exclusive or vacuum lock. In any case, it happens so 
infrequently that it should not become a serious burden.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan  wrote:
> On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro
>  wrote:
>> After sleeping on it, I don't think we need to make that decision here
>> though.  I think it's better to just move the tuplestores into
>> ModifyTableState so that each embedded DML statement has its own, and
>> have ModifyTable pass them to the trigger code explicitly.
>
> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
> case -- one for updated tuples, and the other for inserted tuples.

Hmm.  Right.  INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated.  That's not right.

For example:

postgres=# insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
postgres-#   on conflict (a) do
postgres-#   update set counter = my_table.counter + excluded.counter;
NOTICE:  trigger = my_update_trigger, old table = (ID1,1), (ID2,1),
new table = (ID1,2), (ID2,2), (ID3,1)
NOTICE:  trigger = my_insert_trigger, new table = (ID1,2), (ID2,2), (ID3,1)
INSERT 0 3

That's the result of the following:

create or replace function dump_insert() returns trigger language plpgsql as
$$
  begin
raise notice 'trigger = %, new table = %',
  TG_NAME,
  (select string_agg(new_table::text, ', ' order by a) from new_table);
return null;
  end;
$$;

create or replace function dump_update() returns trigger language plpgsql as
$$
  begin
raise notice 'trigger = %, old table = %, new table = %',
  TG_NAME,
  (select string_agg(old_table::text, ', ' order by a) from old_table),
  (select string_agg(new_table::text, ', ' order by a) from new_table);
return null;
  end;
$$;

create table my_table (a text primary key, counter int);

insert into my_table values ('ID1', 1), ('ID2', 1);

create trigger my_insert_trigger
  after insert on my_table
  referencing new table as new_table
  for each statement
  execute procedure dump_insert();

create trigger my_update_trigger
  after update on my_table
  referencing old table as old_table new table as new_table
  for each statement
  execute procedure dump_update();

insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
  on conflict (a) do
  update set counter = my_table.counter + excluded.counter;

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-07 Thread Vladimir Borodin

> 6 июня 2017 г., в 23:30, Sergey Burladyan  написал(а):
> 
> Dmitriy Sarafannikov  writes:
> 
>> Starting and stopping master after running pg_upgrade but before rsync to 
>> collect statistics
>> was a bad idea.
> 
> But, starting and stopping master after running pg_upgrade is *required*
> by documentation:
> https://www.postgresql.org/docs/9.6/static/pgupgrade.html
>> f. Start and stop the new master cluster
>> In the new master cluster, change wal_level to replica in the 
>> postgresql.conf file and then start and stop the cluster.
> 
> and there is no any suggestion to disable autovacuum for it.


Yep. This should probably be fixed in the documentation?


--
May the force be with you…
https://simply.name



Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-07 Thread Ashutosh Bapat
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
 wrote:
>>
>> This also means that we have to test PREPARED statements involving
>> default partition. Any addition/deletion/attach/detach of other partition
>> should invalidate those cached statements.
>
>
> Will add this in next version of patch.

My earlier statement requires a clarification. By "PREPARED statements
involving default partition." I mean PREPAREd statements with direct
access to the default partition, without going through the partitioned
table.

>
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>
>
> I am still working on this.
> But, about your comment here:
> "if there's already a constraint on the default partition refutes the
> partition
> constraint on the new partition, we can skip the scan":
> I am so far not able to imagine such a case, since default partition
> constraint
> can be imagined something like "minus infinity to positive infinity with
> some finite set elimination", and any new non-default partition being added
> would simply be a set of finite values(at-least in case of list, but I think
> range
> should not also differ here). Hence one cannot imply the other here.
> Possibly,
> I might be missing something that you had visioned when you raised the flag,
> please correct me if I am missing something.

I am hoping that this has been clarified in other mails in this thread
between you and Amul.

>
>>
>>  /* Generate the main expression, i.e., keyCol = ANY (arr) */
>>  opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
>> -keyCol, (Expr *) arr);
>> +keyCol, (Expr *) arr,
>> spec->is_default);
>>  /* Build leftop = ANY (rightop) */
>>  saopexpr = makeNode(ScalarArrayOpExpr);
>> The comments in both the places need correction, as for default partition
>> the
>> expression will be keyCol <> ALL(arr).
>
>
> Done.

Please note that this changes, if you construct the constraint as
!(keycol = ANY[]).

>
>> We have RelationGetPartitionDesc() for
>> that. Probably we should also add Asserts to check that every pointer in
>> the
>> long pointer chain is Non-null.
>
>
> I am sorry, but I did not understand which chain you are trying to point
> here.

The chain of pointers: a->b->c->d is a chain of pointers.

>
>>
>> @@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int
>> end)
>>  COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
>>  /* Limited completion support for partition bound specification */
>>  else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
>> -COMPLETE_WITH_CONST("FOR VALUES");
>> +COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
>>  else if (TailMatches2("FOR", "VALUES"))
>>  COMPLETE_WITH_LIST2("FROM (", "IN (");
>>
>> @@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int
>> end)
>>  COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables,
>> "");
>>  /* Limited completion support for 

Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-07 Thread Thomas Munro
On Wed, Jun 7, 2017 at 12:19 PM, Peter Geoghegan  wrote:
> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan  wrote:
>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
>> requires more explanation, resource management wise.
>
> Also, it's not clear why it should be okay that the new type of
> ephemeral RTEs introduced don't have permissions checks. There are
> currently cases where the user cannot see data that they inserted
> themselves (e.g., through RETURNING), on the theory that a before row
> trigger might have modified the final contents of the tuple in a way
> that the original inserter isn't supposed to know details about.
>
> As the INSERT docs say, "Use of the RETURNING clause requires SELECT
> privilege on all columns mentioned in RETURNING". Similarly, the
> excluded.* pseudo-relation requires select privilege (on the
> corresponding target relation columns) in order to be usable by ON
> CONFLICT DO UPDATE.

This is an interesting question and one that was mentioned here (near
the bottom):

https://www.postgresql.org/message-id/CACjxUsO%3DsquN2XbYBM6qLfS9co%3DbfGqQFxMfY%2BpjyAGKP_jpwQ%40mail.gmail.com

I'm pretty sure that whatever applies to the NEW and OLD variables
should also apply to the new table and old table, because the former
are conceptually range variables over the latter.  Without having
checked either the standard or our behaviour for NEW and OLD, I'll bet
you one internet point that they say we should apply the same access
restrictions as the underlying table, and we don't.  Am I wrong?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >