Re: [HACKERS] Missing comment for max_logical_replication_workers in postgresql.conf.sample

2017-07-26 Thread Masahiko Sawada
On Thu, Jul 27, 2017 at 10:14 AM, Yugo Nagata  wrote:
> Hi,
>
> I found that postgresql.conf.sample is missing a comment
> to note that changing max_logical_replication_workers requires
> a restart of the server.
>
> Other such parameters have comments, so I think the new
> parameter also needs one. Attached is a simple patch to fix
> this.
>

Good point. Similarly, dynamic_shared_memory_type and event_source
require a restart to change but are not marked as such in
postgresql.conf.sample. Should we add comments for those as well?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-07-26 Thread Thomas Munro
On Wed, Jul 26, 2017 at 10:40 PM, Ashutosh Bapat
 wrote:
> On Wed, Jul 26, 2017 at 6:51 AM, Thomas Munro
>  wrote:
>> Something like the attached.
>
> The patch prints errdetail() as "No LDAP diagnostic message
> available." when LDAP doesn't provide diagnostics. Maybe some error
> messages simply have no diagnostic information; in that case the above
> error detail may be confusing. Maybe we should just omit the error
> detail when no diagnostic message is available.

Agreed.  Here's a version that skips those useless detail messages
using a coding pattern I found elsewhere.  For example, on my system,
trying to log in with the wrong password causes an "Invalid
credentials" error with no extra "DETAIL:" line logged, but trying to
use TLS when it hasn't been configured properly logs a helpful
diagnostic message.
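
For the archives, the pattern amounts to roughly this (a minimal sketch
rather than the patch itself, assuming OpenLDAP's
ldap_get_option(LDAP_OPT_DIAGNOSTIC_MESSAGE), which hands back a string
to be freed with ldap_memfree(); "rc" stands for the LDAP result code
already in hand):

    char       *message = NULL;

    (void) ldap_get_option(ldap, LDAP_OPT_DIAGNOSTIC_MESSAGE, &message);

    ereport(LOG,
            (errmsg("LDAP operation failed: %s", ldap_err2string(rc)),
             /* emit a DETAIL line only if the server supplied one */
             (message != NULL && message[0] != '\0') ?
             errdetail("LDAP diagnostics: %s", message) : 0));

    if (message != NULL)
        ldap_memfree(message);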

Thanks for the review!

I also noticed a couple of other things in passing and fixed them in
this new version:

1.  In one place we call ldap_get_option(LDAP_OPT_ERROR_NUMBER) after
ldap_unbind_s().  My man page says "Once [ldap_unbind()] is called,
the connection to the LDAP server is closed, and the ld structure is
invalid."  So I don't think we should do that, even if it didn't
return LDAP_SUCCESS.  I have no idea if any implementation would
actually fail to unbind and what state the LDAP object would be in if
it did: this is essentially the destructor function for LDAP
connections, so what are you supposed to do after that?  But using the
LDAP object again seems wrong to me (see the sketch below).

2.  In several code paths we don't call ldap_unbind() on the way out,
which is technically leaking a connection.  Failure to authenticate is
FATAL to the backend anyway so it probably doesn't matter much, but
hanging up without saying goodbye is considered discourteous by
some[1].

Thoughts anyone?

[1] https://www.ldap.com/the-ldap-unbind-operation
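
To illustrate point 1 with a hypothetical sketch (not the patch itself;
ldap_simple_bind_s(), ldap_get_option() and ldap_memfree() are the real
libldap calls, but the surrounding variable names are invented): any
error state has to be captured before the handle is destroyed.

    int         rc;
    char       *diag = NULL;

    rc = ldap_simple_bind_s(ld, user_dn, password);
    if (rc != LDAP_SUCCESS)
    {
        /* capture diagnostics now; "ld" is off limits after unbinding */
        ldap_get_option(ld, LDAP_OPT_DIAGNOSTIC_MESSAGE, &diag);
    }

    ldap_unbind_s(ld);          /* destructor: do not touch ld after this */
    ld = NULL;

    if (rc != LDAP_SUCCESS)
    {
        fprintf(stderr, "LDAP bind failed: %s (%s)\n",
                ldap_err2string(rc),
                diag != NULL ? diag : "no diagnostic message");
        if (diag != NULL)
            ldap_memfree(diag);
    }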

-- 
Thomas Munro
http://www.enterprisedb.com


ldap-diagnostic-message-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A couple of postgresql.conf.sample discrepancies

2017-07-26 Thread Thomas Munro
Hi hackers,

Here's a script that reminds you about GUCs you forgot to put in
postgresql.conf.sample.  It probably needs some work.  Does this
already happen somewhere else?  I guess not, because it found two
discrepancies:

$ ./src/tools/check_sample_config.pl
enable_gathermerge appears in guc.c but not in postgresql.conf.sample
trace_recovery_messages appears in guc.c but not in postgresql.conf.sample

I think the first should be listed in postgresql.conf.sample, but the
second should probably be flagged as GUC_NOT_IN_SAMPLE.  See attached.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-sample-config.patch
Description: Binary data


check-sample-config-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-07-26 Thread Masahiko Sawada
On Thu, Jul 27, 2017 at 10:28 AM, Robert Haas  wrote:
> On Fri, Apr 7, 2017 at 10:56 AM, Masahiko Sawada  
> wrote:
>> Vinayak, why did you mark this patch as "Move to next CF"? AFAIU
>> there has been no discussion yet.
>
> I'd like to discuss this patch.  Clearly, a lot of work has been done
> here, but I am not sure about the approach.

Thank you for the comment. I'd like to reply about the goal of this
feature first.

> If we were to commit this patch set, then you could optionally enable
> two_phase_commit for a postgres_fdw foreign server.  If you did, then,
> modulo bugs and administrator shenanigans, and given proper
> configuration, you would be guaranteed that a successful commit of a
> transaction which touched postgres_fdw foreign tables would eventually
> end up committed or rolled back on all of the nodes, rather than
> committed on some and rolled back on others.  However, you would not
> be guaranteed that all of those commits or rollbacks happen at
> anything like the same time.  So, you would have a sort of eventual
> consistency.  Any given snapshot might not be consistent, but if you
> waited long enough and with all nodes online, eventually all
> distributed transactions would be resolved in a consistent manner.
> That's kinda cool, but I think what people really want is a stronger
> guarantee, namely, that they will get consistent snapshots.  It's not
> clear to me that this patch gets us any closer to that goal.  Does
> anyone have a plan for how we'd get from here to that stronger goal?
> If not, is the patch useful enough to justify committing it for what
> it can already do?  It would be particularly good to hear some
> end-user views on this functionality and whether or not they would use
> it and find it valuable.

Yeah, this patch only guarantees that if you get a commit, the
transaction is either committed or rolled back on all relevant nodes,
and subsequent transactions see a consistent result (if the server
fails, we have to recover in-doubt transactions properly after the
crash). But it doesn't guarantee that a concurrent transaction sees a
consistent result. To provide a cluster-wide consistent view, I think
we need a transaction manager for distributed queries that is
responsible for providing consistent snapshots. There have been some
discussions about what type of transaction manager that should be, but
at the least we need a new transaction manager for distributed queries.
I think providing a consistent view to concurrent transactions and
committing or rolling back a transaction atomically should be treated
as separate features and discussed separately. That said, it would not
be useful, and users would complain, if we provided a consistent
snapshot while a distributed transaction could commit on only some of
the nodes. So this patch could also be an important building block for
providing consistent results.

> On a technical level, I am pretty sure that it is not OK to call
> AtEOXact_FDWXacts() from the sections of CommitTransaction,
> AbortTransaction, and PrepareTransaction that are described as
> "non-critical resource releasing".  At that point, it's too late to
> throw an error, and it is very difficult to imagine something that
> involves a TCP connection to another machine not being subject to
> error.  You might say "well, we can just make sure that any problems
> are reported as a WARNING rather than an ERROR", but that's pretty
> hard to guarantee; most backend code assumes it can ERROR, so anything
> you call is a potential hazard.  There is a second problem, too: any
> code that runs from here is not interruptible.  The user can hit ^C
> all day and nothing will happen.  That's a bad situation when you're
> busy doing network I/O.  I'm not exactly sure what the best thing to
> do about this problem would be.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expand_dbname in postgres_fdw

2017-07-26 Thread Ashutosh Bapat
On Thu, Jul 27, 2017 at 12:21 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Jul 26, 2017 at 5:38 AM, Ashutosh Bapat
>>  wrote:
>>> According to F.34.1.1 at [1], passing a connection string as the
>>> dbname option should work, so your question is valid. I am not aware
>>> of any discussion around this on hackers.
>
>> I kind of wonder if this had some security aspect to it?  But not sure.
>
> The main problem to my mind is that a connection string could possibly
> override items meant to be specified elsewhere.  In particular it ought
> not be allowed to specify the remote username or password, because those
> are supposed to come from the user mapping object not the server object.
> I suspect you could break things by trying to specify client_encoding
> there, as well.

+1.

>
> In any case, I entirely reject the argument that the existing
> documentation says this should work.  It says that you can specify (most
> of) the same fields that are allowed in a connection string, not that one
> of those fields might be taken to *be* a connection string.
>

Section F.34.1.1 at [1] says
"A foreign server using the postgres_fdw foreign data wrapper can have
the same options that libpq accepts in connection strings, as
described in Section 33.1.2, except that these options are not
allowed:". When it says it "accepts the same options", users would
interpret that as "accepts them in the same manner as specified in the
referenced section". Also, dbname is not one of the listed exceptions,
so a user would expect the same behaviour when the same value for the
dbname option is provided in foreign server options and in a libpq
connection string. In the referenced section "dbname" is described as

--
dbname

The database name. Defaults to be the same as the user name. In
certain contexts, the value is checked for extended formats; see
Section 33.1.1 for more details on those.
--

There is some grey area where different people will interpret those
sentences differently. So it may be better to say that the "dbname"
option of a foreign server accepts only a database name.

[1] 
https://www.postgresql.org/docs/10/static/postgres-fdw.html#idm44880567492496
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-26 Thread Chapman Flack
On 07/17/17 11:29, Michael Paquier wrote:

> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Is that an aesthetic 'rather', or is there a technical advantage you
have in mind?

I also began by looking at how to stop AdvanceXLInsertBuffer()
initializing headers and taking locks when neither is needed.
But Heikki's just-rezero-them suggestion has a definite simplicity
advantage. It can be implemented entirely with a tight group of
lines added to CopyXLogRecordToWAL, as opposed to modifying
AdvanceXLInsertBuffer in a few distinct places, adding a parameter,
and changing its call sites.
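
Concretely, the tight group of lines could look something like this (a
sketch of the idea only, reusing CopyXLogRecordToWAL's existing
CurrPos/EndPos bookkeeping, not a tested patch):

    /*
     * At a segment switch, re-zero the short page header that
     * AdvanceXLInsertBuffer() may already have initialized in each
     * remaining (otherwise empty) page, so that the tail of the
     * segment is all zeros and compresses well.
     */
    while (CurrPos < EndPos)
    {
        char       *currpos = GetXLogBuffer(CurrPos);

        MemSet(currpos, 0, SizeOfXLogShortPHD);
        CurrPos += XLOG_BLCKSZ;
    }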

There's a technical appeal to making the changes in AdvanceXLInsertBuffer
(who wants to do unnecessary initialization and locking?), but the amount
of unnecessary work that can be avoided is proportional to the number of
unused pages at switch time, meaning it is largest when the system
is least busy, and may be of little practical concern.

Moreover, optimizing AdvanceXLInsertBuffer would reveal one more
complication: some of the empty pages about to be written out may
have been initialized opportunistically in earlier calls to
AdvanceXLInsertBuffer, so those already have populated headers, and
would need rezeroing anyway. And not necessarily just an insignificant
few of them: if XLOGChooseNumBuffers chose the maximum, it could even
be all of them.

That might also be handled by yet another conditional within
AdvanceXLInsertBuffer. But with all of that in view, maybe it is
just simpler to have one loop in CopyXLogRecordToWAL simply zero them all,
and leave AdvanceXLInsertBuffer alone, so no complexity is added when it
is called from other sites that are arguably hotter.

Zeroing SizeOfXLogShortPHD bytes doesn't cost a whole lot.

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-26 Thread Noah Misch
On Fri, Jul 21, 2017 at 07:04:32PM -0400, Stephen Frost wrote:
> Masahiko, all,
> 
> * Masahiko Sawada (sawada.m...@gmail.com) wrote:
> > On Tue, Jul 18, 2017 at 1:28 PM, Stephen Frost  wrote:
> > > Masahiko, Michael,
> > >
> > > * Masahiko Sawada (sawada.m...@gmail.com) wrote:
> > >> > This is beginning to take shape.
> > >>
> > >> Sorry, I missed lots of typos in the last patch. All comments from you
> > >> are incorporated into the attached latest patch, and I've checked
> > >> whether there are other typos. Please review it.
> > >
> > > I've taken an initial look through the patch and it looks pretty
> > > reasonable.  I need to go over it in more detail and work through
> > > testing it myself next.
> > >
> > > I expect to commit this (with perhaps some minor tweaks primarily around
> > > punctuation/wording), barring any issues found, on Wednesday or Thursday
> > > of this week.
> > 
> > I understood. Thank you for looking at this!
> 
> I started discussing this with David off-list and he'd like a chance to
> review it in a bit more detail (he's just returning from being gone for
> a few weeks).  That review will be posted to this thread on Monday, and
> I'll wait until then to move forward with the patch.
> 
> Next update will be before Tuesday, July 25th, COB.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-26 Thread Noah Misch
On Mon, Jul 24, 2017 at 08:04:30AM +0100, Andres Freund wrote:
> 
> 
> On July 24, 2017 7:10:19 AM GMT+01:00, Noah Misch  wrote:
> >On Tue, Jul 18, 2017 at 01:04:10PM -0700, Andres Freund wrote:
> >> Ok, I'll flesh out the patch till Thursday.  But I do think we're
> >going
> >> to have to do something about the back branches, too.
> >
> >This PostgreSQL 10 open item is past due for your status update. 
> >Kindly send
> >a status update within 24 hours, and include a date for your subsequent
> >status
> >update.  Refer to the policy on open item ownership:
> >https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I sent out a more fleshed-out patch last week, which Tom reviewed. Planning
> to update it to address that review today or tomorrow.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-07-26 Thread Amit Langote
Thanks for looking at this again.

On 2017/07/26 23:31, Ashutosh Bapat wrote:
> On Wed, Jul 12, 2017 at 7:17 AM, Amit Langote
>  wrote:
>>
>> Thanks for the review.  Patch updated taking care of the comments.
> 
> The patches still apply and compile. make check runs well. I do not
> have any further review comments. Given that they address a bug,
> should we consider those for v10?

At least patch 0001 does address a bug.  Not sure if we can say that 0002
addresses a bug; it implements a feature that might be a
nice-to-have-in-PG-10.

Attaching rebased patches.

Thanks,
Amit
From 3edfe27c6a972b09fa9ec369e7dc33d1014bfef8 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 14 Jun 2017 11:32:01 +0900
Subject: [PATCH 1/2] Cope with differing attnos in ATExecAttachPartition code

If the table being attached has attnos different from the parent for
the partitioning columns which are present in the partition constraint
expressions, then predicate_implied_by() will prematurely return false
due to structural inequality of the corresponding Var expressions in
the partition constraint and those in the table's check constraint
expressions.  Fix this by mapping the partition constraint's expressions
to bear the partition's attnos.

Further, if the validation scan needs to be performed after all and
the table being attached is a partitioned table, we will need to map
the constraint expression again to change the attnos to the individual
leaf partition's attnos from those of the table being attached.
---
 src/backend/commands/tablecmds.c  | 87 +++
 src/test/regress/expected/alter_table.out | 45 
 src/test/regress/sql/alter_table.sql  | 38 ++
 3 files changed, 137 insertions(+), 33 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index bb00858ad1..8047c9a7bc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13421,7 +13421,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 {
 	Relation	attachRel,
 				catalog;
-	List	   *childrels;
+	List	   *attachRel_children;
 	TupleConstr *attachRel_constr;
 	List	   *partConstraint,
 			   *existConstraint;
@@ -13489,10 +13489,25 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	/*
 	 * Prevent circularity by seeing if rel is a partition of attachRel. (In
 	 * particular, this disallows making a rel a partition of itself.)
+	 *
+	 * We do that by checking if rel is a member of the list of attachRel's
+	 * partitions provided the latter is partitioned at all.  We want to avoid
+	 * having to construct this list again, so we request the strongest lock
+	 * on all partitions.  We need the strongest lock, because we may decide
+	 * to scan them if we find out that the table being attached (or its leaf
+	 * partitions) may contain rows that violate the partition constraint.
+	 * If the table has a constraint that would prevent such rows, which by
+	 * definition is present in all the partitions, we need not scan the
+	 * table, nor its partitions.  But we cannot risk a deadlock by taking a
+	 * weaker lock now and the stronger one only when needed.
+	 *
+	 * XXX - Do we need to lock the partitions here if we already have the
+	 * strongest lock on attachRel?  The information we need here to check
+	 * for circularity cannot change without taking a lock on attachRel.
 	 */
-	childrels = find_all_inheritors(RelationGetRelid(attachRel),
-									AccessShareLock, NULL);
-	if (list_member_oid(childrels, RelationGetRelid(rel)))
+	attachRel_children = find_all_inheritors(RelationGetRelid(attachRel),
+											 AccessExclusiveLock, NULL);
+	if (list_member_oid(attachRel_children, RelationGetRelid(rel)))
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_TABLE),
 				 errmsg("circular inheritance not allowed"),
@@ -13603,6 +13618,13 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 	partConstraint = list_make1(make_ands_explicit(partConstraint));
 
 	/*
+	 * Adjust the generated constraint to match this partition's attribute
+	 * numbers.
+	 */
+	partConstraint = map_partition_varattnos(partConstraint, 1, attachRel,
+											 rel);
+
+	/*
 	 * Check if we can do away with having to scan the table being attached to
 	 * validate the partition

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-07-26 Thread Robert Haas
On Fri, Apr 7, 2017 at 10:56 AM, Masahiko Sawada  wrote:
> Vinayak, why did you mark this patch as "Move to next CF"? AFAIU
> there has been no discussion yet.

I'd like to discuss this patch.  Clearly, a lot of work has been done
here, but I am not sure about the approach.

If we were to commit this patch set, then you could optionally enable
two_phase_commit for a postgres_fdw foreign server.  If you did, then,
modulo bugs and administrator shenanigans, and given proper
configuration, you would be guaranteed that a successful commit of a
transaction which touched postgres_fdw foreign tables would eventually
end up committed or rolled back on all of the nodes, rather than
committed on some and rolled back on others.  However, you would not
be guaranteed that all of those commits or rollbacks happen at
anything like the same time.  So, you would have a sort of eventual
consistency.  Any given snapshot might not be consistent, but if you
waited long enough and with all nodes online, eventually all
distributed transactions would be resolved in a consistent manner.
That's kinda cool, but I think what people really want is a stronger
guarantee, namely, that they will get consistent snapshots.  It's not
clear to me that this patch gets us any closer to that goal.  Does
anyone have a plan for how we'd get from here to that stronger goal?
If not, is the patch useful enough to justify committing it for what
it can already do?  It would be particularly good to hear some
end-user views on this functionality and whether or not they would use
it and find it valuable.

On a technical level, I am pretty sure that it is not OK to call
AtEOXact_FDWXacts() from the sections of CommitTransaction,
AbortTransaction, and PrepareTransaction that are described as
"non-critical resource releasing".  At that point, it's too late to
throw an error, and it is very difficult to imagine something that
involves a TCP connection to another machine not being subject to
error.  You might say "well, we can just make sure that any problems
are reported as a WARNING rather than an ERROR", but that's pretty
hard to guarantee; most backend code assumes it can ERROR, so anything
you call is a potential hazard.  There is a second problem, too: any
code that runs from here is not interruptible.  The user can hit ^C
all day and nothing will happen.  That's a bad situation when you're
busy doing network I/O.  I'm not exactly sure what the best thing to
do about this problem would be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing comment for max_logical_replication_workers in postgresql.conf.sample

2017-07-26 Thread Yugo Nagata
Hi,

I found that postgresql.conf.sample is missing a comment
to note that changing max_logical_replication_workers requires
a restart of the server.

Other such parameters have comments, so I think the new
parameter also needs one. Attached is a simple patch to fix
this.

Regards,

-- 
Yugo Nagata 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2b1ebb7..d624ad3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -277,6 +277,7 @@
 # These settings are ignored on a publisher.
 
 #max_logical_replication_workers = 4	# taken from max_worker_processes
+		# (change requires restart)
 #max_sync_workers_per_subscription = 2	# taken from max_logical_replication_workers
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-07-26 Thread Masahiko Sawada
On Wed, Jul 26, 2017 at 10:29 PM, Robert Haas  wrote:
> On Mon, Jun 19, 2017 at 4:30 AM, Masahiko Sawada  
> wrote:
>>> I think that either of the options you suggested now would be better.
>>> We'll need that for stopping the tablesync which is in progress during
>>> DropSubscription as well as those will currently still keep running. I
>>> guess we could simply just register syscache callback, the only problem
>>> with that is we'd need to AcceptInvalidationMessages regularly while we
>>> do the COPY which is not exactly free so maybe killing at the end of
>>> transaction would be better (both for refresh and drop)?
>>
>> Attached patch makes the table sync worker check its relation
>> subscription state at the end of COPY and exit if it has disappeared,
>> instead of killing the table sync worker in DDL. There is still a
>> problem that a table sync worker for a large table can hold a slot for
>> a long time even after its state is deleted. But that would be a PG11
>> item.
>
> Do we still need to do something about this?  Should it be an open item?
>

Thank you for looking at this.

Yeah, I think it should be added to the open items list. The patch has
been updated by Petr and is being discussed on another thread[1] that
also addresses other issues in the subscription code. The 0004 patch on
that thread is an updated version of the patch attached to this thread.

[1] 
https://www.postgresql.org/message-id/4c6e7ab3-091e-6336-df38-28937b153e64%402ndquadrant.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More race conditions in logical replication

2017-07-26 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Hmm, yeah, that's not good.  However, I didn't like the idea of putting
> it inside the locked area, as it's too much code.  Instead I added it
> just before acquiring the spinlock.  We cancel the sleep unconditionally
> later on if we didn't need to sleep after all.

I just noticed that Jacana failed again in the subscription test, and it
looks like it's related to this.  I'll take a look tomorrow if no one
beats me to it.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2017-07-26%2014%3A39%3A54

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-07-26 Thread Tom Lane
Robert Haas  writes:
> Ostensibly, the advantage of this framework over my previous proposal
> is that it avoids inserting anything into ExecProcNode(), which is
> probably a good thing to avoid given how frequently ExecProcNode() is
> called.  Unless the parent and the child both know about asynchronous
> execution and choose to use it, everything runs exactly as it does
> today and so there is no possibility of a complaint about a
> performance hit.  As far as it goes, that is good.

> However, at a deeper level, I fear we haven't really solved the
> problem.  If an Append is directly on top of a ForeignScan node, then
> this will work.  But if an Append is indirectly on top of a
> ForeignScan node with some other stuff in the middle, then it won't -
> unless we make whichever nodes appear between the Append and the
> ForeignScan async-capable.

I have not been paying any attention to this thread whatsoever,
but I wonder if you can address your problem by building on top of
the ExecProcNode replacement that Andres is working on,
https://www.postgresql.org/message-id/20170726012641.bmhfcp5ajpudi...@alap3.anarazel.de

The scheme he has allows $extra_stuff to be injected into ExecProcNode at
no cost when $extra_stuff is not needed, because you simply don't insert
the wrapper function when it's not needed.  I'm not sure that it will
scale well to several different kinds of insertions though, for instance
if you wanted both instrumentation and async support on the same node.
But maybe those two couldn't be arms-length from each other anyway,
in which case it might be fine as-is.
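
To spell out the trick as I understand it (a hypothetical sketch, not
Andres's actual code; the ExecProcNodeReal field name is my invention):
the node starts out pointing at a one-time wrapper, which does the extra
work and then swaps in the real function, so subsequent calls pay
nothing.

    /* one-time wrapper initially installed as node->ExecProcNode */
    static TupleTableSlot *
    ExecProcNodeFirst(PlanState *node)
    {
        /* $extra_stuff goes here: stack checks, instrumentation setup */
        check_stack_depth();

        /* replace ourselves with the real function, then call it */
        node->ExecProcNode = node->ExecProcNodeReal;
        return node->ExecProcNode(node);
    }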

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in locking an update tuple chain

2017-07-26 Thread Alvaro Herrera
> The attached patch fixes the problem.  When locking some old tuple version of
> the chain, if we detect that we already hold that lock
> (test_lockmode_for_conflict returns HeapTupleSelfUpdated), do not try to lock
> it again but instead skip ahead to the next version.  This fixes the synthetic
> case in my isolationtester as well as our customer's production case.

Pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-07-26 Thread Robert Haas
On Tue, Jul 25, 2017 at 5:11 AM, Kyotaro HORIGUCHI
 wrote:
>  [ new patches ]

I spent some time today refreshing my memory of what's going with this
thread today.

Ostensibly, the advantage of this framework over my previous proposal
is that it avoids inserting anything into ExecProcNode(), which is
probably a good thing to avoid given how frequently ExecProcNode() is
called.  Unless the parent and the child both know about asynchronous
execution and choose to use it, everything runs exactly as it does
today and so there is no possibility of a complaint about a
performance hit.  As far as it goes, that is good.

However, at a deeper level, I fear we haven't really solved the
problem.  If an Append is directly on top of a ForeignScan node, then
this will work.  But if an Append is indirectly on top of a
ForeignScan node with some other stuff in the middle, then it won't -
unless we make whichever nodes appear between the Append and the
ForeignScan async-capable.  Indeed, we'd really want all kinds of
joins and aggregates to be async-capable so that examples like the one
Corey asked about in
http://postgr.es/m/CADkLM=fuvVdKvz92XpCRnb4=rj6bLOhSLifQ3RV=sb4q5rj...@mail.gmail.com
will work.

But if we do, then I fear we'll just be reintroducing the same
performance regression that we introduced by switching to this
framework from the previous one - or maybe a different one, but a
regression all the same.  Every type of intermediate node will have to
have a code path where it uses ExecAsyncRequest() /
ExecAsyncHogeResponse() rather than ExecProcNode() to get tuples, and
it seems like that will either end up duplicating a lot of code from
the regular code path or, alternatively, polluting the regular code
path with some of the async code's concerns to avoid duplication, and
maybe slowing things down.

Maybe that concern is unjustified; I'm not sure.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-26 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> AFAICT, pg_dump has no notion that it needs to be careful about the order
>> in which permissions are granted.  I did

> I'm afraid that's correct, though I believe that's always been the case.
> I spent some time looking into this today and from what I've gathered so
> far, there's essentially an implicit (or at least, I couldn't find any
> explicit reference to it) ordering in ACLs whereby a role which was
> given a GRANT OPTION always appears in the ACL list before an ACL entry
> where that role is GRANT'ing that option to another role, and this is
> what pg_dump was (again, implicitly, it seems) depending on to get this
> correct in prior versions.

Yeah, I suspected that was what made it work before.  I think the ordering
just comes from the fact that new ACLs are added to the end, and you can't
add an entry as a non-owner unless there's a grant WGO there already.

> Pulling apart the ACL list and rebuilding it to handle adding/revoking
> of default options on objects ends up, in some cases, changing the
> ordering around for the ACLs and that's how pg_dump comes to emit the
> GRANT commands in the wrong order.

Right.

> I don't, at the moment, think we actually need to do any checks in the
> backend code to make sure that the implicit ordering is always held,
> assuming we make this change in pg_dump.  I do wonder if it might be
> possible, with the correct set of GRANTs (perhaps having role
> memberships coming into play also, as discussed in the header of
> select_best_grantor()) to generate an ACL list with an "incorrect"
> ordering which would end up causing issues in older releases with
> pg_dump.  We've had precious little complaints from the field about this
> and so, even if we were to generate such a case, I'm not sure that we'd
> want to add all the code necessary to avoid it and then back-patch it.

I suspect it would be easier to modify the backend code that munges ACL
lists so that it takes care to preserve that property, if we ever find
there are cases where it does not.  It seems plausible to me that
pg_dump is not the only code that depends on that ordering.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-26 Thread Tom Lane
Andres Freund  writes:
> On 2017-07-26 15:03:37 -0400, Tom Lane wrote:
>> Hm, that seems kinda backwards to me; I was envisioning the checks
>> moving to the callees not the callers.  I think it'd end up being
>> about the same number of occurrences of CHECK_FOR_INTERRUPTS(),
>> and there would be less of a judgment call about where to put them.

> Hm, that seems a bit riskier - easy to forget one of the places where we
> might need a CFI().

I would argue the contrary.  If we put a CFI at the head of each node
execution function, then it's just boilerplate that you copy-and-paste
when you invent a new node type.  The way you've coded it here, it
seems to involve a lot of judgment calls.  That's very far from being
copy and paste, and the more different it looks from one node type
to another, the easier it will be to forget it.

> We certainly are missing a bunch of them in various nodes

It's certainly possible that there are long-running loops not involving
any ExecProcNode recursion at all, but that would be a bug independent
of this issue.  The CFI in ExecProcNode itself can be replaced exactly
either by asking all callers to do it, or by asking all callees to do it.
I think the latter is going to be more uniform and harder to screw up.
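
That is, the callee-side convention amounts to nothing more than this
at the top of every node execution function (a schematic example with an
invented node name, not code from the patch):

    static TupleTableSlot *
    ExecHypotheticalScan(PlanState *node)
    {
        /* standard preamble, identical in every node type */
        CHECK_FOR_INTERRUPTS();

        /* ... node-specific processing would follow here ... */
        return NULL;            /* placeholder for this sketch */
    }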

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testlib.pm vs msys

2017-07-26 Thread Andrew Dunstan


On 07/26/2017 11:12 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>
>>> I still find this pretty ugly :-(.
>> I'm open to alternative suggestions. But I don't want to spend a heck of
>> a lot of time on this.
> Well, let's at least do the temp files a little more safely.
> Maybe like this?
>
>   


OK, tested with that and it works.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-26 Thread Andres Freund
On 2017-07-26 15:03:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I've moved the CHECK_FOR_INTERRUPTS() to the callsites. That
> > unsurprisingly ends up being somewhat verbose, and there's a bunch of
> > minor judgement calls where exactly to place them. While doing so I've
> > also added a few extra ones.  Did this in a separate patch to make it
> > easier to review.
> 
> Hm, that seems kinda backwards to me; I was envisioning the checks
> moving to the callees not the callers.  I think it'd end up being
> about the same number of occurrences of CHECK_FOR_INTERRUPTS(),
> and there would be less of a judgment call about where to put them.

Hm, that seems a bit riskier - easy to forget one of the places where we
might need a CFI(). We certainly are missing a bunch of them in various
nodes - I tried to add ones I saw as missing, but it's quite some
code. Keeping them close to ExecProcNode() makes that call easier.  I'm
not quite seeing how solely putting them in callees removes the
judgement call issue?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-26 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> AFAICT, pg_dump has no notion that it needs to be careful about the order
> in which permissions are granted.  I did

I'm afraid that's correct, though I believe that's always been the case.
I spent some time looking into this today and from what I've gathered so
far, there's essentially an implicit (or at least, I couldn't find any
explicit reference to it) ordering in ACLs whereby a role which was
given a GRANT OPTION always appears in the ACL list before an ACL entry
where that role is GRANT'ing that option to another role, and this is
what pg_dump was (again, implicitly, it seems) depending on to get this
correct in prior versions.
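
(To illustrate that implicit ordering with a made-up aclitem list, not
one from an actual report: in

    {alice=arwdDxt/alice, bob=r*/alice, carol=r/bob}

the entry giving bob SELECT WITH GRANT OPTION from alice appears before
the entry where bob grants SELECT to carol, so replaying the GRANTs from
left to right just works.)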

Pulling apart the ACL list and rebuilding it to handle adding/revoking
of default options on objects ends up, in some cases, changing the
ordering around for the ACLs and that's how pg_dump comes to emit the
GRANT commands in the wrong order.

Looks like what is needed is an explicit ordering of the ACLs in
pg_dump to ensure that it emits the GRANTs in the correct order, which
should be that a given grantor's rights come before any ACL which that
grantor granted.  Ideally, that could be crafted into the SQL query
which is sent to the server, but I haven't quite figured out if that'll
be possible or not.  If not, it shouldn't be too hard to implement in
pg_dump directly.

I don't, at the moment, think we actually need to do any checks in the
backend code to make sure that the implicit ordering is always held,
assuming we make this change in pg_dump.  I do wonder if it might be
possible, with the correct set of GRANTs (perhaps having role
memberships coming into play also, as discussed in the header of
select_best_grantor()) to generate an ACL list with an "incorrect"
ordering which would end up causing issues in older releases with
pg_dump.  We've had precious little complaints from the field about this
and so, even if we were to generate such a case, I'm not sure that we'd
want to add all the code necessary to avoid it and then back-patch it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-26 Thread Tom Lane
Andres Freund  writes:
> I've moved the CHECK_FOR_INTERRUPTS() to the callsites. That
> unsurprisingly ends up being somewhat verbose, and there's a bunch of
> minor judgement calls where exactly to place them. While doing so I've
> also added a few extra ones.  Did this in a separate patch to make it
> easier to review.

Hm, that seems kinda backwards to me; I was envisioning the checks
moving to the callees not the callers.  I think it'd end up being
about the same number of occurrences of CHECK_FOR_INTERRUPTS(),
and there would be less of a judgment call about where to put them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expand_dbname in postgres_fdw

2017-07-26 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 26, 2017 at 5:38 AM, Ashutosh Bapat
>  wrote:
>> According to F.34.1.1 at [1], passing a connection string as the
>> dbname option should work, so your question is valid. I am not aware
>> of any discussion around this on hackers.

> I kind of wonder if this had some security aspect to it?  But not sure.

The main problem to my mind is that a connection string could possibly
override items meant to be specified elsewhere.  In particular it ought
not be allowed to specify the remote username or password, because those
are supposed to come from the user mapping object not the server object.
I suspect you could break things by trying to specify client_encoding
there, as well.

In any case, I entirely reject the argument that the existing
documentation says this should work.  It says that you can specify (most
of) the same fields that are allowed in a connection string, not that one
of those fields might be taken to *be* a connection string.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expand_dbname in postgres_fdw

2017-07-26 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Jul 26, 2017 at 5:38 AM, Ashutosh Bapat
>  wrote:
> > According to F.34.1.1 at [1], passing a connection string as the
> > dbname option should work, so your question is valid. I am not aware
> > of any discussion around this on hackers. Comments in connect_pg_server()
> > don't help either. But I guess, we expect users to set up individual
> > foreign server and user mapping options instead of putting those in a
> > connection string. I can not think of any reason except that it
> > improves readability. If postgres_fdw wants to take certain actions
> > based on the values of individual options, having them separate is
> > easier to handle than parsing them out of a connection string.
> >
> > Anyway, if we are not going to change the current behaviour, we should
> > change the documentation to say that the dbname option means "database
> > name" and not a connection string.
> 
> I kind of wonder if this had some security aspect to it?  But not sure.

Yeah, me too.  As I recall, if the flag is not set, parameters set by
the FDW server earlier in the conninfo can be changed by params that
appear in the dbname.  Offhand I can't see any obvious security
implications, but then I've not thought about it very hard.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Tom Lane
I thought of a quite different way that we might attack this problem,
but I'm not sure if it's worth pursuing or not.  The idea is basically
that we should get rid of the existing kluge for pushing ACLs to the end
altogether, and instead rely on dependency sorting to make things work.
This'd require some significant surgery on pg_dump, but it seems doable:

* Make ACLs have actual DumpableObject representations that participate
in the topological sort.

* Add more explicit dependencies between these objects and other ones.
For example, we'd fix the matview-permissions problem by adding
dependencies not only on the tables a matview references, but on their
ACLs.  Another case is that we'd want to ensure that a table's ACL comes
out after the table data, so as to solve the original problem that the
current behavior was meant to deal with, ie allowing restore of tables
even if they've been made read-only by revoking the owner's INSERT
privilege.  But I think that case would best be dealt with by forcing
table ACLs to be post-data objects.  (Otherwise they might come out in the
middle "data" section of a restore, which is likely to break some client
assumptions somewhere.)  Another variant of that is that table ACLs
probably need to come out after CREATE TRIGGER and foreign-key
constraints, in case an owner has revoked their own TRIGGER or REFERENCES
privilege.

This seems potentially doable, but I sure don't see it coming out small
enough to be a back-patchable fix; nor would it make things work for
existing archive files, only new dumps.  In fact, if we don't want to
break parallel restore for existing dump files, we'd probably still have
to implement the postpone-all-ACLs rule when dealing with an old dump
file.  (But maybe we could blow off that case, and just say you might have
to not use parallel restore with old dumps that contain self-revoke ACLs?)

The bigger issue is whether there's some failure case this would cause
that I'm missing altogether.  Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expand_dbname in postgres_fdw

2017-07-26 Thread Robert Haas
On Wed, Jul 26, 2017 at 5:38 AM, Ashutosh Bapat
 wrote:
> According to F.34.1.1 at [1], passing a connection string as the dbname
> option should work, so your question is valid. I am not aware of any
> discussion around this on hackers. Comments in connect_pg_server()
> don't help either. But I guess, we expect users to set up individual
> foreign server and user mapping options instead of putting those in a
> connection string. I can not think of any reason except that it
> improves readability. If postgres_fdw wants to take certain actions
> based on the values of individual options, having them separate is
> easier to handle than parsing them out of a connection string.
>
> Anyway, if we are not going to change the current behaviour, we should
> change the documentation to say that the dbname option means "database
> name" and not a connection string.

I kind of wonder if this had some security aspect to it?  But not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-26 Thread Sokolov Yura

On 2017-07-26 19:46, Claudio Freire wrote:

On Wed, Jul 26, 2017 at 1:39 PM, Sokolov Yura
 wrote:

On 2017-07-24 12:41, Sokolov Yura wrote:
test_master_1/pretty.log

...

time   activity   tps   latency   stddev   min   max
11130  av+ch      198     198ms    374ms   7ms   1956ms
11160  av+ch      248     163ms    401ms   7ms   2601ms
11190  av+ch      321     125ms    363ms   7ms   2722ms
11220  av+ch     1155      35ms    123ms   7ms   2668ms
11250  av+ch     1390      29ms     79ms   7ms   1422ms


vs


test_master_ring16_1/pretty.log
time   activity   tps   latency   stddev   min    max
11130  av+ch       26    1575ms    635ms   101ms  2536ms
11160  av+ch       25    1552ms    648ms    58ms  2376ms
11190  av+ch       32    1275ms    726ms    16ms  2493ms
11220  av+ch       23    1584ms    674ms    48ms  2454ms
11250  av+ch       35    1235ms    777ms    22ms  3627ms


That's a huge change in latency for the worse.

Are you sure that's the ring buffer's doing and not some methodology
snafu?


Well, I tuned postgresql.conf so that there were no such
catastrophic slowdowns on the master branch (with default
settings such slowdowns happen quite frequently).
Setting bgwriter_lru_maxpages = 10 (instead of the default 200)
was one such tuning.

Probably there is some magic "border" that triggers this
behavior. Tuning postgresql.conf shifted the master branch to
the "good side" of this border, and the faster autovacuum
crossed it back to the "bad side".

Probably backend_flush_after = 2MB (instead of the default 0) is
also part of this border. I haven't tried to benchmark without
this option yet.

Anyway, given that checkpoint and autovacuum interference can be
this noticeable, the checkpoint clearly should affect the
autovacuum cost mechanism, IMHO.

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in locking an update tuple chain

2017-07-26 Thread Alvaro Herrera
Amit Kapila wrote:
> On Sat, Jul 15, 2017 at 2:30 AM, Alvaro Herrera
>  wrote:

> > a transaction wants to lock the
> > updated version of some tuple, and it does so; and some other
> > transaction is also locking the same tuple concurrently in a compatible
> > way.  So both are okay to proceed concurrently.  The problem is that if
> > one of them detects that anything changed in the process of doing this
> > (such as the other session updating the multixact to include itself,
> > both having compatible lock modes), it loops back to ensure xmax/
> > infomask are still sane; but heap_lock_updated_tuple_rec is not prepared
> > to deal with the situation of "the current transaction has the lock
> > already", so it returns a failure and the tuple is returned as "not
> > visible" causing the described problem.
> 
> Your fix seems logical to me, though I have not tested it yet.
> However, I wonder why heap_lock_tuple needs to restart from the
> beginning of the update chain in this case?

Well, it's possible that we could change things so that it doesn't need
to re-start from the same spot where it initially began, but I think it
requires changing too much code; I'd rather not touch it in a
back-patchable bug fix.  If we really wanted, we could perhaps change
things to avoid repeated walks of the chain, but I'd see that as a pg11
(or future) change only.  (You would be forgiven for thinking that the
interactions between EvalPlanQualFetch, heap_lock_tuple and
heap_lock_updated_tuple are rather Rube Goldbergian, to use Tom's term.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

2017-07-26 Thread Kunshchikov Vladimir
Hello Alvaro, thanks for the feedback; I've fixed all of your points.
Attached new version of patch.

--
Best regards,
Vladimir Kunschikov
Lead software developer
IDS project
InfoTeCS JSC

From: Alvaro Herrera 
Sent: Wednesday, July 26, 2017 1:02 AM
To: Kunshchikov Vladimir
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

Kunshchikov Vladimir wrote:

> Errors should be like this:
> pg_restore: [compress_io] could not read from input file: d3/2811.dat.gz: 
> invalid distance too far back
>
> Attached small fix for this issue.

After looking at this patch, I don't like it very much.  In particular,
I don't like the way you've handled the WRITE_ERROR_EXIT macro in
pg_backup_directory.c by undef'ing the existing one and creating it
anew.  The complete and correct definition should reside in one place
(pg_backup_archiver.h), instead of being split in two and being defined
differently.

Another point is that you broke the comment on the definition of struct
cfp "this is opaque to callers" by moving it to the header file
precisely with the point of making it transparent to callers.  We need
some better idea there.  I think it can be done by making compress_io.c
responsible of handing over the error message through some new function
(something very simple like "if compressedfp then return get_gz_error
else strerror" should suffice).

Also, I needed to rebase your patch over a recent pgindent run, though
that's a pretty small change.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 991fe11e8a..67908a069c 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -709,5 +709,25 @@ hasSuffix(const char *filename, const char *suffix)
   suffix,
   suffixlen) == 0;
 }
+#endif
 
+const char *
+get_cfp_error(cfp* fp)
+{
+#ifdef HAVE_LIBZ
+	if(!fp->compressedfp)
+		return strerror(errno);
+
+	int errnum;
+	
+	static const char fallback[] = "Zlib error";
+	const int maxlen = 255;
+	const char *errmsg = gzerror(fp->compressedfp, &errnum);
+	if(!errmsg || !memchr(errmsg, 0, maxlen))
+		errmsg = fallback;
+
+	return errnum == Z_ERRNO ? strerror(errno) : errmsg;
+#else
+	return strerror(errno);
 #endif
+}
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index 8f2e752cba..06b3762233 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -65,5 +65,6 @@ extern int	cfgetc(cfp *fp);
 extern char *cfgets(cfp *fp, char *buf, int len);
 extern int	cfclose(cfp *fp);
 extern int	cfeof(cfp *fp);
+extern const char * get_cfp_error(cfp *fp);
 
 #endif
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 79922da8ba..e9c26bc571 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -352,7 +352,7 @@ _WriteData(ArchiveHandle *AH, const void *data, size_t dLen)
 	lclContext *ctx = (lclContext *) AH->formatData;
 
 	if (dLen > 0 && cfwrite(data, dLen, ctx->dataFH) != dLen)
-		WRITE_ERROR_EXIT;
+		exit_horribly(modulename, "could not write to output file: %s\n", get_cfp_error(ctx->dataFH));
 
 	return;
 }
@@ -490,7 +490,7 @@ _WriteByte(ArchiveHandle *AH, const int i)
 	lclContext *ctx = (lclContext *) AH->formatData;
 
	if (cfwrite(&c, 1, ctx->dataFH) != 1)
-		WRITE_ERROR_EXIT;
+		exit_horribly(modulename, "could not write to output file: %s\n", get_cfp_error(ctx->dataFH));
 
 	return 1;
 }
@@ -519,7 +519,7 @@ _WriteBuf(ArchiveHandle *AH, const void *buf, size_t len)
 	lclContext *ctx = (lclContext *) AH->formatData;
 
 	if (cfwrite(buf, len, ctx->dataFH) != len)
-		WRITE_ERROR_EXIT;
+		exit_horribly(modulename, "could not write to output file: %s\n", get_cfp_error(ctx->dataFH));
 
 	return;
 }
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index f839712945..f17ce78aca 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -35,6 +35,7 @@
 #include "pgtar.h"
 #include "common/file_utils.h"
 #include "fe_utils/string_utils.h"
+#include "compress_io.h"
 
 #include 
 #include 
@@ -555,8 +556,12 @@ _tarReadRaw(ArchiveHandle *AH, void *buf, size_t len, TAR_MEMBER *th, FILE *fh)
 			{
 res = GZREAD(&((char *) buf)[used], 1, len, th->zFH);
 if (res != len && !GZEOF(th->zFH))
+{
+	int errnum;
+	const char *errmsg = gzerror(th->zFH, &errnum);
 	exit_horribly(modulename,
-  "could not read from input file: %s\n", strerror(errno));
+  "could not read from input file: %s\n", errnum == Z_ERRNO? strerror(errno) : errmsg);
+}
 			}
 			else
 			{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Robert Haas
On Wed, Jul 26, 2017 at 9:30 AM, Tom Lane  wrote:
> Meanwhile, we have problems that bite people who aren't doing anything
> stranger than having a matview owned by a non-superuser.  How do you
> propose to fix that without reordering pg_dump's actions?

Obviously changing the order is essential.  What I wasn't sure about
was whether a hard division into phases was a good idea.  The
advantage of the dependency mechanism is that, at least in theory, you
can get things into any order you need by sticking the right
dependencies in there.  Your description made it sound like you'd
hard-coded matview entries to the end rather than relying on
dependencies, which could be a problem if something later turns up
where we don't want them all the way at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-07-26 Thread Robert Haas
On Thu, Jun 8, 2017 at 10:44 AM, Dilip Kumar  wrote:
> On Thu, May 18, 2017 at 8:07 PM, Robert Haas  wrote:
>
> Thanks for the feedback and sorry for the delayed response.
>
>> You might need to adjust effective_cache_size.
>
> You are right. But, effective_cache_size will have the impact on the
> number of pages_fetched when it's used as parameterized path (i.e
> inner side of the nested loop). But for our case where we see the
> wrong number of pages got estimated (Q10), it was for the
> non-parameterized path.

Ah, oops.  My mistake.

One thing to keep in mind is that this is just an estimate.  It's not
going to be right 100% of the time no matter what you do.  The goal is
to make the estimates better than they are today, and the patch can
succeed in being better overall even if there are some cases where
things get worse.  Have you tried to analyze what is causing the bad
estimate in this one case?

The formula that compute_bitmap_pages is using here to compute the
number of page fetches is (2.0 * T * tuples_fetched) / (2.0 * T +
tuples_fetched), where T is the number of pages in the table.  Now the
idea here is that when tuples_fetched is small, the number of pages
fetched is likely to be almost equal to the number of tuples fetched,
because probably all of the tuples will be on separate pages.  As the
number of tuples grows larger, we assume it's likely that sometimes
two or more of them will be on the same page, so pages_fetched grows
more slowly.  When tuples_fetched = T, that is, the number of tuples
equals the number of pages, we estimate that we're fetching 2/3 of the
table, because some pages will have no tuples to fetch at all, while
others have more than one.  When tuples_fetched = 2 * T or greater, we
assume we'll fetch every page in the table.
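
For concreteness, here is a minimal standalone sketch of that estimate
(my own illustration, not the actual compute_bitmap_pages() code; the
clamp to T and the sample table size are assumptions):

#include <stdio.h>

/* Page-fetch estimate described above: T = pages in the table. */
static double
estimate_pages_fetched(double T, double tuples_fetched)
{
	double		pages = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);

	return pages > T ? T : pages;	/* can't fetch more than the table */
}

int
main(void)
{
	double		T = 83334.0;	/* e.g. the number of pages in the table */

	printf("%.0f\n", estimate_pages_fetched(T, T));			/* ~2/3 of T */
	printf("%.0f\n", estimate_pages_fetched(T, 2.0 * T));	/* exactly T */
	return 0;
}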

But this could be wrong.  If there are 100 tuples per page, we could
have tuples_fetched = 2 * T but actually fetch only T / 50 pages
rather than T pages, if all the rows we need to fetch are tightly
clustered.  That would be a 50x estimation error; the one you're
seeing is about 3.8x.  And my guess is that it's exactly this problem:
the TIDs being fetched are not spread out evenly through the whole
table, but are rather all clustered, but you could try to verify that
through some experimentation.  I'm not sure we have the statistics to
solve that problem in a principled way.  It seems loosely related to
the physical-to-logical correlation which we do store, but not so
closely that any way of using that information directly is obvious.

Instead of trying to immediate improve things on the optimizer side,
I'm wondering whether our first step should be to try to improve
things on the executor side - i.e. reduce the number of pages that
actually get lossified.  tbm_lossify says:

 * XXX Really stupid implementation: this just lossifies pages in
 * essentially random order.  We should be paying some attention to the
 * number of bits set in each page, instead.

As the comment says, the current implementation is really stupid,
which means we're lossifying more pages than really necessary.  There
is some previous discussion of this topic here:

https://www.postgresql.org/message-id/flat/20160923205843.zcs533sqdzlh4cpo%40alap3.anarazel.de

There are two main considerations here.  One, it's better to lossify
pages with many bits set than with few bits set, because the
additional work we thereby incur is less.  Two, it's better to lossify
pages that are in the same chunk as other pages which we are also
going to lossify, because that's how we actually save memory.  The
current code will cheerfully lossify a chunk that contains no other
pages, or will lossify one page from a chunk but not the others,
saving no memory but hurting performance.

Maybe the simplest change here would be to make it so that when we
decide to lossify a chunk, we lossify all pages in the chunk, but only
if there's more than one.  In other words, given a proposed page P to
lossify, probe the hash table for all keys in the same chunk as P and
build up a words[] array for the proposed chunk.  If that words[]
array will end up with only 1 bit set, then forget the whole thing;
otherwise, delete all of the entries for the individual pages and
insert the new chunk instead.
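
In pseudocode, that simplest change might look like the sketch below.
This is a hedged illustration, not tidbitmap.c as it stands:
tbm_find_pageentry(), tbm_mark_page_lossy() and PAGES_PER_CHUNK are
existing names there, but the wrapper function and its policy are
hypothetical.

/* Lossify every exact page of pageno's chunk, but only if that helps. */
static bool
tbm_lossify_whole_chunk(TIDBitmap *tbm, BlockNumber pageno)
{
	BlockNumber	chunk_start = pageno - (pageno % PAGES_PER_CHUNK);
	int			nmembers = 0;
	int			i;

	/* Count exact (non-lossy) page entries falling inside this chunk. */
	for (i = 0; i < PAGES_PER_CHUNK; i++)
	{
		if (tbm_find_pageentry(tbm, chunk_start + i))
			nmembers++;
	}

	/* With at most one member page, lossifying saves no memory: skip. */
	if (nmembers <= 1)
		return false;

	/* Replace each member page with a bit in a single lossy chunk. */
	for (i = 0; i < PAGES_PER_CHUNK; i++)
	{
		if (tbm_find_pageentry(tbm, chunk_start + i))
			tbm_mark_page_lossy(tbm, chunk_start + i);
	}
	return true;
}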

A further refinement would be to try to do a better job picking which
chunks to lossify in the first place.  I don't have a clear idea of
how we could go about doing that.  There's an unused padding byte
available inside PageTableEntry, and really it's more like 28 bits,
because status only needs 2 bits and ischunk and recheck only need 1
bit each.  So without increasing the memory usage at all, we could use
those bits to store some kind of information that would give us a clue
as to whether a certain entry was likely to be a good candidate for
lossification.  What to store there is a little less clear, but one
idea is to store the number of page table 

Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-26 Thread Claudio Freire
On Wed, Jul 26, 2017 at 1:39 PM, Sokolov Yura
 wrote:
> On 2017-07-24 12:41, Sokolov Yura wrote:
> test_master_1/pretty.log
...
> time   activity    tps  latency  stddev    min      max
> 11130  av+ch       198    198ms   374ms    7ms   1956ms
> 11160  av+ch       248    163ms   401ms    7ms   2601ms
> 11190  av+ch       321    125ms   363ms    7ms   2722ms
> 11220  av+ch      1155     35ms   123ms    7ms   2668ms
> 11250  av+ch      1390     29ms    79ms    7ms   1422ms

vs

> test_master_ring16_1/pretty.log
> time   activity    tps  latency  stddev    min      max
> 11130  av+ch        26   1575ms   635ms  101ms   2536ms
> 11160  av+ch        25   1552ms   648ms   58ms   2376ms
> 11190  av+ch        32   1275ms   726ms   16ms   2493ms
> 11220  av+ch        23   1584ms   674ms   48ms   2454ms
> 11250  av+ch        35   1235ms   777ms   22ms   3627ms

That's a huge change in latency for the worse.

Are you sure that's the ring buffer's doing and not some methodology snafu?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

2017-07-26 Thread Alvaro Herrera
Kunshchikov Vladimir wrote:
>  Hello Alvaro, thanks for the feedback, fixed all of your points.
> Attached new version of patch.

Looks great -- it's a lot smaller than the original even.  One final
touch -- see cfread(), where we have an #ifdef where we test for
fp->compressedfp; the "#else" branch uses the same code as the
!fp->compressedfp.  I think your get_cfp_error can be written more
simply using that form.  (Also, your version probably errors or warns
about "code before declaration" in that routine, which is not allowed in
C89.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-07-26 Thread Pavel Stehule
Hi

2017-07-26 13:15 GMT+02:00 Dagfinn Ilmari Mannsåker :

> Pavel Stehule  writes:
>
> > Hi
> >
> > I wrote a special pager for psql. Surely, this pager is not good for
> paging
> > of man pages. So is not good to set it as global pager. We can introduce
> > new env variable PSQL_PAGER for this purpose. It can work similar like
> > PSQL_EDITOR variable.
> >
> > Notes, comments?
>
> There's already the option to put '\setenv PAGER ...' in .psqlrc, but I
> guess having it work like EDITOR makes sense for consistency, as well as
> not affecting commands spawned from psql via e.g. \!.
>

here is a patch - it is trivial

Regards

Pavel

>
> - ilmari
> --
> "The surreality of the universe tends towards a maximum" -- Skud's Law
> "Never formulate a law or axiom that you're not prepared to live with
>  the consequences of."  -- Skud's Meta-Law
>
commit cc5aac7ab431dd0a95062aa5171f860215d5bc4a
Author: Pavel Stehule 
Date:   Wed Jul 26 18:12:26 2017 +0200

initial

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index c592edac60..ef89f81fdf 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4087,6 +4087,7 @@ $endif

 

+PSQL_PAGER
 PAGER
 
 
@@ -4096,8 +4097,10 @@ $endif
   more or less.  The default
   is platform-dependent.  Use of the pager can be disabled by setting
   PAGER to empty, or by using pager-related options of
-  the \pset command.
+  the \pset command. These variables are examined
+  in the order listed; the first that is set is used.
  
+
 

 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index b3dbb5946e..9b0b83b0bd 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -411,7 +411,7 @@ helpVariables(unsigned short int pager)
 #endif
 
 	fprintf(output, _("  COLUMNSnumber of columns for wrapped format\n"));
-	fprintf(output, _("  PAGER  name of external pager program\n"));
+	fprintf(output, _("  PSQL_PAGER, PAGER  name of external pager program\n"));
 	fprintf(output, _("  PGAPPNAME  same as the application_name connection parameter\n"));
 	fprintf(output, _("  PGDATABASE same as the dbname connection parameter\n"));
 	fprintf(output, _("  PGHOST same as the host connection parameter\n"));
diff --git a/src/fe_utils/print.c b/src/fe_utils/print.c
index f756f767e5..8af5bbe97e 100644
--- a/src/fe_utils/print.c
+++ b/src/fe_utils/print.c
@@ -2870,7 +2870,9 @@ PageOutput(int lines, const printTableOpt *topt)
 			const char *pagerprog;
 			FILE	   *pagerpipe;
 
-			pagerprog = getenv("PAGER");
+			pagerprog = getenv("PSQL_PAGER");
+			if (!pagerprog)
+pagerprog = getenv("PAGER");
 			if (!pagerprog)
 pagerprog = DEFAULT_PAGER;
 			else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-26 Thread Tom Lane
... btw, while you're working on this, it'd be nice if you fixed the
header comment for dumpACL().  It is unintelligible as to what racls
is, and apparently feels that it need not discuss initacls or initracls
at all.  I can't say that the reference to "fooacl" is really obvious
either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove old comments in dependencies.c and README.dependencies

2017-07-26 Thread Alvaro Herrera
atorikoshi wrote:

> Attached patch removes the comments about min_group_size.

Agreed.  Removed those comments.  Thanks for the patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-26 Thread Mengxing Liu
Hi, all. There was a very strange phenomenon I couldn't explain, so I was
wondering if you could help me.


I was trying to replace the linked list with a skip list in serializable 
transaction object for faster conflict tracking. But the performance is bad.
So I used the instruction "rdtsc" to compare the speed of my skip list and the 
original linked list. The skip list was about 1.5x faster.


The interesting thing is that if I added the instruction "rdtsc" at the end of
the function "RWConflictExists",
the performance of the whole system increased by up to 3 times!
Here is the result. 


| benchmarks | without rdtsc  | with rdtsc |
| simple read/write | 4.91 | 14.16 |
| ssibench | 9.72 | 10.24 |
| tpcb | 26.45 | 26.38 |


(The simple read/write benchmark has the largest number of conflicts.)


The patch is attached. The only difference between the two columns is the
presence or absence of a single line of code:
__asm__ __volatile__ ("rdtsc");
But I don't know why this instruction influences the performance so much!
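
One observation (hedged, I have not profiled this): a bare
__asm__ __volatile__ ("rdtsc") declares no outputs or clobbers, so GCC
assumes EAX/EDX are untouched even though RDTSC overwrites them; that
alone can perturb the surrounding code generation or corrupt live
registers. A minimal sketch of a properly constrained TSC read for
timing experiments:

#include <stdint.h>

/*
 * Read the x86 time-stamp counter.  The output constraints tell the
 * compiler that the instruction writes EAX ("=a") and EDX ("=d").
 */
static inline uint64_t
read_tsc(void)
{
	uint32_t	lo,
				hi;

	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t) hi << 32) | lo;
}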


BTW, after adding the "rdtsc" instruction, the performance is up to about
10% better than the original version.
That means the skip list can work!


Looking forward to your advice.

--
Sincerely


Mengxing Liu








skip-list-for-conflict-tracking.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Tom Lane
Jordan Gigov  writes:
> But why should a superuser need the ACL to be applied before being allowed
> access? If you make the permission-checking function check if the user is a
> superuser before looking for per-user grants, wouldn't that solve the issue?

The superuser's permissions are not relevant, because the materialized
view is run with the permissions of its owner, not the superuser.
We are not going to consider changing that, either, because it would open
trivial-to-exploit security holes (any user could set up a trojan horse
matview and just wait for the next pg_upgrade or dump/restore).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] AlterUserStmt and RoleSpec rules in grammar.y

2017-07-26 Thread Tom Lane
Pavel Golub  writes:
> I need someone to throw some light on grammar (gram.y).
> I'm investigating beta2 regression tests, and found new statement

> `ALTER USER ALL SET application_name to 'SLAP';`
> ^^^

You'll notice that that statement fails in the regression tests:

ALTER USER ALL SET application_name to 'SLAP';
ERROR:  syntax error at or near "ALL"

The one that works is

ALTER ROLE ALL SET application_name to 'SLAP';

and the reason is that AlterRoleSetStmt has a separate production
for ALL, but AlterUserSetStmt doesn't.  This seems a tad bizarre
though.  Peter, you added that production (in commit 9475db3a4);
is this difference intentional or just an oversight?  If it's
intentional, what's the reasoning?

BTW, I'm quite confused as to why these test cases (in rolenames.sql)
seem to predate that commit, and yet it did not change their results.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC 2017: weekly progress reports (week 8)

2017-07-26 Thread Shubham Barai
Project: Explicitly support predicate locks in index AMs besides b-tree

Hi,

During this week, I worked on predicate locking in the spgist index. I think
that, for the spgist index, predicate locks only on leaf pages will be enough,
as spgist searches can determine whether there is a match only at the leaf level.
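
In rough outline, the leaf-only locking could look like the sketch
below. This is my reading of the description, not the actual patch;
the helper function is hypothetical, though PredicateLockPage(),
SpGistPageIsLeaf() and the headers are existing PostgreSQL symbols.

#include "postgres.h"
#include "access/spgist_private.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"

/*
 * When a search reaches a leaf page, take a page-level predicate lock
 * so that later writes to that page raise rw-conflicts.
 */
static void
spgist_predicate_lock_if_leaf(Relation index, Buffer buffer, Snapshot snapshot)
{
	Page		page = BufferGetPage(buffer);

	if (SpGistPageIsLeaf(page))
		PredicateLockPage(index, BufferGetBlockNumber(buffer), snapshot);
}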

I have done the following things this week.

1) read the source code of the spgist index to understand the access method

2) found appropriate places to insert calls to existing functions

3) created tests (to verify serialization failures and to demonstrate the
feature of reduced false positives) for 'point' and 'box' data types.


link to code and tests: https://github.com/shubhambaraiss/postgres/commit/d9ae709c93ff038478ada411c621faeabe6813cb

I will attach the patch shortly.


Regards,
Shubham






Re: [HACKERS] Testlib.pm vs msys

2017-07-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/26/2017 09:33 AM, Tom Lane wrote:
>> It might be interesting to look into its copy of IPC::Run and see if
>> the wait technology is different from later versions.

> It's using 0.94 just like my Linux box.

Oh well, I hoped maybe we could update that.

>> I still find this pretty ugly :-(.

> I'm open to alternative suggestions. But I don't want to spend a heck of
> a lot of time on this.

Well, let's at least do the temp files a little more safely.
Maybe like this?

regards, tom lane

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index 9c3551f..3acc80e 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -32,9 +32,17 @@ else
 	print $conf "listen_addresses = '127.0.0.1'\n";
 }
 close $conf;
-command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
-	'-l', "$TestLib::log_path/001_start_stop_server.log" ],
-	qr/done.*server started/s, 'pg_ctl start');
+my $ctlcmd = [ 'pg_ctl', 'start', '-D', "$tempdir/data",
+			   '-l', "$TestLib::log_path/001_start_stop_server.log" ];
+if ($Config{osname} ne 'msys')
+{
+	command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
+}
+else
+{
+	# use the version of command_like that doesn't hang on Msys here
+	command_like_safe($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
+}
 
 # sleep here is because Windows builds can't check postmaster.pid exactly,
 # so they may mistake a pre-existing postmaster.pid for one created by the
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index fe09689..758c920 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -37,6 +37,7 @@ our @EXPORT = qw(
   program_version_ok
   program_options_handling_ok
   command_like
+  command_like_safe
   command_fails_like
 
   $windows_os
@@ -300,6 +301,24 @@ sub command_like
 	like($stdout, $expected_stdout, "$test_name: matches");
 }
 
+sub command_like_safe
+{
+	# Doesn't rely on detecting end of file on the file descriptors,
+	# which can fail, causing the process to hang, notably on Msys
+	# when used with 'pg_ctl start'
+	my ($cmd, $expected_stdout, $test_name) = @_;
+	my ($stdout, $stderr);
+	my $stdoutfile = File::Temp->new();
+	my $stderrfile = File::Temp->new();
+	print("# Running: " . join(" ", @{$cmd}) . "\n");
+	my $result = IPC::Run::run $cmd, '>', $stdoutfile, '2>', $stderrfile;
+	$stdout = slurp_file($stdoutfile);
+	$stderr = slurp_file($stderrfile);
+	ok($result, "$test_name: exit code 0");
+	is($stderr, '', "$test_name: no stderr");
+	like($stdout, $expected_stdout, "$test_name: matches");
+}
+
 sub command_fails_like
 {
 	my ($cmd, $expected_stderr, $test_name) = @_;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-07-26 Thread Amit Khandekar
On 26 July 2017 at 02:37, Robert Haas  wrote:
> Is there any real benefit in this "walker" interface?  It looks to me
> like it might be simpler to just change things around so that it
> returns a list of OIDs, like find_all_inheritors, but generated
> differently.  Then if you want bound-ordering rather than
> OID-ordering, you just do this:
>
> list_free(inhOids);
> inhOids = get_partition_oids_in_bound_order(rel);
>
> That'd remove the need for some if/then logic as you've currently got
> in get_next_child().

Yes, I had considered that; i.e., first generating just a list of
bound-ordered oids. But that consequently needs all the child tables
to be opened and closed twice; once during the list generation, and
then while expanding the partitioned table. Agreed that the second
time heap_open() would not be that expensive because the tables would
be cached, but it would still require getting the cached relation
handle from the hash table. Since we anyway want to open the tables,
it is better to have a *next() function that fetches the next
partition in a fixed order.

Actually, there isn't much that the walker next() function does. Any
code that wants to traverse bound-wise can do that on its own. The
walker function is just a convenient way to make sure everyone
traverses in the same order.

I have yet to go over other things, including your review comments and Amit
Langote's patch on refactoring RelationGetPartitionDispatchInfo().

> On another note, did you do anything about the suggestion Thomas made
> in 
> http://postgr.es/m/CAEepm=3sc_j1zwqdyrbu4dtfx5rhcamnnuaxrkwzfgt9m23...@mail.gmail.com
> ?

This is still pending on me; plus I think there are some more points.
I need to go over those and consolidate a list of todos.




-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tab complete for psql pset pager values

2017-07-26 Thread Pavel Stehule
Hi

Attached is a trivial patch adding the missing tab completion for the \pset pager setting.

Regards

Pavel
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index e9fdc908c7..1b15c5b0d1 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3536,6 +3536,8 @@ psql_completion(const char *text, int start, int end)
 "unicode_column_linestyle|"
 "unicode_header_linestyle"))
 			COMPLETE_WITH_LIST_CS2("single", "double");
+		else if (TailMatchesCS1("pager"))
+			COMPLETE_WITH_LIST_CS3("on", "off", "always");
 	}
 	else if (TailMatchesCS1("\\unset"))
 		matches = complete_from_variables(text, "", "", true);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-07-26 Thread Michael Paquier
On Wed, Jul 26, 2017 at 4:02 PM, Tom Lane  wrote:
> Not sure what's involved there code-wise, though, nor whether we'd want
> to back-patch.

I'll try to look at the code around that to come up with a clear
conclusion in the next couple of days, likely more as that's a
vacation period. Surely anything invasive would not be backpatched.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-07-26 Thread Ashutosh Bapat
On Wed, Jul 12, 2017 at 7:17 AM, Amit Langote
 wrote:
>
> Thanks for the review.  Patch updated taking care of the comments.

The patches still apply and compile. make check runs well. I do not
have any further review comments. Given that they address a bug,
should we consider those for v10?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-26 Thread Ashutosh Sharma
On Wed, Jul 26, 2017 at 8:51 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Based on discussion downthread, it seems like what we actually need to
>> do is update perl.m4 to extract CCFLAGS.  Turns out somebody proposed
>> a patch for that back in 2002:
>> https://www.postgresql.org/message-id/Pine.LNX.4.44.0211051045070.16317-20%40wotan.suse.de
>> It seems to need a rebase.  :-)
>
> Ah-hah, I *thought* we had considered the question once upon a time.
> There were some pretty substantial compatibility concerns raised in that
> thread, which is doubtless why it's still like that.
>
> My beef about inter-compiler compatibility (if building PG with a
> different compiler from that used for Perl) could probably be addressed by
> absorbing only -D switches from the Perl flags.  But Peter seemed to feel
> that even that could break things, and I worry that he's right for cases
> like -D_FILE_OFFSET_BITS which affect libc APIs.  Usually we'd have made
> the same decisions as Perl for that sort of thing, but if we didn't, it's
> a mess.
>
> I wonder whether we could adopt some rule like "absorb -D switches
> for macros whose names do not begin with an underscore".  That's
> surely a hack and three-quarters, but it seems safer than just
> absorbing everything willy-nilly.
>

Thanks Robert, Tom, Andrew and Amit for all your inputs. I have tried
to work on the patch shared by Reinhard a long time back for Linux. I
had to rebase the patch and also had to add some more lines of code
to make it work on Linux. For Windows, I had to prepare a separate
patch to replicate similar behaviour.  I can see that both these
patches are working as expected, i.e. they are able to import the
switches used by Perl into the plperl module at build time.  However,
on Windows I am still seeing the crash and am still working to find
the reason for it.  Here I attach the patches that I have prepared for
the Linux and Windows platforms. Thanks.


plperl_linux.patch
Description: Binary data


plperl_win.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

2017-07-26 Thread Pavel Stehule
2017-07-26 15:27 GMT+02:00 Robert Haas :

> On Wed, Jun 21, 2017 at 10:01 AM, Remi Colinet 
> wrote:
> > test=# SELECT  pid, ppid, bid, concat(repeat(' ', 3 * indent),name),
> value,
> > unit FROM pg_progress(0,0);
> >   pid  | ppid | bid |  concat  |  value   |  unit
> > ---+--+-+--+--+-
> >  14106 |0 |   4 | status   | query running|
> >  14106 |0 |   4 | relationship | progression  |
> >  14106 |0 |   4 |node name | Sort |
> >  14106 |0 |   4 |sort status   | on tapes writing |
> >  14106 |0 |   4 |completion| 0| percent
> >  14106 |0 |   4 |relationship  | Outer|
> >  14106 |0 |   4 |   node name  | Seq Scan |
> >  14106 |0 |   4 |   scan on| t_10m|
> >  14106 |0 |   4 |   fetched| 25049| block
> >  14106 |0 |   4 |   total  | 83334| block
> >  14106 |0 |   4 |   completion | 30   | percent
> > (11 rows)
> >
> > test=#
>
> Somehow I imagined that the output would look more like what EXPLAIN
> produces.
>

me too.

Regards

Pavel


>
> > If the one shared memory page is not enough for the whole progress
> report,
> > the progress report transfert between the 2 backends is done with a
> series
> > of request/response. Before setting the latch, the monitored backend
> write
> > the size of the data dumped in shared memory and set a status to indicate
> > that more data is to be sent through the shared memory page. The
> monitoring
> > backends get the result and sends an other signal, and then wait for the
> > latch again. The monitored backend does not collect a new progress report
> > but continues to dump the already collected report. And the exchange
> goes on
> > until the full progress report has been dumped.
>
> This is basically what shm_mq does.  We probably don't want to
> reinvent that code, as it has taken a surprising amount of debugging
> to get it fully working.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] AlterUserStmt and RoleSpec rules in grammar.y

2017-07-26 Thread Pavel Golub
Hello, hackers.

I need someone to throw some light on grammar (gram.y).
I'm investigating the beta2 regression tests, and found a new statement

`ALTER USER ALL SET application_name to 'SLAP';`
^^^

I know for sure that in beta1 this statement fails. So I decided to recheck
gram.y:

AlterUserStmt:
ALTER USER RoleSpec SetResetClause;

RoleSpec:NonReservedWord
 | CURRENT_USER
 | SESSION_USER;

But *ALL is a reserved word*! Thus "ALTER ROLE/USER ALL" should fail.

OK, I checked this in Pg10 beta2, with the installer provided by EDB. It worked.

Then I asked someone to check this against fresh built server from
'master'. It failed.

So, the situation is:

1. Docs say this is a correct statement:
https://www.postgresql.org/docs/devel/static/sql-alterrole.html

2. The sources in master don't support such a production:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/parser/gram.y;h=4b1ce09c445a5ee249a965ec0953b122df71eb6f;hb=refs/heads/master
Line 1179 for AlterUserSetStmt rule;
Line 14515 for RoleSpec rule;

3. EDB 10beta2 server supports it.

What's going on?



-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Jordan Gigov
But why should a superuser need the ACL to be applied before being allowed
access? If you make the permission-checking function check if the user is a
superuser before looking for per-user grants, wouldn't that solve the issue?

On 26 July 2017 at 16:30, Tom Lane  wrote:

> Robert Haas  writes:
> > On Tue, Jul 25, 2017 at 10:45 PM, Tom Lane  wrote:
> >> Instead, I've prepared the attached draft patch, which addresses the
> >> problem by teaching pg_backup_archiver.c to process TOC entries in
> >> three separate passes, "main" then ACLs then matview refreshes.
>
> > What worries me a bit is the possibility that something might depend
> > on a matview having already been refreshed.  I cannot immediately
> > think of a case whether such a thing happens that you won't dismiss as
> > wrong-headed, but how sure are we that none such will arise?
>
> Um, there are already precisely zero guarantees about that.  pg_dump
> has always been free to order these actions in any way that satisfies
> the dependencies it knows about.
>
> It's certainly possible to break it by introducing hidden dependencies
> within CHECK conditions.  But that's always been true, with or without
> materialized views, and we've always dismissed complaints about it with
> "sorry, we won't support that".  (I don't think the trigger case is
> such a problem, because we restore data before creating any triggers.)
>
> Meanwhile, we have problems that bite people who aren't doing anything
> stranger than having a matview owned by a non-superuser.  How do you
> propose to fix that without reordering pg_dump's actions?
>
> Lastly, the proposed patch has the advantage that it acts at the restore
> stage, not when a dump is being prepared.  Since it isn't affecting what
> goes into dump archives, it doesn't tie our hands if we think of some
> better idea later.
>
> regards, tom lane
>


Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-07-26 Thread Tom Lane
Michael Paquier  writes:
> The documentation says the following:
> https://www.postgresql.org/docs/9.3/static/ddl-inherit.html
> All check constraints and not-null constraints on a parent table are
> automatically inherited by its children. Other types of constraints
> (unique, primary key, and foreign key constraints) are not inherited.

> When adding a primary key, the system does first SET NOT NULL on each
> column under cover, but this does not get spread to the child tables
> as this is a PRIMARY KEY. The question is, do we want to spread the
> NOT NULL constraint created by the PRIMARY KEY command to the child
> tables, or not?

Yeah, I think we really ought to for consistency.  Quite aside from
pg_dump hazards, this allows situations where selecting from an
apparently not-null column can produce nulls (coming from unconstrained
child tables).  That's exactly what the automatic inheritance rules
are intended to prevent.  If we had a way to mark NOT NULL constraints
as not-inherited, it'd be legitimate for ADD PRIMARY KEY to paste on
one of those instead of a regular one ... but we lack that feature,
so it's moot.

Not sure what's involved there code-wise, though, nor whether we'd want
to back-patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HACKERS[PATCH] split ProcArrayLock into multiple parts

2017-07-26 Thread Robert Haas
On Fri, Jun 9, 2017 at 2:39 PM, Jim Van Fleet  wrote:
> I left out the retry in LWLockAcquire.

If you want this to be considered, you should add it to the next
CommitFest, currently https://commitfest.postgresql.org/14/

However, I can't see this being acceptable in the current form:

1. I'm pretty sure that there will be no appetite for introducing
special cases for ProcArrayLock into lwlock.c directly; such logic
should be handled by the code that calls lwlock.c.  It's very
unpalatable to have LWLockConditionalAcquire() hardcoded to fail always
for ProcArrayLock, and it's also adding overhead for every caller that
reaches those "if" statements and has to branch or not.

2. Always using the group-clearing approach instead of only when the
lock is uncontended seems like it will lead to a loss of performance
in some situations.

3. This adds a good amount of complexity to the code but it's not
clearly better overall.  Your own results in
http://postgr.es/m/ofbab24999.8db8c8de-on86258136.006aeb24-86258136.006b3...@notes.na.collabserv.com
show that some workloads benefit and others are harmed.  I don't think
a 6% loss on single-socket machines is acceptable; there are still
plenty of those out there.

I don't think the idea of partitioning ProcArrayLock is necessarily
bad -- Heikki tried it before -- but I don't think it's necessarily
easy to work out all the kinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testlib.pm vs msys

2017-07-26 Thread Andrew Dunstan


On 07/26/2017 09:33 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>
>> The latter fact makes me
>> theorize that the problem arises from the fairly ancient perl that Msys
>> provides, and which we need to use to run prove so the TAP tests
>> understand the environment's virtualized file paths.
> It might be interesting to look into its copy of IPC::Run and see if
> the wait technology is different from later versions.


It's using 0.94 just like my Linux box.

>
>> The attached patch should get around the problem without upsetting the
>> good work you've been doing in reducing TAP test times generally.
> I still find this pretty ugly :-(.
>
>   

I'm open to alternative suggestions. But I don't want to spend a heck of
a lot of time on this.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default Partition for Range

2017-07-26 Thread Jeevan Ladhe
Hi Beena,

I have posted the rebased patches[1] for default list partition.
Your patch also needs a rebase.

[1]
https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com

Regards,
Jeevan Ladhe

On Fri, Jul 14, 2017 at 3:28 PM, Beena Emerson 
wrote:

> On Tue, Jul 4, 2017 at 4:21 PM, Rahila Syed 
> wrote:
> >
> > Hello Beena,
> >
> > Thanks for the updated patch. It passes the basic tests which I
> performed.
> >
> > Few comments:
> > 1. In   check_new_partition_bound(),
> >
> >> /* Default case is not handled here */
> >>if (spec->is_default)
> >>break;
> > The default partition check here can be performed in common for both
> range
> > and list partition.
>
> Removed the check here.
>
> >
> > 2. RANGE_DATUM_FINITE = 0, /* actual datum stored elsewhere */
> > +   RANGE_DATUM_DEFAULT,/* default */
> >
> > More elaborative comment?
>
> I am not sure what to add. Do you have something in mind?
>
> >
> > 3.  Inside make_one_range_bound(),
> >
> >>+
> >>+   /* datums are NULL for default */
> >>+   if (datums == NULL)
> >>+   for (i = 0; i < key->partnatts; i++)
> >>+   bound->content[i] = RANGE_DATUM_DEFAULT;
> >>+
> > IMO, returning bound at this stage will make code clearer as further
> > processing
> > does not happen for default partition.
>
> Done.
>
> >
> > 4.
> > @@ -549,6 +568,7 @@ RelationBuildPartitionDesc(Relation rel)
> > sizeof(RangeDatumContent
> > *));
> > boundinfo->indexes = (int *) palloc((ndatums + 1) *
> > sizeof(int));
> > +   boundinfo->default_index = -1;
> > This should also be done commonly for both default list and range
> partition.
>
> Removed the line here. Common allocation is done by Jeevan's patch.
>
> >
> > 5.
> >   if (!spec->is_default && partdesc->nparts > 0)
> > {
> > ListCell   *cell;
> >
> > Assert(boundinfo &&
> >boundinfo->strategy ==
> PARTITION_STRATEGY_LIST &&
> >(boundinfo->ndatums > 0 ||
> > partition_bound_accepts_nulls(boundinfo) ||
> > partition_bound_has_default(boundinfo)));
> > The assertion condition partition_bound_has_default(boundinfo) is
> rendered
> > useless
> > because of if condition change and can be removed perhaps.
>
> This cannot be removed.
> This check is required when this code is run for a non-default
> partition and default is the only existing partition.
>
> >
> > 6. I think its better to have following  regression tests coverage
> >
> > --Testing with inserting one NULL and other non NULL values for
> multicolumn
> > range partitioned table.
> >
> > postgres=# CREATE TABLE range_parted (
> > postgres(# a int,
> > postgres(# b int
> > postgres(# ) PARTITION BY RANGE (a, b);
> > CREATE TABLE
> > postgres=#
> > postgres=# CREATE TABLE part1 (
> > postgres(# a int NOT NULL CHECK (a = 1),
> > postgres(# b int NOT NULL CHECK (b >= 1 AND b <= 10)
> > postgres(# );
> > CREATE TABLE
> >
> > postgres=# ALTER TABLE range_parted ATTACH PARTITION part1 FOR VALUES
> FROM
> > (1, 1) TO (1, 10);
> > ALTER TABLE
> > postgres=# CREATE TABLE partr_def1 PARTITION OF range_parted DEFAULT;
> > CREATE TABLE
> >
> > postgres=# insert into range_parted values (1,NULL);
> > INSERT 0 1
> > postgres=# insert into range_parted values (NULL,9);
> > INSERT 0 1
>
> added.
>
>
>
>
>
> --
>
> Beena Emerson
>
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Update comments in nodeModifyTable.c

2017-07-26 Thread Robert Haas
On Wed, Jun 14, 2017 at 10:40 PM, Etsuro Fujita
 wrote:
> Agreed, but I think it's better to add that detail to this comment in
> ExecInitModifyTable:
>
>  * Initialize the junk filter(s) if needed.  INSERT queries need a
> filter
>  * if there are any junk attrs in the tlist.  UPDATE and DELETE always
>  * need a filter, since there's always a junk 'ctid' or 'wholerow'
>  * attribute present --- no need to look first.
>
> I'd also like to propose to update the third sentence in this comment, since
> there isn't necessarily a ctid or wholerow in the UPDATE/DELETE tlist when
> the result relation is a foreign table.
>
> Attached is an updated version of the patch.

Well, now I'm confused:

  * Initialize the junk filter(s) if needed.  INSERT queries need a filter
  * if there are any junk attrs in the tlist.  UPDATE and DELETE always
  * need a filter, since there's always a junk 'ctid' or 'wholerow'
- * attribute present --- no need to look first.
+ * attribute present if not foreign table, and if foreign table, there
+ * are always junk attributes present the FDW needs to identify the exact
+ * row to update or delete --- no need to look first.  For foreign tables,
+ * there's also a wholerow attribute when the relation has a row-level
+ * trigger on UPDATE/DELETE but not on INSERT.

So the first part of the change weakens the assertion that a 'ctid' or
'wholerow' attribute will always be present by saying that an FDW may
instead have other attributes sufficient to identify the row.  But
then the additional sentence says that there will be a 'wholerow'
attribute after all.  So this whole change seems to me to be going
around in a circle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testlib.pm vs msys

2017-07-26 Thread Tom Lane
Andrew Dunstan  writes:
> This made no difference. And I'm not really surprised, since the test is
> not hanging when run from an MSVC build.

Oh well.

> The latter fact makes me
> theorize that the problem arises from the fairly ancient perl that Msys
> provides, and which we need to use to run prove so the TAP tests
> understand the environment's virtualized file paths.

It might be interesting to look into its copy of IPC::Run and see if
the wait technology is different from later versions.

> The attached patch should get around the problem without upsetting the
> good work you've been doing in reducing TAP test times generally.

I still find this pretty ugly :-(.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 25, 2017 at 10:45 PM, Tom Lane  wrote:
>> Instead, I've prepared the attached draft patch, which addresses the
>> problem by teaching pg_backup_archiver.c to process TOC entries in
>> three separate passes, "main" then ACLs then matview refreshes.

> What worries me a bit is the possibility that something might depend
> on a matview having already been refreshed.  I cannot immediately
> think of a case where such a thing happens that you won't dismiss as
> wrong-headed, but how sure are we that none such will arise?

Um, there are already precisely zero guarantees about that.  pg_dump
has always been free to order these actions in any way that satisfies
the dependencies it knows about.

It's certainly possible to break it by introducing hidden dependencies
within CHECK conditions.  But that's always been true, with or without
materialized views, and we've always dismissed complaints about it with
"sorry, we won't support that".  (I don't think the trigger case is
such a problem, because we restore data before creating any triggers.)

Meanwhile, we have problems that bite people who aren't doing anything
stranger than having a matview owned by a non-superuser.  How do you
propose to fix that without reordering pg_dump's actions?

Lastly, the proposed patch has the advantage that it acts at the restore
stage, not when a dump is being prepared.  Since it isn't affecting what
goes into dump archives, it doesn't tie our hands if we think of some
better idea later.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-07-26 Thread Robert Haas
On Mon, Jun 19, 2017 at 4:30 AM, Masahiko Sawada  wrote:
>> I think that either of the options you suggested now would be better.
>> We'll need that for stopping the tablesync which is in progress during
>> DropSubscription as well as those will currently still keep running. I
>> guess we could simply just register syscache callback, the only problem
>> with that is we'd need to AcceptInvalidationMessages regularly while we
>> do the COPY which is not exactly free so maybe killing at the end of
>> transaction would be better (both for refresh and drop)?
>
> Attached patch makes table sync worker check its relation subscription
> state at the end of COPY and exits if it has disappeared, instead of
> killing table sync worker in DDL. There is still a problem that a
> table sync worker for a large table can hold a slot for a long time
> even after its state is deleted. But it would be for PG11 item.

Do we still need to do something about this?  Should it be an open item?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

2017-07-26 Thread Robert Haas
On Wed, Jun 21, 2017 at 10:01 AM, Remi Colinet  wrote:
> test=# SELECT  pid, ppid, bid, concat(repeat(' ', 3 * indent),name), value,
> unit FROM pg_progress(0,0);
>   pid  | ppid | bid |  concat  |  value   |  unit
> ---+--+-+--+--+-
>  14106 |0 |   4 | status   | query running|
>  14106 |0 |   4 | relationship | progression  |
>  14106 |0 |   4 |node name | Sort |
>  14106 |0 |   4 |sort status   | on tapes writing |
>  14106 |0 |   4 |completion| 0| percent
>  14106 |0 |   4 |relationship  | Outer|
>  14106 |0 |   4 |   node name  | Seq Scan |
>  14106 |0 |   4 |   scan on| t_10m|
>  14106 |0 |   4 |   fetched| 25049| block
>  14106 |0 |   4 |   total  | 83334| block
>  14106 |0 |   4 |   completion | 30   | percent
> (11 rows)
>
> test=#

Somehow I imagined that the output would look more like what EXPLAIN produces.

> If the one shared memory page is not enough for the whole progress report,
> the progress report transfert between the 2 backends is done with a series
> of request/response. Before setting the latch, the monitored backend write
> the size of the data dumped in shared memory and set a status to indicate
> that more data is to be sent through the shared memory page. The monitoring
> backends get the result and sends an other signal, and then wait for the
> latch again. The monitored backend does not collect a new progress report
> but continues to dump the already collected report. And the exchange goes on
> until the full progress report has been dumped.

This is basically what shm_mq does.  We probably don't want to
reinvent that code, as it has taken a surprising amount of debugging
to get it fully working.
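
For reference, a heavily trimmed sketch of that shm_mq pattern (my
illustration, with setup, cleanup and error handling elided; the queue
size is arbitrary, and in real usage the queue would live in a DSM
segment that both backends attach to):

#include "postgres.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

#define REPORT_QUEUE_SIZE 16384		/* arbitrary for this sketch */

/* Monitored backend: create the queue in shared memory and send. */
static void
send_report(void *shmem, const char *report, Size len)
{
	shm_mq	   *mq = shm_mq_create(shmem, REPORT_QUEUE_SIZE);
	shm_mq_handle *mqh;

	shm_mq_set_sender(mq, MyProc);
	mqh = shm_mq_attach(mq, NULL, NULL);
	/* shm_mq_send() handles messages larger than the ring itself. */
	(void) shm_mq_send(mqh, len, report, false);
}

/* Monitoring backend: attach as receiver and read one message. */
static void
receive_report(shm_mq *mq, Size *nbytes, void **data)
{
	shm_mq_handle *mqh;

	shm_mq_set_receiver(mq, MyProc);
	mqh = shm_mq_attach(mq, NULL, NULL);
	(void) shm_mq_receive(mqh, nbytes, data, false);
}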

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-26 Thread Stephen Frost
All,

* Masahiko Sawada (sawada.m...@gmail.com) wrote:
> On Tue, Jul 25, 2017 at 4:43 AM, Michael Paquier
>  wrote:
> > On Mon, Jul 24, 2017 at 6:45 PM, Stephen Frost  wrote:
> >> What the change would do is make the pg_stop_backup() caller block until
> >> the last WAL is archvied, and perhaps that ends up taking hours, and
> >> then the connection is dropped for whatever reason and the backup fails
> >> where it otherwise what?  wouldn't have been valid anyway at that
> >> point, since it's not valid until the last WAL is actually archived.
> >> Perhaps eventually it would be archived and the caller was planning for
> >> that and everything is fine, but, well, that feels like an awful lot of
> >> wishful thinking.
> >
> > Letting users taking unconsciously inconsistent backups is worse than
> > potentially breaking scripts that were actually not working as
> > Postgres would expect. So I am +1 for back-patching a lighter version
> > of the proposed patch that makes the wait happen on purpose.
> >
> >>> > I'd hate to have to do it, but we could technically add a GUC to address
> >>> > this in the back-branches, no?  I'm not sure that's really worthwhile
> >>> > though..
> >>>
> >>> That would be mighty ugly.
> >>
> >> Oh, absolutely agreed.
> >
> > Yes, let's avoid that. We are talking about a switch aimed at making
> > backups potentially inconsistent.
> 
> Thank you for the review comments!
> Attached is the updated patch. The note in the pg_basebackup doc will be
> necessary for back branches if we decide not to back-patch it to back
> branches. So it's not contained in this patch for now.
> 
> Regarding back-patching this to back branches, I also vote for
> back-patching to back branches. Or we can fix the docs of back
> branches and fix the code only in PG10. I expect that the user who
> wrote a backup script has done enough functional test and dealt with
> this issue somehow, but since the current doc clearly says that
> pg_stop_backup() waits for all WAL to be archived we have to make a
> consideration about there are users who wrote a wrong backup script.
> So I think we should at least notify it in the minor release.

That sounds like we have at least three folks thinking that this should
be back-patched, and one who doesn't.  Does anyone else wish to voice an
opinion regarding back-patching this..?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-07-26 Thread Robert Haas
On Tue, Apr 18, 2017 at 4:25 AM, Pavan Deolasee
 wrote:
> I'll include the fix in the next set of patches.

I haven't seen a new set of patches.  Are you intending to continue
working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testlib.pm vs msys

2017-07-26 Thread Andrew Dunstan


On 07/25/2017 02:45 PM, Andrew Dunstan wrote:
>> Anyway, if we believe that it broke with f13ea95f9, hen it's plausible
>> that CMD.EXE has been sharing pg_ctl's stdout/stderr all along, and we
>> just had not noticed before.  Could you check what happens if we
>> change the bInheritHandles parameter as I suggested upthread?
>>
>>  
>
> I'll try when I get all the animals caught up.
>


This made no difference. And I'm not really surprised, since the test is
not hanging when run from an MSVC build. The latter fact makes me
theorize that the problem arises from the fairly ancient perl that Msys
provides, and which we need to use to run prove so the TAP tests
understand the environment's virtualized file paths.


The attached patch should get around the problem without upsetting the
good work you've been doing in reducing TAP test times generally.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index 9c3551f..3acc80e 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -32,9 +32,17 @@ else
 	print $conf "listen_addresses = '127.0.0.1'\n";
 }
 close $conf;
-command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
-	'-l', "$TestLib::log_path/001_start_stop_server.log" ],
-	qr/done.*server started/s, 'pg_ctl start');
+my $ctlcmd = [ 'pg_ctl', 'start', '-D', "$tempdir/data",
+			   '-l', "$TestLib::log_path/001_start_stop_server.log" ];
+if ($Config{osname} ne 'msys')
+{
+	command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
+}
+else
+{
+	# use the version of command_like that doesn't hang on Msys here
+	command_like_safe($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
+}
 
 # sleep here is because Windows builds can't check postmaster.pid exactly,
 # so they may mistake a pre-existing postmaster.pid for one created by the
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index fe09689..a2df156 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -37,6 +37,7 @@ our @EXPORT = qw(
   program_version_ok
   program_options_handling_ok
   command_like
+  command_like_safe
   command_fails_like
 
   $windows_os
@@ -300,6 +301,23 @@ sub command_like
 	like($stdout, $expected_stdout, "$test_name: matches");
 }
 
+sub command_like_safe
+{
+	# Doesn't rely on detecting end of file on the file descriptors,
+	# which can fail, causing the process to hang, notably on Msys
+	# when used with 'pg_ctl start'
+	my ($cmd, $expected_stdout, $test_name) = @_;
+	my ($stdout, $stderr);
+	print("# Running: " . join(" ", @{$cmd}) . "\n");
+	my $result = IPC::Run::run $cmd, '>', 'stdout.txt', '2>', 'stderr.txt';
+	$stdout = slurp_file('stdout.txt');
+	$stderr = slurp_file('stderr.txt');
+	unlink 'stdout.txt','stderr.txt';
+	ok($result, "$test_name: exit code 0");
+	is($stderr, '', "$test_name: no stderr");
+	like($stdout, $expected_stdout, "$test_name: matches");
+}
+
 sub command_fails_like
 {
 	my ($cmd, $expected_stderr, $test_name) = @_;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-07-26 Thread Michael Paquier
On Wed, Jul 26, 2017 at 12:09 PM, tushar  wrote:
> v9.5/9.6
>
> create these objects -
> CREATE TABLE constraint_rename_test (a int CONSTRAINT con1 CHECK (a > 0), b
> int, c int);
> CREATE TABLE constraint_rename_test2 (a int CONSTRAINT con1 CHECK (a > 0), d
> int) INHERITS (constraint_rename_test);
> ALTER TABLE constraint_rename_test ADD CONSTRAINT con3 PRIMARY KEY (a);
>
> v9.6/v10 - run pg_upgrade
>
> pg_restore: creating COMMENT "SCHEMA "public""
> pg_restore: creating TABLE "public.constraint_rename_test"
> pg_restore: creating TABLE "public.constraint_rename_test2"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 351; 1259 16388 TABLE
> constraint_rename_test2 edb
> pg_restore: [archiver (db)] could not execute query: ERROR:  column "a" in
> child table must be marked NOT NULL
> Command was:
> -- For binary upgrade, must preserve pg_type oid
> SELECT
> pg_catalog.binary_upgrade_set_next_pg_type_oid('16390'::pg_catalog.oid);
>
> manually i am able to create all these objects .

You can go further back to reproduce the failure of your test case. I
can see the same thing with at least 9.3, which is as far as I tested,
for any upgrade combinations up to HEAD. And I would imagine that this
is an old behavior.

The documentation says the following:
https://www.postgresql.org/docs/9.3/static/ddl-inherit.html
All check constraints and not-null constraints on a parent table are
automatically inherited by its children. Other types of constraints
(unique, primary key, and foreign key constraints) are not inherited.

When adding a primary key, the system first does SET NOT NULL on each
column under the covers, but this does not get spread to the child
tables, as this is a PRIMARY KEY. The question is, do we want to
spread the NOT NULL constraint created by the PRIMARY KEY command to
the child tables, or not? It is easy enough to fix your problem by
switching the second and third commands in your test case; still, I
think that the
current behavior is non-intuitive, and that we ought to fix this by
applying NOT NULL to the child's columns when a primary key is added
to the parent. Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-26 Thread Robert Haas
On Tue, Jul 25, 2017 at 10:45 PM, Tom Lane  wrote:
> Instead, I've prepared the attached draft patch, which addresses the
> problem by teaching pg_backup_archiver.c to process TOC entries in
> three separate passes, "main" then ACLs then matview refreshes.
> It's survived light testing but could doubtless use further review.

What worries me a bit is the possibility that something might depend
on a matview having already been refreshed.  I cannot immediately
think of a case where such a thing happens that you won't dismiss as
wrong-headed, but how sure are we that none such will arise?

I mean, a case that would actually break is if you had a CHECK
constraint or a functional index that contained a function which
referenced a materialized view for some validation or transformation
purpose.  Then it wouldn't be formally immutable, of course.  But
maybe we can imagine that some other case not involving lying could
exist, or come to exist.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump issues

2017-07-26 Thread Дмитрий Воронин
Hello, 

25.07.2017, 11:17, "Victor Drobny" :
>
> Hello,
>
> Do you expect to have some flag like '--rename=test->test2'?

Yes, I do. 

> Will dump with test replaced by test2(of course only in related places)
> be valid dump in this case?

Yes, it will.

> What is the possible scenario for the renaming option? Is it doing to be
> dumping of the one schema only?
> Or it could be dump of database? In this case pg_dump should also
> support multiple rules for renaming.

pg_dump would scan the dumped objects and rename them according to rules
that can be set via pg_dump command-line options.
As I know, pg_dump and pg_restore use the same functions, so the renaming
mechanism can be integrated into pg_restore too: pg_restore would scan the
dumped objects and rename them by the same rules.

In the future, the rules could be applied to all database objects.

>
> Thank you for attention!
>
> --
> --
> Victor Drobny
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards, Dmitry Voronin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-07-26 Thread Dagfinn Ilmari Mannsåker
Pavel Stehule  writes:

> Hi
>
> I wrote a special pager for psql. Surely, this pager is not good for paging
> of man pages. So is not good to set it as global pager. We can introduce
> new env variable PSQL_PAGER for this purpose. It can work similar like
> PSQL_EDITOR variable.
>
> Notes, comments?

There's already the option to put '\setenv PAGER ...' in .psqlrc, but I
guess having it work like EDITOR makes sense for consistency, as well as
not affecting commands spawned from psql via e.g. \!.
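
For instance, the existing approach is a one-liner in ~/.psqlrc (the pager
command here is arbitrary):

    \setenv PAGER 'less -S'
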

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-07-26 Thread Ashutosh Bapat
On Wed, Jul 26, 2017 at 6:51 AM, Thomas Munro
 wrote:
> Hi hackers,
>
> Some LDAP error codes are a bit vague.  For example:
>
>LDAP_CONNECT_ERROR  Indicates a connection problem.
>LDAP_PROTOCOL_ERROR A protocol violation was detected.
>
> To learn more, you have to call
> ldap_get_option(LDAP_OPT_DIAGNOSTIC_MESSAGE).  Should we do that?  For
> example, instead of:
>
>   LOG:  could not start LDAP TLS session: Protocol error
>
> ... you could see:
>
>   LOG:  could not start LDAP TLS session: Protocol error
>   DETAIL:  LDAP diagnostic message: unsupported extended operation
>
> Well, that may not be the most illuminating example, but that's a
> message sent back by the LDAP server that we're currently throwing
> away, and can be used to distinguish between unsupported TLS versions,
> missing StartTLS extension and various other cases.  Perhaps that
> particular message would also be available via your LDAP server's
> logs, if you can access them, but in some cases we're throwing away
> client-side messages that are not available anywhere else like "TLS:
> unable to get CN from peer certificate", "TLS: hostname does not match
> CN in peer certificate" and more.
>

+1.

> Something like the attached.

The patch prints errdetail() as "No LDAP diagnostic message
available." when LDAP doesn't provide diagnostics.  Some error messages
may simply not carry any diagnostic information, and in that case the
above error detail may be confusing.  Maybe we should just omit the
error detail when no diagnostic message is available.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-07-26 Thread tushar

v9.5/9.6

create these objects -
CREATE TABLE constraint_rename_test (a int CONSTRAINT con1 CHECK (a > 
0), b int, c int);
CREATE TABLE constraint_rename_test2 (a int CONSTRAINT con1 CHECK (a > 
0), d int) INHERITS (constraint_rename_test);

ALTER TABLE constraint_rename_test ADD CONSTRAINT con3 PRIMARY KEY (a);

v9.6/v10 - run pg_upgrade

pg_restore: creating COMMENT "SCHEMA "public""
pg_restore: creating TABLE "public.constraint_rename_test"
pg_restore: creating TABLE "public.constraint_rename_test2"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 351; 1259 16388 TABLE 
constraint_rename_test2 edb
pg_restore: [archiver (db)] could not execute query: ERROR:  column "a" 
in child table must be marked NOT NULL

Command was:
-- For binary upgrade, must preserve pg_type oid
SELECT 
pg_catalog.binary_upgrade_set_next_pg_type_oid('16390'::pg_catalog.oid);


Manually, I am able to create all these objects.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] expand_dbname in postgres_fdw

2017-07-26 Thread Ashutosh Bapat
On Wed, Jul 26, 2017 at 12:17 AM, Arseny Sher  wrote:
> Hi,
>
> Is there any particular reason why postgres_fdw forbids usage of
> connection strings by passing expand_dbname = false to
> PQconnectdbParams? Attached patch shows where this happens.
>

According to F.34.1.1 at [1], passing a connection string as the dbname
option should work, so your question is valid. I am not aware of any
discussion around this on -hackers, and the comments in connect_pg_server()
don't help either. But I guess we expect users to set up individual
foreign server and user mapping options instead of putting those in a
connection string. I cannot think of any reason for it except that it
improves readability: if postgres_fdw wants to take certain actions
based on the values of individual options, having them separate is
easier to handle than parsing them out of a connection string.

Anyway, if we are not going to change the current behaviour, we should
change the documentation to say that the dbname option means "database
name" and not a connection string.

[1] 
https://www.postgresql.org/docs/10/static/postgres-fdw.html#idm44880567492496
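
To illustrate (invented server name): given expand_dbname semantics one
might expect the string below to be parsed as a connection string, but
because postgres_fdw passes expand_dbname = false, it is taken as a
literal database name:

    CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (dbname 'host=remotehost dbname=app connect_timeout=10');
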
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-26 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 12:02 PM, Ashutosh Bapat
 wrote:
>
> Ok. If those queries have equi-join between partitioned tables and are
> not picking up partition-wise join, that case needs to be
> investigated. Q21 for example has join between three lineitem
> instances. Those joins can be executed by partition-wise join. But it
> may so happen that optimal join order doesn't join partitioned tables
> with each other, thus interleaving partitioned tables with
> unpartitioned or differently partitioned tables in join order.
> Partition-wise join is not possible then. A different partitioning
> scheme may be required there.
>
Good point, will look into this direction as well.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-26 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 10:38 AM, Ashutosh Bapat
 wrote:
>
> On Tue, Jul 25, 2017 at 9:39 PM, Dilip Kumar  wrote:
> > On Tue, Jul 25, 2017 at 8:59 PM, Robert Haas  wrote:
> >> On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
> >>  wrote:
> >>> - other queries show a good 20-30% improvement in performance. Performance
> >>> numbers are as follows,
> >>>
> >>> Query | un_part_head (seconds) | part_head (seconds) | part_patch (seconds)
> >>> 3     |  76                    | 127                 |  88
> >>> 4     |  17                    | 244                 |  41
> >>> 5     |  52                    | 123                 |  84
> >>> 7     |  73                    | 134                 | 103
> >>> 10    |  67                    | 111                 |  89
> >>> 12    |  53                    | 114                 |  99
> >>> 18    | 447                    | 709                 | 551
> >>
> >> Hmm.  This certainly shows that benefit of the patch, although it's
> >> rather sad that we're still slower than if we hadn't partitioned the
> >> data in the first place.  Why does partitioning hurt performance so
> >> much?
> >
> > I was analysing some of the plans (without partition and with
> > partition), Seems like one of the reasons of performing worse with the
> > partitioned table is that we can not use an index on the partitioned
> > table.
> >
> > Q4 is taking 17s without partition whereas it's taking 244s with partition.
> >
> > Now if we analyze the plan
> >
> > Without partition, it can use parameterize index scan on lineitem
> > table which is really big in size. But with partition, it has to scan
> > this table completely.
> >
> >   ->  Nested Loop Semi Join
> >  ->  Parallel Bitmap Heap Scan on orders
> >->  Bitmap Index Scan on
> > idx_orders_orderdate  (cost=0.00..24378.88 r
> >->  Index Scan using idx_lineitem_orderkey on
> > lineitem  (cost=0.57..29.29 rows=105 width=8) (actual
> > time=0.031..0.031 rows=1 loops=1122364)
> >Index Cond: (l_orderkey =
> > orders.o_orderkey)
> >Filter: (l_commitdate < 
> > l_receiptdate)
> >Rows Removed by Filter: 1
> >
>
> If the partitions have the same indexes as the unpartitioned table,
> planner manages to create parameterized plans for each partition and
> thus parameterized plan for the whole partitioned table. Do we have
> same indexes on unpartitioned table and each of the partitions? The

Yes, both lineitem and orders have the same number of partitions (17),
on the same partitioning key (*_orderkey) and with the same ranges for
each partition.  However, I had missed creating the index on o_orderdate
for the partitions.  But even after creating it, the plan with the
bitmap heap scan is still chosen, and the query still completes in some
200 seconds; check the attached file for the query plan.
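
As a side note, v10 cannot create an index on the partitioned table
itself, so each partition needs its own copy; a sketch with assumed
partition names:

    CREATE INDEX orders_p1_orderdate ON orders_p1 (o_orderdate);
    CREATE INDEX orders_p2_orderdate ON orders_p2 (o_orderdate);
    -- ... and likewise for the remaining partitions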

> difference between the two cases is the parameterized path on an
> unpartitioned table scans only one index whereas that on the
> partitioned table scans the indexes on all the partitions. My guess is
> the planner thinks those many scans are costlier than hash/merge join
> and chooses those strategies over parameterized nest loop join. In
> case of partition-wise join, only one index on the inner partition is
> involved and thus partition-wise join picks up parameterized nest loop
> join. Notice, that this index is much smaller than the index on the
> partitioned table, so the index scan will be a bit faster. But only a
> bit, since the depth of the index doesn't increase linearly with the
> size of index.
>
As I have observed, the thing with this query is that the selectivity
estimate is much higher than the actual row count.  When an index scan
is chosen for lineitem on the inner side of the nested loop join, the
query completes quickly, since the number of rows actually returned is
very low.  However, if a seq scan is picked, or lineitem ends up on the
outer side, the query takes a really long time.  Now, when hash join is
picked in the partitioned case and no partition-wise join is available,
a seq scan is preferred over the index scan, hence the elongated query
execution time.

I tried this query with random_page_cost = 0 and forcing a nested loop
join, and the chosen plan completes the query in 45 seconds; check the
attached file for the EXPLAIN ANALYZE output.
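
The exact commands aren't shown here; a plausible sketch of forcing such
a plan for investigation (session settings only) would be:

    SET random_page_cost = 0;
    SET enable_hashjoin = off;   -- assumption: the nested loop was obtained
    SET enable_mergejoin = off;  -- by disabling the alternative join methods
    -- then run EXPLAIN (ANALYZE) on the TPC-H Q4 query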

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


q4_idx_orderdate.out
Description: Binary data


Q4_low_random_page_cost.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-07-26 Thread Julien Rouhaud
On Wed, Jul 26, 2017 at 7:11 AM, Pavel Stehule  wrote:
> Hi
>
> I wrote a special pager for psql. Surely, this pager is not good for paging
> of man pages. So is not good to set it as global pager. We can introduce new
> env variable PSQL_PAGER for this purpose. It can work similar like
> PSQL_EDITOR variable.
>

+1.  I used to have a psql alias just for this, so it'd definitely be useful.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Parallel Hash take II

2017-07-26 Thread Thomas Munro
Hi hackers,

Here is a new version of my parallel-aware hash join patchset.  I've
dropped 'shared' from the feature name and EXPLAIN output since that's
now implied by the word "Parallel" (that only made sense in earlier
versions that had Shared Hash and Parallel Shared Hash, but a Shared
Hash with just one participant building it didn't turn out to be very
useful so I dropped it a few versions ago).  I figured for this new
round I should create a new thread, but took the liberty of copying
the CC list from the previous one[1].

The main changes are:

1.  Implemented the skew optimisation for parallel-aware mode.  The
general approach is the same as the regular hash table: insert with a
CAS loop.  The details of memory budget management are different
though.  It grants chunks of budget to participants as needed even
though allocation is still per-tuple, and it has to deal with
concurrent bucket removal.  I removed one level of indirection from
the skew hash table: in this version hashtable->skewBucket is an array
of HashSkewBucket instead of pointers to HashSkewBuckets allocated
separately.  That makes the hash table array twice as big but avoids
one pointer hop when probing an active bucket; that refactoring was
not strictly necessary but made the changes to support parallel build
simpler.

2.  Simplified costing.  There is now just one control knob
"parallel_synchronization_cost", which I charge for each time the
participants will wait for each other at a barrier, to be set high
enough to dissuade the planner from using Parallel Hash for tiny hash
tables that would be faster in a parallel-oblivious hash join.
Earlier ideas about modelling the cost of shared memory access didn't
work out.
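
As a usage sketch (this GUC exists only with the patch set applied), it
is tuned like any other planner cost knob:

    SET parallel_synchronization_cost = 10;  -- charge per barrier wait
    -- raise it to dissuade Parallel Hash for tiny hash tables; lower it
    -- to favour building the hash table cooperatively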

Status:  I think there are probably some thinkos in the new skew
stuff.  I think I need some new ideas about how to refactor things so
that there isn't quite so much "if-shared-then-this-else-that".  I
think I should build some kind of test mode to control barriers so
that I can test the permutations of participant arrival phase
exhaustively.  I need to propose an empirically derived default for
the GUC.  There are several other details I would like to tidy up and
improve.  That said, I wanted to post what I have as a checkpoint now
that I have the major remaining piece (skew optimisation) more-or-less
working and the costing at a place that I think make sense.

I attach some queries to exercise various interesting cases.  I would
like to get something like these into fast-running regression test
format.

Note that this patch requires the shared record typmod patch[2] in
theory, since shared hash table tuples might reference blessed record
types, but there is no API dependency so you can use this patch set
without applying that one.  If anyone knows how to actually provoke a
parallel hash join that puts RECORD types into the hash table, I'd be
very interested to hear about it, but certainly for TPC and similar
testing that other patch set is not necessary.

Of the TPC-H queries, I find that Q3, Q5, Q7, Q8, Q9, Q10, Q12, Q14,
Q16, Q18, Q20 and Q21 make use of Parallel Hash nodes (I tested with
neqjoinsel-fix-v3.patch[3] also applied, which avoids some but not all
craziness in Q21).  For examples that also include a
parallel-oblivious Hash see Q8 and Q10: in those queries you can see
the planner deciding that it's not worth paying
parallel_synchronization_cost = 10 to load the 25 row "nation" table.
I'll report on performance separately.

[1] 
https://www.postgresql.org/message-id/flat/CAEepm=2W=cokizxcg6qifqp-dhue09aqtremm7yjdrhmhdv...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kzwqk3g5ygn3mdv7a8da...@mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-hash-v16.patchset.tgz
Description: GNU Zip compressed data


hj-test-queries.sql
Description: Binary data


hj-skew.sql
Description: Binary data


hj-skew-unmatched.sql
Description: Binary data


hj-skew-overflow.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-26 Thread tushar

On 07/26/2017 02:12 AM, Tom Lane wrote:

AFAICT, pg_dump has no notion that it needs to be careful about the order
in which permissions are granted.  I did

regression=# create user joe;
CREATE ROLE
regression=# create user bob;
CREATE ROLE
regression=# create user alice;
CREATE ROLE
regression=# \c - joe
You are now connected to database "regression" as user "joe".
regression=> create table joestable(f1 int);
CREATE TABLE
regression=> grant select on joestable to alice with grant option;
GRANT
regression=> \c - alice
You are now connected to database "regression" as user "alice".
regression=> grant select on joestable to bob;
GRANT

and then pg_dump'd that.  The ACL entry for joestable looks like this:

--
-- TOC entry 5642 (class 0 OID 0)
-- Dependencies: 606
-- Name: joestable; Type: ACL; Schema: public; Owner: joe
--

SET SESSION AUTHORIZATION alice;
GRANT SELECT ON TABLE joestable TO bob;
RESET SESSION AUTHORIZATION;
GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;

Unsurprisingly, that fails to restore:

db2=# SET SESSION AUTHORIZATION alice;
SET
db2=> GRANT SELECT ON TABLE joestable TO bob;
ERROR:  permission denied for relation joestable


I am not certain whether this is a newly introduced bug or not.
However, I tried it in 9.2-9.6, and all of them produce the
GRANTs in the right order:

GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
SET SESSION AUTHORIZATION alice;
GRANT SELECT ON TABLE joestable TO bob;
RESET SESSION AUTHORIZATION;

I am also getting the same error while doing pg_upgrade from v9.6 to v10,
although I am not able to reproduce it with a v9.5 -> v9.6 pg_upgrade.


v9.6

CREATE TABLE deptest (f1 serial primary key, f2 text);
\set VERBOSITY default
CREATE USER regress_dep_user0;
CREATE USER regress_dep_user1;
CREATE USER regress_dep_user2;
SET SESSION AUTHORIZATION regress_dep_user0;
REASSIGN OWNED BY regress_dep_user0 TO regress_dep_user1;
REASSIGN OWNED BY regress_dep_user1 TO regress_dep_user0;
CREATE TABLE deptest1 (f1 int unique);
GRANT ALL ON deptest1 TO regress_dep_user1 WITH GRANT OPTION;
SET SESSION AUTHORIZATION regress_dep_user1;
GRANT ALL ON deptest1 TO regress_dep_user2;

v10 - run pg_upgrade.
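
For reference, a correct dump of this case should emit the grants in
dependency order, roughly (a sketch, not actual pg_dump output):

    GRANT ALL ON TABLE deptest1 TO regress_dep_user1 WITH GRANT OPTION;
    SET SESSION AUTHORIZATION regress_dep_user1;
    GRANT ALL ON TABLE deptest1 TO regress_dep_user2;
    RESET SESSION AUTHORIZATION;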

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-07-26 Thread Amit Langote
On 2017/07/25 21:55, Rajkumar Raghuwanshi wrote:
> Got one more observation :  update... returning is not working with whole
> row reference. please take a look.
> 
> postgres=# create table part (a int, b int) partition by range(a);
> CREATE TABLE
> postgres=# create table part_p1 partition of part for values from
> (minvalue) to (0);
> CREATE TABLE
> postgres=# create table part_p2 partition of part for values from (0) to
> (maxvalue);
> CREATE TABLE
> postgres=# insert into part values (10,1);
> INSERT 0 1
> postgres=# insert into part values (20,2);
> INSERT 0 1
> postgres=# update part t1 set a = b returning t1;
> ERROR:  unexpected whole-row reference found in partition key

That looks like a bug which exists in HEAD too.  I posted a patch in a
dedicated thread to address the same [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/9a39df80-871e-6212-0684-f93c83be4097%40lab.ntt.co.jp



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] map_partition_varattnos() and whole-row vars

2017-07-26 Thread Amit Langote
Rajkumar Raghuwanshi reported [1] on the "UPDATE partition key" thread
that whole-row vars don't play nicely with partitioned table operations.

For example, when used to convert WITH CHECK OPTION constraint expressions
and RETURNING target list expressions, it will error out if the
expressions contain whole-row vars.  That's a bug, because whole-row
vars are legal in those expressions.  I think the decision to raise an
error upon encountering a whole-row var in the input expression was based
on the assumption that it would only ever be used to convert partition
constraint expressions, which cannot contain those.  So, we can relax that
requirement so that its other users are not bitten by it.

Attached fixes that.
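
For instance, a minimal sketch (assumed names) of a WITH CHECK OPTION
case containing a whole-row var that previously hit the error:

    CREATE TABLE wrp (a int, b int) PARTITION BY RANGE (a);
    CREATE TABLE wrp1 PARTITION OF wrp FOR VALUES FROM (0) TO (100);
    CREATE VIEW wrp_v AS
        SELECT * FROM wrp WHERE wrp.* IS NOT NULL  -- whole-row reference
        WITH CHECK OPTION;
    INSERT INTO wrp_v VALUES (1, 2);  -- the WCO expression must be mapped
                                      -- to the partition's attribute numbers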

Adding this to the PG 10 open items list.

Thanks,
Amit

[1]
https://postgr.es/m/CAKcux6%3Dz38gH4K6YAFi%2BYvo5tHTwBL4tam4VM33CAPZ5dDMk1Q%40mail.gmail.com
From b1954d7c2293bf9fc0d3ded30a742d7de0084082 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 26 Jul 2017 16:45:46 +0900
Subject: [PATCH] Fix map_partition_varattnos to sometimes accept wholerow vars

It was thought that it would never encounter wholerow vars in its input
expressions (partition constraint expressions, for example).  But then
it was used to convert expressions where wholerow vars are legal, such
as WCO constraint expressions and RETURNING target list members.  So,
add an argument to tell it whether or not to error on finding wholerows.

Adds test in insert.sql and updatable_views.sql.

Reported by: Rajkumar Raghuwanshi
Report: 
https://postgr.es/m/CAKcux6%3Dz38gH4K6YAFi%2BYvo5tHTwBL4tam4VM33CAPZ5dDMk1Q%40mail.gmail.com
---
 src/backend/catalog/partition.c   |  7 ---
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/executor/nodeModifyTable.c| 16 
 src/include/catalog/partition.h   |  3 ++-
 src/test/regress/expected/insert.out  | 10 ++
 src/test/regress/expected/updatable_views.out | 10 ++
 src/test/regress/sql/insert.sql   |  6 ++
 src/test/regress/sql/updatable_views.sql  |  9 +
 8 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e20ddce2db..b419466f2e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -904,7 +904,8 @@ get_qual_from_partbound(Relation rel, Relation parent,
  */
 List *
 map_partition_varattnos(List *expr, int target_varno,
-   Relation partrel, Relation 
parent)
+   Relation partrel, Relation 
parent,
+   bool wholerow_ok)
 {
AttrNumber *part_attnos;
	bool		found_whole_row;
@@ -921,7 +922,7 @@ map_partition_varattnos(List *expr, int target_varno,

 										RelationGetDescr(parent)->natts,
 										&found_whole_row);
/* There can never be a whole-row reference here */
-   if (found_whole_row)
+   if (found_whole_row && !wholerow_ok)
elog(ERROR, "unexpected whole-row reference found in partition 
key");
 
return expr;
@@ -1825,7 +1826,7 @@ generate_partition_qual(Relation rel)
 * in it to bear this relation's attnos. It's safe to assume varno = 1
 * here.
 */
-   result = map_partition_varattnos(result, 1, rel, parent);
+   result = map_partition_varattnos(result, 1, rel, parent, false);
 
/* Save a copy in the relcache */
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index bb00858ad1..e3eef88054 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13738,7 +13738,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
constr = linitial(partConstraint);
tab->partition_constraint = (Expr *)
map_partition_varattnos((List *) constr, 1,
-										part_rel, rel);
+										part_rel, rel, false);
/* keep our lock until commit */
if (part_rel != attachRel)
heap_close(part_rel, NoLock);
diff --git a/src/backend/executor/nodeModifyTable.c 
b/src/backend/executor/nodeModifyTable.c
index 77ba15dd90..cd4cf137e0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1989,10 +1989,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, 
int eflags)
List   *wcoExprs = NIL;
   

Re: [HACKERS] On Complex Source Code Reading Strategy

2017-07-26 Thread Albe Laurenz
Zeray Kalayu wrote:
> I want to become a PG hacker, but it seems a complex beast to find my way
> around in.  So, can anyone suggest, from their own experience/style,
> general approaches/techniques/strategies for reading complex source code
> effectively, in general and in PG in particular?
> 
> Can you remember being a novice at PostgreSQL hacking, and write up your
> experience/style for me and probably for other novices as well?
> 
> There are some guidelines in the PG wiki, but they only scratch the
> surface of how deep the rabbit hole goes.
> 
> I believe there is a plethora of knowledge and skill in PG hackers' heads
> that is not to be found in any kind of written material, and thus it
> would be great if any PG hacker could write up his experience/style of
> PG hacking so that it becomes a useful asset for PG hacking beginners.

I'm not the most competent hacker around, but maybe that qualifies
me to answer your question :^)

I assume that you have read the documentation and know your
way around PostgreSQL.

Find something that you need and want to work on - maybe some
shortcoming or missing feature of PostgreSQL.  The TODO list
(https://wiki.postgresql.org/wiki/Todo) may give some inspiration,
but be cautioned that it contains mostly things that nobody
could reach consensus on or implement easily.

Best is to work on something that serves your own need.
You don't have to work on core PostgreSQL, it could also be a
server extension.

If you have some project, you have an angle from which to tackle
the large body of code that PostgreSQL is.
As always, start with a design. Ask the list before you start coding.

Another good and commendable path into the source is to review
patches (https://wiki.postgresql.org/wiki/Reviewing_a_Patch).
This is highly appreciated, because there is always a shortage of
reviewers, and you get to see how other people go about changing the code.
There is a lot to learn this way!

You will find that the PostgreSQL source is mostly well written,
well documented and well structured.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] On Complex Source Code Reading Strategy

2017-07-26 Thread Zeray Kalayu
Hello Dear PostgreSQL Hackers,

I want to become a PG hacker, but it seems a complex beast to find my way
around in.  So, can anyone suggest, from their own experience/style,
general approaches/techniques/strategies for reading complex source code
effectively, in general and in PG in particular?

Can you remember being a novice at PostgreSQL hacking, and write up your
experience/style for me and probably for other novices as well?

There are some guidelines in the PG wiki, but they only scratch the
surface of how deep the rabbit hole goes.

I believe there is a plethora of knowledge and skill in PG hackers' heads
that is not to be found in any kind of written material, and thus it
would be great if any PG hacker could write up his experience/style of
PG hacking so that it becomes a useful asset for PG hacking beginners.

Regards,
Zeray


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-07-26 Thread Etsuro Fujita

On 2017/07/26 6:07, Robert Haas wrote:

On Thu, Jul 13, 2017 at 1:09 PM, Amit Khandekar  wrote:

Attached is a WIP patch (make_resultrels_ordered.patch) that generates
the result rels in canonical order. This patch is kept separate from
the update-partition-key patch, and can be applied on master branch.


Thank you for working on this, Amit!


Hmm, I like the approach you've taken here in general,


+1 for the approach.


Is there any real benefit in this "walker" interface?  It looks to me
like it might be simpler to just change things around so that it
returns a list of OIDs, like find_all_inheritors, but generated
differently.  Then if you want bound-ordering rather than
OID-ordering, you just do this:

list_free(inhOids);
inhOids = get_partition_oids_in_bound_order(rel);

That'd remove the need for some if/then logic as you've currently got
in get_next_child().


Yeah, that would make the code much simpler, so +1 for Robert's idea.


I think we should always expand in bound order rather than only when
it's a result relation. I think for partition-wise join, we're going
to want to do it this way for all relations in the query, or at least
for all relations in the query that might possibly be able to
participate in a partition-wise join.  If there are multiple cases
that are going to need this ordering, it's hard for me to accept the
idea that it's worth the complexity of trying to keep track of when we
expanded things in one order vs. another.  There are other
applications of having things in bound order too, like MergeAppend ->
Append strength-reduction (which might not be legal anyway if there
are list partitions with multiple, non-contiguous list bounds or if
any NULL partition doesn't end up in the right place in the order, but
there will be lots of cases where it can work).


+1 for that as well.  Another benefit from that would be EXPLAIN; we 
could display partitions for a partitioned table in the same order for 
Append and ModifyTable (ie, SELECT/UPDATE/DELETE), which I think would 
make the EXPLAIN output much more readable.


Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-26 Thread Victor Wagner
On Tue, 25 Jul 2017 16:50:38 -0400
Tom Lane  wrote:

> Victor Wagner  writes:
> > PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> > (because it searches for libicu using pkgconfig, and pgkconfig
> > support significantly changed in ICU  version 4.6)  
> 
> > However, there are some reasons to build PostgreSQL with older
> > versions of ICU.   
> 
> No doubt, but making that happen seems like a feature, and IMO we're
> too late in the v10 beta test cycle to be taking new features on

Maybe the PGDG people could integrate it as a patch in the RPMs for the
particular systems affected by the problem.

I'd rather say that it is a bugfix: relaxing overly strict checking.
If we choose the approach of allowing non-strict language tags, it is a
one-line patch.

If we choose the safer approach of ignoring non-strict collations, it
would take about five lines:
1. Replace one ereport(ERROR with ereport(WARNING
2. Return a special value (NULL) after issuing this warning
3. Handle this special value in the calling function by continuing to
the next loop iteration
4, 5. Curly braces around 1 and 2; see the patch at the end.

> board, especially ones with inherently-large portability risks.  We

I don't think the portability risks are that large.  The patch can be
written in such a way that it behaves exactly as now if the ICU version
is 4.6 or above.  Moreover, it is easy to make old-ICU support a separate
configure option (because another configure test is needed anyway).

> could maybe consider patches for this for v11 ... but will anyone
> still care by then?

Undoubtedly they will.  v11 is expected to be released in 2018, RHEL 6
is supported until 2020, and its extended support ends in Nov 2023.

And it is possible that some derived distributions will last even longer.


> (Concretely, my concern is that the alpha and beta testing that's
> happened so far hasn't covered pre-4.6 ICU at all.  I'd be surprised
> if the incompatibility you found so far is the only one.)

diff --git a/src/backend/commands/collationcmds.c 
b/src/backend/commands/collationcmds.c
index b6c14c9..9e5da98 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -442,11 +442,13 @@ get_icu_language_tag(const char *localename)
 
status = U_ZERO_ERROR;
	uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, &status);
-   if (U_FAILURE(status))
-   ereport(ERROR,
+   if (U_FAILURE(status)) 
+   {
+   ereport(WARNING,
(errmsg("could not convert locale name \"%s\" 
to language tag: %s",
localename, 
u_errorName(status;
-
+   return NULL;
+   }
return pstrdup(buf);
 }

@@ -733,6 +735,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
				char	   *localeid = psprintf("%s@collation=%s", name, val);
 
langtag = get_icu_language_tag(localeid);
+   if (langtag == NULL) {
+   /* No such collation in library, skip */
+   continue;
+   }
				collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid;
 
/*






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-26 Thread Ashutosh Bapat
On Wed, Jul 26, 2017 at 11:08 AM, Rafia Sabih
 wrote:
>
>
> On Wed, Jul 26, 2017 at 11:06 AM, Ashutosh Bapat
>  wrote:
>>
>> On Wed, Jul 26, 2017 at 11:00 AM, Rafia Sabih
>>  wrote:
>> >
>> >
>> > On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat
>> >  wrote:
>> >>
>> >> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
>> >>  wrote:
>> >>
>> >> > Query plans for the above mentioned queries is attached.
>> >> >
>> >>
>> >> Can you please share plans for all the queries, even if they haven't
>> >> chosen partition-wise join when run on partitioned tables with
>> >> enable_partition_wise_join ON? Also, please include the query in
>> >> explain analyze output using -a or -e flats to psql. That way we will
>> >> have query and its plan in the same file for ready reference.
>> >>
>> > I didn't run the query not using partition-wise join, for now.
>>
>> parse-parse error, sorry. Do you mean, you haven't run the queries
>> which do not use partition-wise join?
>>
> Yes, that's what I mean.

Ok. If those queries have equi-join between partitioned tables and are
not picking up partition-wise join, that case needs to be
investigated. Q21 for example has join between three lineitem
instances. Those joins can be executed by partition-wise join. But it
may so happen that optimal join order doesn't join partitioned tables
with each other, thus interleaving partitioned tables with
unpartitioned or differently partitioned tables in join order.
Partition-wise join is not possible then. A different partitioning
scheme may be required there.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cache lookup failed error for partition key with custom opclass

2017-07-26 Thread Rushabh Lathia
On Tue, Jul 25, 2017 at 7:43 PM, Tom Lane  wrote:

> Rushabh Lathia  writes:
> > On Mon, Jul 24, 2017 at 7:23 PM, Tom Lane  wrote:
> >> Some looking around says that this *isn't* the only place that just
> >> blithely assumes that it will find an opfamily entry.  But I agree
> >> that not checking for that isn't up to project standards.
>
> > I went through the get_opfamily_proc() calls in the system and added
> > the check for InvalidOid.
>
> Think I did that already, please compare your results with
> 278cb4341103e967189997985b09981a73e23a34
>

Thanks Tom.


>
> regards, tom lane
>



-- 
Rushabh Lathia


Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

2017-07-26 Thread Amit Langote
On 2017/07/25 9:43, David G. Johnston wrote:
> On Mon, Jul 24, 2017 at 5:19 PM, Amit Langote wrote:
> 
>> On 2017/07/25 6:28, mtun...@gmail.com wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:  14759
>>> Logged by:  Murat Tuncer
>>> Email address:  mtun...@gmail.com
>>> PostgreSQL version: 10beta2
>>> Operating system:   Mac 10.12.6
>>> Description:
>>>
>>> I got
>>>
>>> ERROR:  cannot route inserted tuples to a foreign table
>>
>> Inserting tuples into a partitioned table that will route to one of its
>> foreign table partitions is unsupported in PG 10.  The limitation is
>> mentioned on the following page:
>> https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
> 
> 
> It would be nice to also note this limitation here:
> 
> https://www.postgresql.org/docs/devel/static/sql-createforeigntable.html

Yeah, I thought the same when writing my previous email.

> Also, the ddl-partitioning.html page has a section "5.10.2.3.
> Limitations".  Moving (or duplicating maybe) the existing comment on that
> page in that section would make finding out about this limitation a bit
> easier.

Yeah, perhaps.

> I'd probably move (and rework) the "limitation wording" to the limitation
> sections and do something like the following in the main section.
> 
> "Foreign Tables can be added to a partitioning structure but inserts to the
> partitioned table will fail if they are routed to a foreign table
> partition.  Direct writes to the foreign table, and partition reads, work
> normally."

Done that in the attached.

> I'm curious what the other limitations are...

When I first wrote that documentation line (I am assuming you're asking
about "although these have some limitations that normal tables do not"), I
was thinking about the fact that the core system does not enforce
(locally) any constraints defined on foreign tables.  Since we allow
inserting data into partitions directly, it is imperative that we enforce
the "partition constraint" along with the traditional constraints such as
NOT NULL and CHECK constraints, which we can do for local table partitions
but not for foreign table ones.

Anyway, attached patch documents all these limitations about foreign table
partitions more prominently.
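
For concreteness, a hedged sketch (assumed names; needs postgres_fdw and
an existing remote table) of the behaviors being documented:

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (dbname 'postgres');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
    CREATE TABLE pt (a int) PARTITION BY LIST (a);
    CREATE FOREIGN TABLE pt1 PARTITION OF pt FOR VALUES IN (1)
        SERVER loopback OPTIONS (table_name 'pt1_remote');
    INSERT INTO pt VALUES (1);   -- fails: cannot route inserted tuples
                                 -- to a foreign table
    INSERT INTO pt1 VALUES (1);  -- direct insert works, but the partition
                                 -- constraint is not enforced locally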

Thanks,
Amit
From 7ba72f024223cbe1a8e1da1220b8f8efb8e8f215 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 3 Apr 2017 16:45:15 +0900
Subject: [PATCH] Clarify that partition constraint is not enforced on foreign
 tables

---
 doc/src/sgml/ddl.sgml  | 15 ---
 doc/src/sgml/ref/create_foreign_table.sgml | 17 +++--
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..e7a10e15d3 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2988,9 +2988,10 @@ VALUES ('Albany', NULL, NULL, 'NY');

 Partitions can also be foreign tables
 (see ),
-although these have some limitations that normal tables do not.  For
-example, data inserted into the partitioned table is not routed to
-foreign table partitions.
+although they have some limitations that normal tables do not.  For
+example, routing the data inserted into the partitioned table to foreign
+table partitions is not supported, nor are the partition constraints
+enforced when the data is directly inserted into them.

 

@@ -3297,6 +3298,14 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
not the partitioned table.
   
  
+
+ 
+  
+   Routing tuples to partitions that are foreign tables is not supported.
+   So, if an inserted tuple routes to one of the foreign partitions, an
+   error will occur.
+  
+ 
 
 
 
diff --git a/doc/src/sgml/ref/create_foreign_table.sgml 
b/doc/src/sgml/ref/create_foreign_table.sgml
index 065c982082..12087ec05c 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -79,7 +79,9 @@ CHECK ( expression ) [ NO INHERIT ]
   
If PARTITION OF clause is specified then the table is
created as a partition of parent_table with specified
-   bounds.
+   bounds.  Note that routing tuples to partitions that are foreign tables
+   is not supported. So, if an inserted tuple routes to one of foreign
+   partitions, an error will occur.
   
 
   
@@ -279,16 +281,19 @@ CHECK ( expression ) [ NO INHERIT ]
   Notes
 

-Constraints on foreign tables (such as CHECK
-or NOT NULL clauses) are not enforced by the
-core PostgreSQL system, and most foreign data wrappers
-do not attempt to enforce them either; that is, the constraint is
+Constraints (both the user-defined constraints such as CHECK
+or NOT NULL clauses and the partition constraint) are not
+enforced by the core PostgreSQL system, and most foreign
+data wrappers do not 

Re: [HACKERS] UPDATE of partition key

2017-07-26 Thread Amit Langote
On 2017/07/26 6:07, Robert Haas wrote:
> On Thu, Jul 13, 2017 at 1:09 PM, Amit Khandekar  
> wrote:
>> Attached is a WIP patch (make_resultrels_ordered.patch) that generates
>> the result rels in canonical order. This patch is kept separate from
>> the update-partition-key patch, and can be applied on master branch.
>
> I suspect this isn't correct for a table that contains wCTEs, because
> there would in that case be multiple result relations.
> 
> I think we should always expand in bound order rather than only when
> it's a result relation. I think for partition-wise join, we're going
> to want to do it this way for all relations in the query, or at least
> for all relations in the query that might possibly be able to
> participate in a partition-wise join.  If there are multiple cases
> that are going to need this ordering, it's hard for me to accept the
> idea that it's worth the complexity of trying to keep track of when we
> expanded things in one order vs. another.  There are other
> applications of having things in bound order too, like MergeAppend ->
> Append strength-reduction (which might not be legal anyway if there
> are list partitions with multiple, non-contiguous list bounds or if
> any NULL partition doesn't end up in the right place in the order, but
> there will be lots of cases where it can work).

Sorry to be responding this late to Amit's make_resultrels_ordered patch
itself, but I agree that we should teach the planner to *always*
expand partitioned tables in the partition bound order.

When working on something else, I ended up writing a prerequisite patch
that refactors RelationGetPartitionDispatchInfo() to not be too tied to
its current usage for tuple-routing, so that it can now be used in the
planner (for example, in expand_inherited_rtentry(), instead of
find_all_inheritors()).  If we could adopt that patch, we can focus on the
update partition row movement issues more closely on this thread, rather
than the concerns about the order that planner puts partitions into.

I checked that we get the same result relation order with both the
patches, but I would like to highlight a notable difference here between
the approaches taken by our patches.  In my patch, I have now taught
RelationGetPartitionDispatchInfo() to lock *only* the partitioned tables
in the tree, because we need to look at its partition descriptor to
collect partition OIDs and bounds.  We can defer locking (and opening the
relation descriptor of) leaf partitions to a point where planner has
determined that the partition will be accessed after all (not pruned),
which will be done in a separate patch of course.

Sorry again that I didn't share this patch sooner.

Thanks,
Amit
From 7a22aedc7c1ae8e1568745c99cf1d11d42cf59d9 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 24 Jul 2017 18:59:57 +0900
Subject: [PATCH 1/3] Decouple RelationGetPartitionDispatchInfo() from executor

Currently, it and the structure it generates, viz. PartitionDispatch
objects, are too coupled with the executor's tuple-routing code.  That
includes locking considerations and responsibilities for releasing
relcache references, etc., which makes it useless in other places such
as during planning.
---
 src/backend/catalog/partition.c| 326 +
 src/backend/commands/copy.c|  35 ++--
 src/backend/executor/execMain.c| 156 ++--
 src/backend/executor/nodeModifyTable.c |  29 ++-
 src/include/catalog/partition.h|  53 ++
 src/include/executor/executor.h|   4 +-
 src/include/nodes/execnodes.h  |  53 +-
 7 files changed, 409 insertions(+), 247 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e20ddce2db..e07701d5e8 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -105,6 +105,24 @@ typedef struct PartitionRangeBound
	bool		lower;			/* this is the lower (vs upper) bound */
 } PartitionRangeBound;
 
+/*---
+ * PartitionDispatchData - information of partitions of one partitioned table
+ *in a partition tree
+ *
+ * partkey		Partition key of the table
+ * partdesc	Partition descriptor of the table
+ * indexes		Array with partdesc->nparts members (for details on what the
+ *				individual values represent, see the comments in
+ *				RelationGetPartitionDispatchInfo())
+ *---
+ */
+typedef struct PartitionDispatchData
+{
+	PartitionKey	partkey;	/* Points into the table's relcache entry */
+	PartitionDesc	partdesc;	/* Ditto */
+	int			   *indexes;
+} PartitionDispatchData;
+
 static int32 qsort_partition_list_value_cmp(const void *a, const void *b,