date:20170731

On Mon, Mar 20, 2017 at 11:47:03PM -0400, Noah Misch wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.

The problem was cygserver thread exhaustion; cygserver needs a thread per
simultaneous waiter.  With "cygserver -r 40" or the equivalent config file
setting, this test does not freeze.  Cygwin 2.8.0 introduced a change to
dynamically grow the thread count:
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=commitdiff;h=0b73dba4de3fdadde499edfbc7ca9d9a01c11487

However, Cygwin 2.8.0 introduced another source of cygserver freezes:
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=commitdiff;h=b80b2c011936f7f075b76b6e59f9e8a5ec49caa1

The 2.8.0-specific freezes have no known workaround.  Cygwin 2.8.1 works,
having reverted the problem commit.  Do not use PostgreSQL with Cygwin 2.8.0.

> If a Cygwin
> buildfarm member starts using --enable-tap-tests, you may see failures in the
> pgbench test suite.  (lorikeet used --enable-tap-tests from 2017-03-18 to
> 2017-03-20, but it failed before reaching the pgbench test suite.)  Curious
> that "make check" has too little concurrency to see more effects from this.

I now understand the bug required eleven concurrent lock waiters, and it's
plausible that "make check" doesn't experience that.  The pgbench test suite
uses -c5, so I expect it to be stable on almost any Cygwin.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] segfault in HEAD when too many nested functions call

On Fri, Jul 28, 2017 at 02:42:06PM -0400, Robert Haas wrote:
> On Fri, Jul 28, 2017 at 12:29 PM, Noah Misch  wrote:
> > Your colleagues achieve compliance despite uncertainty; for inspiration, I
> > recommend examining Alvaro's status updates as examples of this.  The policy
> > currently governs your open items even if you disagree with it.

Thanks for resolving this open item.

> I emphatically agree with that.  If the RMT is to accomplish its
> purpose, it must be able to exert authority even when an individual
> contributor doesn't like the decisions it makes.
> 
> On the other hand, nothing in the open item policy the current RMT has
> adopted prohibits you from using judgement about when and how
> vigorously to enforce that policy in any particular case, and I would
> encourage you to do so.

I understand.  When it comes to open item regulation, the aspects that keep me
up at night are completeness and fairness.  Minimizing list traffic about
non-compliant open items is third priority at best.  Furthermore, it takes an
expensive and subjective calculation to determine whether a policy-violating
open item is progressing well.  I will keep an eye out for minor policy
violations that I can ignore cheaply and fairly.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Improve the performance of the standby server when dropping tables on the primary server

2017-07-31 Thread Tokuda, Takashi

Hi,

The attached patch changes data structure storing unowned SMgrRelation objects 
from list structure to hash structure.
The reason why I change it is that list structure very slowly removes a node.
And list structure takes longer time to remove a node than hash structure.

The problem was reported in BUG #14575.
https://www.postgresql.org/message-id/20170303023246.25054.66...@wrigleys.postgresql.org

In my customer's case, the standby server was delayed more than 20 minites 
when dropping many table at once.


 - Performance check

I confirmed the performance of dropping tables by the following method.

1. Set up a synchronous streaming replication environment.
   And set synchronous_commit = remote_apply in postgresql.conf.

2. Create 100,000 tables (tbl1, tbl2, ... , tbl10).
   And insert one row in each table.

3. Measure the time to drop 50 tables by psql

  $ time psql -d ${DATABSE} -p ${PORT} -f drop.sql 

drop.sql
--
begin;
drop table tbl1;
drop table tbl2;
...
drop table tbl50;
commit;
--

Result:
without this patch
real0m3.734s
user0m0.003s
sys 0m0.005s

with this patch
real0m1.292s
user0m0.005s
sys 0m0.003s

Even in this case, we have improved considerably, 
so I suggest you might approve it.

Regards,
Takashi Tokuda



change_data_structure_storing_unowned_SMgrRelation.patch
Description: change_data_structure_storing_unowned_SMgrRelation.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

On 2017/08/01 12:45, Etsuro Fujita wrote:
> On 2017/08/01 10:18, Amit Langote wrote:
>> Good points; fixed in the updated patch.
> 
> I should have mentioned this in an earlier mail, but one thing I noticed
> is this:
> 
> -the remote server.
> +the remote server.  That becomes especially important if the table is
> +being used in a partition hierarchy, where it is recommended to add
> +a constraint matching the partition constraint expression on
> +the remote table.
> 
> I think this would apply to CHECK constraints on foreign tables when
> implementing partitioning with inheritance.  Why do we only mention this
> for partition constraints?

One thing to consider might be that while a user can mark user-defined
CHECK constraints as being NOT VALID so that the planner doesn't consider
them during constraint exclusion, the same cannot be done for internally
generated partition constraints.

Maybe, (for time being?), the planner should be taught to disregard
foreign tables' partition constraint (if any) for constraint exclusion.

> Other than that, the patch looks good to me.

Thanks for the review.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Update comments in nodeModifyTable.c

2017-07-31 Thread Etsuro Fujita


On 2017/08/01 1:03, Robert Haas wrote:

On Fri, Jul 28, 2017 at 8:12 AM, Etsuro Fujita
 wrote:

On 2017/07/26 22:39, Robert Haas wrote:

So the first part of the change weakens the assertion that a 'ctid' or
'wholerow' attribute will always be present by saying that an FDW may
instead have other attributes sufficient to identify the row.


That's right.


But
then the additional sentence says that there will be a 'wholerow'
attribute after all.  So this whole change seems to me to be going
around in a circle.


What I mean by the additional one is: if the result table that is a foreign
table has a row-level UPDATE/DELETE trigger, a 'wholerow' will also be
present.  So, if the result table didn't have the trigger, there wouldn't be
'whole-row', so in that case it could be possible that there would be only
attributes other than 'ctid' and 'wholerow'.


I think maybe something like this would be clearer, then:

  /*
   * Initialize the junk filter(s) if needed.  INSERT queries need a filter
   * if there are any junk attrs in the tlist.  UPDATE and DELETE always
- * need a filter, since there's always a junk 'ctid' or 'wholerow'
- * attribute present --- no need to look first.
+ * need a filter, since there's always at least one junk attribute present
+ * --- no need to look first.  Typically, this will be a 'ctid'
+ * attribute, but in the case of a foreign data wrapper it might be a
+ * 'wholerow' attribute or some other set of junk attributes sufficient to
+ * identify the remote row.
   *
   * If there are multiple result relations, each one needs its own junk
   * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we


Maybe I'm missing something, but I'm not sure that's a good idea because 
the change says like we might have 'wholerow' only for the FDW case, but 
that isn't correct because we would have 'wholerow' for a view as well. 
ISTM that views should be one of the typical cases, so I'd like to 
propose to modify the sentence starting with 'Typically' to something 
like this: "Typically, this will be a 'ctid' or 'wholerow' attribute, 
but in the case of a foreign data wrapper it might be a set of junk 
attributes sufficient to identify the remote row."  What do you think 
about that?


Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning vs ON CONFLICT

On 2017/08/01 10:52, Robert Haas wrote:
> On Mon, Apr 3, 2017 at 6:28 AM, Amit Langote
>  wrote:
>> Since nowhere has the user asked to ensure unique(b) across partitions by
>> defining the same on parent, this seems just fine.  But one question to
>> ask may be whether that will *always* be the case?  That is, will we take
>> ON CONFLICT DO NOTHING without the conflict target specification to mean
>> checking for conflicts on the individual leaf partition level, even in the
>> future when we may have global constraints?
> 
> No.  We'll take it to mean that there is no conflict with any unique
> constraint we're able to declare.  Currently, that means a
> partition-local unique constraint because that's all there is.  It
> will include any new things added in the future.

So is the latest patch posted upthread to process ON CONFLICT DO NOTHING
using locally-defined unique indexes on leaf partitions something to consider?

Maybe, not until we have cascading index definition working [1]?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/c8fe4f6b-ff46-aae0-89e3-e936a35f0cfd%40postgrespro.ru



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

On Mon, Jul 31, 2017 at 09:40:34AM +0900, Masahiko Sawada wrote:
> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken  wrote:
> > Thank you Masahiko! I've tested and confirmed that this patch fixes the
> > problem.
> >
> 
> Thank you for the testing. This issue should be added to the open item
> since this cause of the server crash. I'll add it.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Subscription code improvements

On Fri, Jul 07, 2017 at 10:19:19PM +0200, Petr Jelinek wrote:
> I have done some review of subscription handling (well self-review) and
> here is the result of that (It's slightly improved version from another
> thread [1]).

> Only the 0002, 0004 and 0005 are actual bug fixes, but I'd still like to
> get it all into PG10 as especially the locking now behaves really
> differently than everything else and that does not seem like a good idea.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

2017-07-31 Thread Etsuro Fujita


On 2017/08/01 10:18, Amit Langote wrote:

Good points; fixed in the updated patch.


I should have mentioned this in an earlier mail, but one thing I noticed 
is this:


-the remote server.
+the remote server.  That becomes especially important if the table is
+being used in a partition hierarchy, where it is recommended to add
+a constraint matching the partition constraint expression on
+the remote table.

I think this would apply to CHECK constraints on foreign tables when 
implementing partitioning with inheritance.  Why do we only mention this 
for partition constraints?


Other than that, the patch looks good to me.

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

Thanks for taking a look at this.

On 2017/08/01 6:26, Robert Haas wrote:
> On Wed, Jul 26, 2017 at 9:50 PM, Amit Langote
>  wrote:
>> At least patch 0001 does address a bug.  Not sure if we can say that 0002
>> addresses a bug; it implements a feature that might be a
>> nice-to-have-in-PG-10.
> 
> I think 0001 is actually several fixes that should be separated:

Agreed.

> 
> - Cosmetic fixes, like renaming childrels to attachRel_children,
> adding a period after "Grab a work queue entry", and replacing the if
> (skip_validate) / if (!skip_validate) blocks with if (skip_validate) {
> ... } else { ... }.

OK, these cosmetic changes are now in attached patch 0001.

> 
> - Taking AccessExclusiveLock initially to avoid a lock upgrade hazard.

This in 0002.

> 
> - Rejiggering things so that we apply map_partition_varattnos() to the
> partConstraint in all relevant places.

And this in 0003.

> Regarding the XXX, we currently require AccessExclusiveLock for the
> addition of a CHECK constraint, so I think it's best to just do the
> same thing here.  We can optimize later, but it's probably not easy to
> come up with something that is safe and correct in all cases.
Agreed.  Dropped the XXX part in the comment.


0004 is what used to be 0002 before (checking partition constraints of
individual leaf partitions to skip their scans).  Attached here just in case.

Thanks,
Amit
From 614b0a51e4f820da81ec11b01ca79508c12415f7 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 1 Aug 2017 10:12:39 +0900
Subject: [PATCH 1/4] Cosmetic fixes for code in ATExecAttachPartition

---
 src/backend/commands/tablecmds.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index bb00858ad1..2f7ef53caf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13421,7 +13421,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
 {
RelationattachRel,
catalog;
-   List   *childrels;
+   List   *attachRel_children;
TupleConstr *attachRel_constr;
List   *partConstraint,
   *existConstraint;
@@ -13490,9 +13490,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
 * Prevent circularity by seeing if rel is a partition of attachRel. (In
 * particular, this disallows making a rel a partition of itself.)
 */
-   childrels = find_all_inheritors(RelationGetRelid(attachRel),
-   
AccessShareLock, NULL);
-   if (list_member_oid(childrels, RelationGetRelid(rel)))
+   attachRel_children = find_all_inheritors(RelationGetRelid(attachRel),
+   
 AccessShareLock, NULL);
+   if (list_member_oid(attachRel_children, RelationGetRelid(rel)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_TABLE),
 errmsg("circular inheritance not allowed"),
@@ -13686,17 +13686,15 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
 
/* It's safe to skip the validation scan after all */
if (skip_validate)
+   {
+   /* No need to scan the table after all. */
ereport(INFO,
(errmsg("partition constraint for table \"%s\" 
is implied by existing constraints",

RelationGetRelationName(attachRel;
-
-   /*
-* Set up to have the table be scanned to validate the partition
-* constraint (see partConstraint above).  If it's a partitioned table, 
we
-* instead schedule its leaf partitions to be scanned.
-*/
-   if (!skip_validate)
+   }
+   else
{
+   /* Constraints proved insufficient, so we need to scan the 
table. */
List   *all_parts;
ListCell   *lc;
 
@@ -13721,17 +13719,17 @@ ATExecAttachPartition(List **wqueue, Relation rel, 
PartitionCmd *cmd)
part_rel = attachRel;
 
/*
-* Skip if it's a partitioned table.  Only 
RELKIND_RELATION
-* relations (ie, leaf partitions) need to be scanned.
+* Skip if the partition is itself a partitioned table. 
 We can
+* only ever scan RELKIND_RELATION relations.
 */
-   if (part_rel != attachRel &&
-   part_rel->rd_rel->relkind == 
RELKIND_PARTITIONED_TABLE)
+   if (part_rel->rd_rel->relkind == 
RELKIND_PARTITIONED_TABLE)
{
-

Re: [HACKERS] Update description of \d[S+] in \?

2017-07-31 Thread David G. Johnston

On Mon, Jul 31, 2017 at 7:06 PM, Robert Haas  wrote:

> On Thu, Jul 13, 2017 at 8:40 PM, Amit Langote
>  wrote:
> > On 2017/07/13 19:57, Ashutosh Bapat wrote:
> >> On Thu, Jul 13, 2017 at 12:01 PM, Amit Langote
> >>  wrote:
> >>> The description of \d[S+] currently does not mention that it will list
> >>> materialized views and foreign tables.  Attached fixes that.
> >>>
> >>
> >> I guess the same change is applicable to the description of \d[S+] NAME
> as well.
> >
> > Thanks for the review.  Fixed in the attached.
>
> The problem with this, IMV, is that it makes those lines more than 80
> characters, whereas right now they are not.

84:   \\d[S+] list (foreign) tables, (materialized)
views, and sequences\n
76:   \\d[S+] list (foreign) tables, (mat.) views, and
sequences\n

  And that line seems
> doomed to get even longer in the future.
>

Cross that bridge when we come to it?

Lumping the tables and views into a single label (I'd go with "relations"
since these are all - albeit non-exclusively - things that can appear in a
FROM clause) would greatly aid things here.  Indexes and sequences would
retain their own identities.  But I seem to recall that elsewhere we call
indexes relations - and I'm not sure about sequences.

I'm partial to calling it "relations and sequences" and letting the reader
check the documentation for what "relations" means in this context.

David J.

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

Peter Eisentraut  writes:
> On 7/31/17 16:54, Tom Lane wrote:
>> Maybe "which" isn't the best tool for the job, not sure.

> Yeah, "which" is not portable.  This would need a bit more work and
> portability testing.

Fair enough.  This late in beta is probably not the time to be adding new
portability testing needs.  However, I noticed that some places --- not
consistently everywhere --- were solving this with the low-tech method of
just skipping AC_PATH_PROG if the variable is already set.  We could apply
that hack more consistently by inventing a PGAC_PATH_PROGS wrapper macro
as I previously sketched, but for now just defining it as

# Let the user override the search for $1
if test -z "$$1"; then
  AC_PATH_PROGS($@)
fi

(untested, but you get the idea).  In the long run I would like to improve
this to force the supplied value into absolute-path form, but I'd be
content to ship v10 like that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

On Mon, Jul 31, 2017 at 6:10 AM, Pavel Stehule  wrote:
> you can support XML, JSON output format like EXPLAIN does.
>
>  https://www.postgresql.org/docs/current/static/sql-explain.html

+1 for that approach.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Update description of \d[S+] in \?

On Thu, Jul 13, 2017 at 8:40 PM, Amit Langote
 wrote:
> On 2017/07/13 19:57, Ashutosh Bapat wrote:
>> On Thu, Jul 13, 2017 at 12:01 PM, Amit Langote
>>  wrote:
>>> The description of \d[S+] currently does not mention that it will list
>>> materialized views and foreign tables.  Attached fixes that.
>>>
>>
>> I guess the same change is applicable to the description of \d[S+] NAME as 
>> well.
>
> Thanks for the review.  Fixed in the attached.

The problem with this, IMV, is that it makes those lines more than 80
characters, whereas right now they are not.  And that line seems
doomed to get even longer in the future.

Of course, having it be inaccurate is not great either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Hash take II

On Mon, Jul 31, 2017 at 9:11 PM, Andres Freund  wrote:
> - Echoing concerns from other threads (Robert: ping): I'm doubtful that
>   it makes sense to size the number of parallel workers solely based on
>   the parallel scan node's size.  I don't think it's this patch's job to
>   change that, but to me it seriously amplifys that - I'd bet there's a
>   lot of cases with nontrivial joins where the benefit from parallelism
>   on the join level is bigger than on the scan level itself.  And the
>   number of rows in the upper nodes might also be bigger than on the
>   scan node level, making it more important to have higher number of
>   nodes.

Well, I feel like a broken record here but ... yeah, I agree we need
to improve that.  It's probably generally true that the more parallel
operators we add, the more potential benefit there is in doing
something about that problem.  But, like you say, not in this patch.

http://postgr.es/m/CA+TgmoYL-SQZ2gRL2DpenAzOBd5+SW30QB=a4csewtogejz...@mail.gmail.com

I think we could improve things significantly by generating multiple
partial paths with different number of parallel workers, instead of
just picking a number of workers based on the table size and going
with it.  For that to work, though, you'd need something built into
the costing to discourage picking paths with too many workers.  And
you'd need to be OK with planning taking a lot longer when parallelism
is involved, because you'd be carrying around more paths for longer.
There are other problems to solve, too.

I still think, though, that it's highly worthwhile to get at least a
few more parallel operators - and this one in particular - done before
we attack that problem in earnest.  Even with a dumb calculation of
the number of workers, this helps a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning vs ON CONFLICT

On Mon, Apr 3, 2017 at 6:28 AM, Amit Langote
 wrote:
> Since nowhere has the user asked to ensure unique(b) across partitions by
> defining the same on parent, this seems just fine.  But one question to
> ask may be whether that will *always* be the case?  That is, will we take
> ON CONFLICT DO NOTHING without the conflict target specification to mean
> checking for conflicts on the individual leaf partition level, even in the
> future when we may have global constraints?

No.  We'll take it to mean that there is no conflict with any unique
constraint we're able to declare.  Currently, that means a
partition-local unique constraint because that's all there is.  It
will include any new things added in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Constraint exclusion for partitioned tables

On Thu, Apr 6, 2017 at 6:47 AM, Ashutosh Bapat
 wrote:
> I am guessing that for normal inheritance, a constraint on parent
> doesn't necessarily imply the same constraint on the child (Amit
> Langote gives me an example of NOT NULL constraint).

CHECK constraints that apply to the parent would apply to all
children, unless they are NO INHERIT, so even for regular inheritance,
it might still be possible to prove something by ignoring things that
won't necessarily cascade.

For partitioning, it may be that we've got enough restrictions in
place on what can happen that we can assume everything can cascade.
Actually, I hope that's true, since the partitioned table has no
storage of its own.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

On 2017/08/01 10:10, David G. Johnston wrote:
> On Mon, Jul 31, 2017 at 5:42 PM, Amit Langote > wrote:
> 
>>
>> On a second thought though, I think we should list the foreign table
>> partitions' limitations in only one place, that is, the CREATE FOREIGN
>> TABLE reference page.  Listing them under 5.10.2.3. seems a bit off to me,
>> because other limitations listed there are those of the new partitioned
>> table objects, such as lack of global index constraints, etc.  Lack of
>> tuple-routing to foreign partitions does not seem to me of the similar
>> nature.  Also, the same text is no longer repeated in 3 different places.
>>
>> Thoughts on the updated patch?
>>
> 
> Overall, works for me.
> 
> grammar (add a couple of commas for flow) and style (dropping the first
> "the")
> 
> current: "(both the user-defined constraints such as CHECK or
> NOT NULL clauses and the partition constraint)"
> proposed: "(both user-defined constraints, such as CHECK or
> NOT NULL clauses, and the partition constraint)"

Good points; fixed in the updated patch.

Thanks,
Amit
From a5b75278a6dab39c5cd7a95a746c28ed8d5f1bbc Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 3 Apr 2017 16:45:15 +0900
Subject: [PATCH] Clarify that partition constraint is not enforced on foreign
 tables

---
 doc/src/sgml/ddl.sgml  |  8 +++-
 doc/src/sgml/ref/create_foreign_table.sgml | 17 +++--
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..a707c3e22a 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2986,11 +2986,9 @@ VALUES ('Albany', NULL, NULL, 'NY');

 

-Partitions can also be foreign tables
-(see ),
-although these have some limitations that normal tables do not.  For
-example, data inserted into the partitioned table is not routed to
-foreign table partitions.
+Partitions can also be foreign tables, although they have some limitations
+that normal tables do not; see  for
+more information.

 

diff --git a/doc/src/sgml/ref/create_foreign_table.sgml 
b/doc/src/sgml/ref/create_foreign_table.sgml
index 065c982082..594f75e112 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -79,7 +79,9 @@ CHECK ( expression ) [ NO INHERIT ]
   
If PARTITION OF clause is specified then the table is
created as a partition of parent_table with specified
-   bounds.
+   bounds.  Note that routing tuples to partitions that are foreign tables
+   is not supported.  So, if a tuple inserted (or copied) into the table
+   routes to one of the foreign partitions, an error will occur.
   
 
   
@@ -279,16 +281,19 @@ CHECK ( expression ) [ NO INHERIT ]
   Notes
 

-Constraints on foreign tables (such as CHECK
-or NOT NULL clauses) are not enforced by the
-core PostgreSQL system, and most foreign data wrappers
-do not attempt to enforce them either; that is, the constraint is
+Constraints (both user-defined constraints, such as CHECK
+or NOT NULL clauses, and the partition constraint) are not
+enforced by the core PostgreSQL system, and most foreign
+data wrappers do not attempt to enforce them either; that is, they are
 simply assumed to hold true.  There would be little point in such
 enforcement since it would only apply to rows inserted or updated via
 the foreign table, and not to rows modified by other means, such as
 directly on the remote server.  Instead, a constraint attached to a
 foreign table should represent a constraint that is being enforced by
-the remote server.
+the remote server.  That becomes especially important if the table is
+being used in a partition hierarchy, where it is recommended to add
+a constraint matching the partition constraint expression on
+the remote table.

 

-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [BUGS] BUG #14758: Segfault with logical replication on a function index

On 2017-07-31 09:40:34 +0900, Masahiko Sawada wrote:
> Moved to -hackers.
> 
> On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken  wrote:
> > Thank you Masahiko! I've tested and confirmed that this patch fixes the
> > problem.
> >
> 
> Thank you for the testing. This issue should be added to the open item
> since this cause of the server crash. I'll add it.

Adding Petr to CC list.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Hash take II

From: Thomas Munro 
Date: Wed 26 Jul 2017 19:58:20 NZST
Subject: [PATCH] Add support for parallel-aware hash joins.

Hi,

WRT the main patch:

- Echoing concerns from other threads (Robert: ping): I'm doubtful that
  it makes sense to size the number of parallel workers solely based on
  the parallel scan node's size.  I don't think it's this patch's job to
  change that, but to me it seriously amplifys that - I'd bet there's a
  lot of cases with nontrivial joins where the benefit from parallelism
  on the join level is bigger than on the scan level itself.  And the
  number of rows in the upper nodes might also be bigger than on the
  scan node level, making it more important to have higher number of
  nodes.

- If I understand the code in initial_cost_hashjoin() correctly, we
  count the synchronization overhead once, independent of the number of
  workers.  But on the other hand we calculate the throughput by
  dividing by the number of workers.  Do you think that's right?

- I haven't really grokked the deadlock issue you address. Could you
  expand the comments on that? Possibly somewhere central referenced by
  the various parts.

- maybe I'm overly paranoid, but it might not be bad to add some extra
  checks for ExecReScanHashJoin ensuring that it doesn't get called when
  workers are still doing something.

- seems like you're dereffing tuple unnecessarily here:

+   /*
+* If we detached a chain of tuples, transfer them to the main hash 
table
+* or batch storage.
+*/
+   if (regainable_space > 0)
+   {
+   HashJoinTuple tuple;
+
+   tuple = (HashJoinTuple)
+   dsa_get_address(hashtable->area, detached_chain_shared);
+   ExecHashTransferSkewTuples(hashtable, detached_chain,
+  
detached_chain_shared);
+
+   /* Remove from the total space used. */
+   LWLockAcquire(>shared->chunk_lock, LW_EXCLUSIVE);
+   Assert(hashtable->shared->size >= regainable_space);
+   hashtable->shared->size -= regainable_space;
+   LWLockRelease(>shared->chunk_lock);
+
+   /*
+* If the bucket we removed is the same as the bucket the 
caller just
+* overflowed, then we can forget about the overflowing part of 
the
+* tuple.  It's been moved out of the skew hash table.  
Otherwise, the
+* caller will call again; eventually we'll either succeed in
+* allocating space for the overflow or reach this case.
+*/
+   if (bucket_to_remove == bucketno)
+   {
+   hashtable->spaceUsedSkew = 0;
+   hashtable->spaceAllowedSkew = 0;
+   }
+   }


- The names here could probably improved some:
+   case WAIT_EVENT_HASH_SHRINKING1:
+   event_name = "Hash/Shrinking1";
+   break;
+   case WAIT_EVENT_HASH_SHRINKING2:
+   event_name = "Hash/Shrinking2";
+   break;
+   case WAIT_EVENT_HASH_SHRINKING3:
+   event_name = "Hash/Shrinking3";
+   break;
+   case WAIT_EVENT_HASH_SHRINKING4:
+   event_name = "Hash/Shrinking4";

- why are we restricting rows_total bit to parallel aware?

+   /*
+* If parallel-aware, the executor will also need an estimate of the 
total
+* number of rows expected from all participants so that it can size the
+* shared hash table.
+*/
+   if (best_path->jpath.path.parallel_aware)
+   {
+   hash_plan->plan.parallel_aware = true;
+   hash_plan->rows_total = best_path->inner_rows_total;
+   }
+

- seems we need a few more test - I don't think the existing tests are
  properly going to exercise the skew stuff, multiple batches, etc?
  This is nontrivial code, I'd really like to see a high test coverage
  of the new code.

- might not hurt to reindent before the final submission

- Unsurprisingly, please implement the FIXME ;)


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

2017-07-31 Thread David G. Johnston

On Mon, Jul 31, 2017 at 5:42 PM, Amit Langote  wrote:

>
> On a second thought though, I think we should list the foreign table
> partitions' limitations in only one place, that is, the CREATE FOREIGN
> TABLE reference page.  Listing them under 5.10.2.3. seems a bit off to me,
> because other limitations listed there are those of the new partitioned
> table objects, such as lack of global index constraints, etc.  Lack of
> tuple-routing to foreign partitions does not seem to me of the similar
> nature.  Also, the same text is no longer repeated in 3 different places.
>
> Thoughts on the updated patch?
>

Overall, works for me.

grammar (add a couple of commas for flow) and style (dropping the first
"the")

current: "(both the user-defined constraints such as CHECK or
NOT NULL clauses and the partition constraint)"
proposed: "(both user-defined constraints, such as CHECK or
NOT NULL clauses, and the partition constraint)"

Thanks!

David J.

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

On 7/31/17 16:54, Tom Lane wrote:
> Maybe "which" isn't the best tool for the job, not sure.

Yeah, "which" is not portable.  This would need a bit more work and
portability testing.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] AlterUserStmt anmd RoleSpec rules in grammar.y

On 7/26/17 11:29, Tom Lane wrote:
> You'll notice that that statement fails in the regression tests:
> 
> ALTER USER ALL SET application_name to 'SLAP';
> ERROR:  syntax error at or near "ALL"
> 
> The one that works is
> 
> ALTER ROLE ALL SET application_name to 'SLAP';
> 
> and the reason is that AlterRoleSetStmt has a separate production
> for ALL, but AlterUserSetStmt doesn't.  This seems a tad bizarre
> though.  Peter, you added that production (in commit 9475db3a4);
> is this difference intentional or just an oversight?  If it's
> intentional, what's the reasoning?

That looks like a bug to me.  ALTER USER also does not support the IN
DATABASE clause, so the code deviation might have started there already.

I propose the attached patch to clean this up.

For backpatching, I could develop some less invasive versions.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 48a7c95b7c8243626362c7845a45cdb036028f7e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 31 Jul 2017 20:36:32 -0400
Subject: [PATCH] Further unify ROLE and USER command grammar rules

ALTER USER ... SET did not support all the syntax variants of ALTER ROLE
...  SET.  Fix that, and to avoid further deviations of this kind, unify
many the grammar rules for ROLE/USER/GROUP commands.

Reported-by: Pavel Golub 
---
 doc/src/sgml/ref/alter_user.sgml|   8 +--
 src/backend/parser/gram.y   | 105 +++-
 src/test/regress/expected/rolenames.out |  10 +--
 3 files changed, 43 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml
index 9b8a39b376..411a6dcc38 100644
--- a/doc/src/sgml/ref/alter_user.sgml
+++ b/doc/src/sgml/ref/alter_user.sgml
@@ -38,10 +38,10 @@
 
 ALTER USER name RENAME TO 
new_name
 
-ALTER USER role_specification SET 
configuration_parameter { TO | = } { 
value | DEFAULT }
-ALTER USER role_specification SET 
configuration_parameter FROM CURRENT
-ALTER USER role_specification 
RESET configuration_parameter
-ALTER USER role_specification 
RESET ALL
+ALTER USER { role_specification | 
ALL } [ IN DATABASE database_name 
] SET configuration_parameter { TO | = } { 
value | DEFAULT }
+ALTER USER { role_specification | 
ALL } [ IN DATABASE database_name 
] SET configuration_parameter FROM CURRENT
+ALTER USER { role_specification | 
ALL } [ IN DATABASE database_name 
] RESET configuration_parameter
+ALTER USER { role_specification | 
ALL } [ IN DATABASE database_name 
] RESET ALL
 
 where role_specification 
can be:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4b1ce09c44..e20bf5ec55 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -250,7 +250,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
AlterObjectDependsStmt AlterObjectSchemaStmt AlterOwnerStmt
AlterOperatorStmt AlterSeqStmt AlterSystemStmt AlterTableStmt
AlterTblSpcStmt AlterExtensionStmt AlterExtensionContentsStmt 
AlterForeignTableStmt
-   AlterCompositeTypeStmt AlterUserStmt AlterUserMappingStmt 
AlterUserSetStmt
+   AlterCompositeTypeStmt AlterUserMappingStmt
AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt
AlterDefaultPrivilegesStmt DefACLAction
AnalyzeStmt ClosePortalStmt ClusterStmt CommentStmt
@@ -262,9 +262,9 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
CreateAssertStmt CreateTransformStmt CreateTrigStmt 
CreateEventTrigStmt
CreateUserStmt CreateUserMappingStmt CreateRoleStmt 
CreatePolicyStmt
CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt 
DiscardStmt DoStmt
-   DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt 
DropStmt
+   DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
DropAssertStmt DropCastStmt DropRoleStmt
-   DropUserStmt DropdbStmt DropTableSpaceStmt
+   DropdbStmt DropTableSpaceStmt
DropTransformStmt
DropUserMappingStmt ExplainStmt FetchStmt
GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt 
InsertStmt
@@ -841,8 +841,6 @@ stmt :
| AlterTSConfigurationStmt
| AlterTSDictionaryStmt
| AlterUserMappingStmt
-   | AlterUserSetStmt
-   | AlterUserStmt
| AnalyzeStmt
| CheckPointStmt
| ClosePortalStmt
@@ -890,7 +888,6 @@ stmt :
| DoStmt
| DropAssertStmt
| DropCastStmt
-   | DropGroupStmt
| DropOpClassStmt

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

On 2017/08/01 6:41, David G. Johnston wrote:
> On Tue, Jul 25, 2017 at 11:29 PM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
> 
>>> I'm curious what the other limitations are...
>>
>> When I first wrote that documentation line (I am assuming you're asking
>> about "although these have some limitations that normal tables do not"), I
>> was thinking about the fact that the core system does not enforce
>> (locally) any constraints defined on foreign tables.  Since we allow
>> inserting data into partitions directly, it is imperative that we enforce
>> the "partition constraint" along with the traditional constraints such as
>> NOT NULL and CHECK constraints, which we can do for local table partitions
>> but not for foreign table ones.
>>
>> Anyway, attached patch documents all these limitations about foreign table
>> partitions more prominently.
>>
> 
> The revised patch down-thread looks good.  Thanks.
> 
> I indeed was referring to the paragraph you quoted.
> 
> I would probably just   s/For example/In particular/   and call it good -
> or maybe also tell the user that all the limitations are listed in the
> notes section for create foreign table (though my first thoughts are all
> quite wordy).

Thanks for the review.

On a second thought though, I think we should list the foreign table
partitions' limitations in only one place, that is, the CREATE FOREIGN
TABLE reference page.  Listing them under 5.10.2.3. seems a bit off to me,
because other limitations listed there are those of the new partitioned
table objects, such as lack of global index constraints, etc.  Lack of
tuple-routing to foreign partitions does not seem to me of the similar
nature.  Also, the same text is no longer repeated in 3 different places.

Thoughts on the updated patch?

Thanks,
Amit
From f79f98710a5bf6bd1b0a921ed2e57fa510c6ac60 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 3 Apr 2017 16:45:15 +0900
Subject: [PATCH] Clarify that partition constraint is not enforced on foreign
 tables

---
 doc/src/sgml/ddl.sgml  |  8 +++-
 doc/src/sgml/ref/create_foreign_table.sgml | 17 +++--
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b05a9c2150..a707c3e22a 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2986,11 +2986,9 @@ VALUES ('Albany', NULL, NULL, 'NY');

 

-Partitions can also be foreign tables
-(see ),
-although these have some limitations that normal tables do not.  For
-example, data inserted into the partitioned table is not routed to
-foreign table partitions.
+Partitions can also be foreign tables, although they have some limitations
+that normal tables do not; see  for
+more information.

 

diff --git a/doc/src/sgml/ref/create_foreign_table.sgml 
b/doc/src/sgml/ref/create_foreign_table.sgml
index 065c982082..43a6cbcfab 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -79,7 +79,9 @@ CHECK ( expression ) [ NO INHERIT ]
   
If PARTITION OF clause is specified then the table is
created as a partition of parent_table with specified
-   bounds.
+   bounds.  Note that routing tuples to partitions that are foreign tables
+   is not supported.  So, if a tuple inserted (or copied) into the table
+   routes to one of the foreign partitions, an error will occur.
   
 
   
@@ -279,16 +281,19 @@ CHECK ( expression ) [ NO INHERIT ]
   Notes
 

-Constraints on foreign tables (such as CHECK
-or NOT NULL clauses) are not enforced by the
-core PostgreSQL system, and most foreign data wrappers
-do not attempt to enforce them either; that is, the constraint is
+Constraints (both the user-defined constraints such as CHECK
+or NOT NULL clauses and the partition constraint) are not
+enforced by the core PostgreSQL system, and most foreign
+data wrappers do not attempt to enforce them either; that is, they are
 simply assumed to hold true.  There would be little point in such
 enforcement since it would only apply to rows inserted or updated via
 the foreign table, and not to rows modified by other means, such as
 directly on the remote server.  Instead, a constraint attached to a
 foreign table should represent a constraint that is being enforced by
-the remote server.
+the remote server.  That becomes especially important if the table is
+being used in a partition hierarchy, where it is recommended to add
+a constraint matching the partition constraint expression on
+the remote table.

 

-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix a typo in pg_upgrade/info.c

2017-07-31 Thread Masahiko Sawada

On Tue, Aug 1, 2017 at 6:23 AM, Peter Eisentraut
 wrote:
> On 7/13/17 03:22, Masahiko Sawada wrote:
>> Hi,
>>
>> Attached patch for $subject.
>>
>> s/reporing/reporting/g
>
> fixed

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

I wrote:
> If we need to fix things so that AC_PATH_PROG will honor a non-path
> input value, then let's do that.  But let's not make the build system
> shakier/less reproducible than it is already.

> I suggest that we could inject logic like this:

>   if VARIABLE-is-set-and-value-isn't-already-absolute; then
> VARIABLE=`which $VARIABLE 2>/dev/null`
>   fi

> in front of the existing logic for AC_PATH_PROG(VARIABLE,...).
> Maybe "which" isn't the best tool for the job, not sure.

Concretely, how about something like the attached?

BTW, I haven't done it here, but I wonder whether we should not make
PGAC_PATH_PROGS invoke AC_ARG_VAR on the target variable, so that
configure knows that it should be treated as affecting results caching.

regards, tom lane

diff --git a/config/docbook.m4 b/config/docbook.m4
index c485eaf..f9307f3 100644
*** a/config/docbook.m4
--- b/config/docbook.m4
***
*** 3,9 
  # PGAC_PROG_NSGMLS
  # 
  AC_DEFUN([PGAC_PROG_NSGMLS],
! [AC_PATH_PROGS([NSGMLS], [onsgmls nsgmls])])
  
  
  # PGAC_CHECK_DOCBOOK(VERSION)
--- 3,9 
  # PGAC_PROG_NSGMLS
  # 
  AC_DEFUN([PGAC_PROG_NSGMLS],
! [PGAC_PATH_PROGS(NSGMLS, [onsgmls nsgmls])])
  
  
  # PGAC_CHECK_DOCBOOK(VERSION)
diff --git a/config/perl.m4 b/config/perl.m4
index 9706c4d..e44ca94 100644
*** a/config/perl.m4
--- b/config/perl.m4
***
*** 4,13 
  # PGAC_PATH_PERL
  # --
  AC_DEFUN([PGAC_PATH_PERL],
! [# Let the user override the search
! if test -z "$PERL"; then
!   AC_PATH_PROG(PERL, perl)
! fi
  
  if test "$PERL"; then
pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
--- 4,10 
  # PGAC_PATH_PERL
  # --
  AC_DEFUN([PGAC_PATH_PERL],
! [PGAC_PATH_PROGS(PERL, perl)
  
  if test "$PERL"; then
pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
diff --git a/config/programs.m4 b/config/programs.m4
index b7deb86..0b0098e 100644
*** a/config/programs.m4
--- b/config/programs.m4
***
*** 1,6 
--- 1,23 
  # config/programs.m4
  
  
+ # PGAC_PATH_PROGS
+ # ---
+ # This wrapper for AC_PATH_PROGS behaves like that macro except in the case
+ # where VARIABLE is already set but is not set to an absolute path.  We will
+ # attempt to make it into an absolute path using "which".  If that fails,
+ # we ignore the existing value, which is the behavior of AC_PATH_PROGS.
+ 
+ AC_DEFUN([PGAC_PATH_PROGS],
+ [# If user is trying to override the search for $1, make sure that
+ # the variable's value is an absolute path.
+ if test -n "$$1"; then
+   $1=`which "$$1" 2>/dev/null`
+ fi
+ AC_PATH_PROGS($@)dnl
+ ])
+ 
+ 
  # PGAC_PATH_BISON
  # ---
  # Look for Bison, set the output variable BISON to its path if found.
***
*** 8,17 
  # Note we do not accept other implementations of yacc.
  
  AC_DEFUN([PGAC_PATH_BISON],
! [# Let the user override the search
! if test -z "$BISON"; then
!   AC_PATH_PROGS(BISON, bison)
! fi
  
  if test "$BISON"; then
pgac_bison_version=`$BISON --version 2>/dev/null | sed q`
--- 25,31 
  # Note we do not accept other implementations of yacc.
  
  AC_DEFUN([PGAC_PATH_BISON],
! [PGAC_PATH_PROGS(BISON, bison)
  
  if test "$BISON"; then
pgac_bison_version=`$BISON --version 2>/dev/null | sed q`
*** if test -z "$BISON"; then
*** 41,47 
  *** PostgreSQL then you do not need to worry about this, because the Bison
  *** output is pre-generated.)])
  fi
! # We don't need AC_SUBST(BISON) because AC_PATH_PROG did it
  AC_SUBST(BISONFLAGS)
  ])# PGAC_PATH_BISON
  
--- 55,61 
  *** PostgreSQL then you do not need to worry about this, because the Bison
  *** output is pre-generated.)])
  fi
! # We don't need AC_SUBST(BISON) because PGAC_PATH_PROGS did it
  AC_SUBST(BISONFLAGS)
  ])# PGAC_PATH_BISON
  
*** AC_DEFUN([PGAC_CHECK_GETTEXT],
*** 229,235 
   [AC_MSG_ERROR([a gettext implementation is required for NLS])])
AC_CHECK_HEADER([libintl.h], [],
[AC_MSG_ERROR([header file  is required for NLS])])
!   AC_PATH_PROGS(MSGFMT, msgfmt)
if test -z "$MSGFMT"; then
  AC_MSG_ERROR([msgfmt is required for NLS])
fi
--- 243,249 
   [AC_MSG_ERROR([a gettext implementation is required for NLS])])
AC_CHECK_HEADER([libintl.h], [],
[AC_MSG_ERROR([header file  is required for NLS])])
!   PGAC_PATH_PROGS(MSGFMT, msgfmt)
if test -z "$MSGFMT"; then
  AC_MSG_ERROR([msgfmt is required for NLS])
fi
*** AC_DEFUN([PGAC_CHECK_GETTEXT],
*** 238,245 
  pgac_cv_msgfmt_flags=-c
  fi])
AC_SUBST(MSGFMT_FLAGS, $pgac_cv_msgfmt_flags)
!   AC_PATH_PROGS(MSGMERGE, msgmerge)
!   AC_PATH_PROGS(XGETTEXT, xgettext)
  ])# PGAC_CHECK_GETTEXT
  
  
--- 252,259 
  pgac_cv_msgfmt_flags=-c
  fi])

Re: [HACKERS] 10 beta docs: different replication solutions

2017-07-31 Thread Steve Singer


On Mon, 31 Jul 2017, Merlin Moncure wrote:


On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer  wrote:


We don't seem to describe logical replication on

https://www.postgresql.org/docs/10/static/different-replication-solutions.html

The attached patch adds a section.


This is a good catch.  Two quick observations:

1) Super pedantic point. I don't like the 'repl.' abbreviation in the
'most common implementation' both for the existing hs/sr and for the
newly added logical.


Updated



2) This lingo:
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow
in multiple
+ directions. For more information on logical replication, see
.

Is good, but I would revise it just a bit to emphasize the
subscription nature of logical replication to link the concepts
expressed strongly in the main section.  For example:

Logical replication allows the data changes [remove: "from individual
tables to be replicated"] to be published to subscriber nodes.  Data
can flow in any direction between nodes on a per-table basis; there is
no concept of a master server.  Conflict resolution must be handled
completely by the application.  For more information on...

what do you think?


Sounds good.

I've incorporated these changes into an updated patch but I changed the 
language around conflict resolution.  Conflict resolution could be handled 
by triggers or adding extra columns to the primary key, that wouldn't be 
'completely by the application'





merlin

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..19b26f8 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,27 @@ protocol to make nodes agree on a serializable transactional order.

   
 
+
+  
+   Logical Replication
+   
+
+
+  Logical replication allows a database server to send a stream of
+  data modifications to another server.
+  PostgreSQL logical replication constructs
+  a stream of logical data modifications from the WAL.
+
+
+  Logical replication allows the data changes to be published to subscriber
+  nodes. Data can flow in any direction between nodes on a per-table basis;
+  there is no concept of a master server. PostgreSQL
+  does not include support for conflict resolution.
+  For more information on logical replication, see .
+   
+   
+  
+ 
   
Trigger-Based Master-Standby Replication

@@ -290,6 +311,7 @@ protocol to make nodes agree on a serializable transactional order.
  Shared Disk Failover
  File System Replication
  Write-Ahead Log Shipping
+ Logical Replication
  Trigger-Based Master-Standby Replication
  Statement-Based Replication Middleware
  Asynchronous Multimaster Replication
@@ -303,7 +325,8 @@ protocol to make nodes agree on a serializable transactional order.
  Most Common Implementation
  NAS
  DRBD
- Streaming Repl.
+ Streaming Replication.
+ Logical Replication.
  Slony
  pgpool-II
  Bucardo
@@ -315,6 +338,7 @@ protocol to make nodes agree on a serializable transactional order.
  shared disk
  disk blocks
  WAL
+ Logical decoding
  table rows
  SQL
  table rows
@@ -330,6 +354,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
 
 
 
@@ -337,6 +362,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -349,6 +375,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -360,6 +387,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync off
  
+ 
  
  
  
@@ -371,6 +399,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync on
  
+ 
  
  
  
@@ -385,6 +414,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
 
 
 
@@ -393,6 +423,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  
@@ -403,6 +434,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ 
  
  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Hash take II

2017-07-31 Thread Thomas Munro

On Tue, Aug 1, 2017 at 9:28 AM, Andres Freund  wrote:
> On 2017-07-26 20:12:56 +1200, Thomas Munro wrote:
>> 2.  Simplified costing.  There is now just one control knob
>> "parallel_synchronization_cost", which I charge for each time the
>> participants will wait for each other at a barrier, to be set high
>> enough to dissuade the planner from using Parallel Hash for tiny hash
>> tables that would be faster in a parallel-oblivious hash join.
>> Earlier ideas about modelling the cost of shared memory access didn't
>> work out.
>
> Hm. You say, "didn't work out" - could you expand a bit on that? I'm
> quite doubtful that justaccounting for barriers will be good enough.

The earlier approach and some variants I played with were based on the
idea that we should try to estimate the cost of using shared memory.
But there's no precedent for costing the cache hierarchy beyond disk
vs memory, and it depends so much on your hardware (NUMA vs UMA) and
the data distribution.  I have no doubt that variations in memory
access costs are important (for example, it was data distribution that
determined whether big-cache-oblivious-shared-hash-table or
MonetDB-style cache-aware approach won in that paper I've mentioned
here before[1]), but it seems like a hard problem and I didn't feel
like it was necessary.  Or do you have a different idea here?

Another point is that in the earlier versions I was trying to teach
the planner how to choose among Hash, Shared Hash and Parallel Shared
Hash.  The difference in costing between Hash and Shared Hash (one
worker builds, all workers probe) was important and sensitive, because
the only difference between them would be the cost of memory sharing.
When I dropped Shared Hash from the patch set, it no longer seemed
necessary to try to deal with such subtle costing, because Hash and
Parallel Hash (now without the word 'Shared') already have wildly
different costs: the latter is divided over N CPUs.  So I felt I could
get away with a much blunter instrument: just something to avoid
parallel build overheads for tiny tables like TPC-H "nation".

I still wanted something that makes intuitive sense and that could be
set using experimental evidence though.  Parallel_synchronization_cost
is an estimate of how long the average backend will have to wait for
the last backend to complete the phase and arrive at each barrier.
The most interesting case is the build phase: how long will the the
last backend make us wait before probing can begin?  Well, that
depends on the parallel grain.  Currently, the ultimate source of all
parallelism in our executor is Parallel Seq Scan and Parallel Index
Scan, and they hand out a page at a time.  Of course, any number of
nodes may sit between the hash join and the scan, and one of them
might include a function that sleeps for 100 years in one backend or
performs a join that generates wildly different numbers of tuples in
each backend.  I don't know what to do about that, other than to
assume we have perfectly spherical cows and reason on the basis of an
expected parallel grain reaching us from the scans.

One thing to note about parallel_synchronization_cost is that the cost
units, where 1 is traditionally the cost of scanning a page, actually
make *some* kind of sense here, though it's a bit tenuous: the last
worker to complete is the one that scans the final pages, while the
others see the scan finished.  What's really wanted here is not simply
page scanning cost but rather a percentage of the total cost that
represents how much extra work the lanterne rouge of backends has to
do.

Two relevant projects here are:

1.  David Rowley proposes changing the seq scan grain[2], perhaps
adaptively.  I suppose as this number increases the time at which two
workers finish can vary more greatly.
2.  The parallel-append project introduces a completely different type
of granularity based on unrelated and separately costed subplans
rather than pages.  Perhaps there are things that could be done here
to model the fact that some workers might finish a long time before
others, but I don't know.

Perhaps what parallel hash really needs is not a user-controlled
parallel_synchronization_cost, but some number produced by the planner
to describe the expected distribution of tuple counts over workers.
Armed with something like that and the cost per tuple you might be
able to estimate how long we expect hash join barriers to make you
wait without introducing any new GUCs at all.  I thought about some of
these things a bit but it seemed like a big research project of its
own and I was persuaded in an off-list discussion by Robert to try to
find the simplest thing that would avoid parallel-aware hash for
little tables that are already built very cheaply.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.3495
[2] 
https://www.postgresql.org/message-id/CAKJS1f-XhfQ2-%3D85wgYo5b3WtEs%3Dys%3D2Rsq%3DNuvnmaV4ZsM1XQ%40mail.gmail.com

-- 
Thomas Munro

Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

On 7/25/17 15:20, Victor Wagner wrote:
> It turns out, that PostgreSQL enumerates collations for all ICU locales
> and passes it into uloc_toLanguageTag function with strict argument of
> this function set to TRUE. But for some locales
> (es*@collation=tradtiional, si*@collation=dictionary) only call with
> strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
> can be resolved with strict=TRUE.

We are only calling uloc_toLanguageTag() with keyword/value combinations
that ICU itself previously told us were supported.  So just ignoring
errors doesn't seem proper in this case.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [PATCH] Pattern based listeners for asynchronous messaging (LISTEN/NOTIFY)

2017-07-31 Thread Chapman Flack

On 07/31/17 16:30, Peter Eisentraut wrote:

> I would think about specifying an operator somewhere in the syntax, like
> you have with LISTEN SIMILAR TO.  It would even be nice if a
> non-built-in operator could be used for matching names.

Hmm ... as I was reading through the email thread, I saw a suggestion
was once made to look at ltree, and then I noticed the patch as presented
had simply gone with regular expressions instead. I wonder if there'd be
a way to work it so an operator can be specified, and allow you to treat
the names as (say) ltree labels if that suited your application.

-Chap

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Incorrect comment of XLByteToSeg() and XLByteToPrevSeg()

2017-07-31 Thread Tatsuo Ishii

> Thanks for the patch. Looks good to me. I will commit/push into all
> supported branches if there's no objection.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

I wrote:
> Done.  I have also reconfigured buildfarm member prairiedog to use
> a non-MULTIPLICITY build of Perl 5.8.3, with the oldest Test::More
> and IPC::Run versions I could lay my hands on.  Although I'd gotten
> through a manual "make check-world" with this configuration in HEAD
> before touching the buildfarm configuration, I see that it just fell
> over in the back branches.  So there's still some more fixing to be
> done, or else we'll need to change that claim again.  Will investigate
> once the buildfarm run finishes.

The reason it works manually and not in the buildfarm is that the
buildfarm injects

my $pflags = "PROVE_FLAGS=--timer";

(run_build.pl:1609) and it turns out that 5.8.3's version of prove
does not have the --timer switch.  I see that --timer is there in
the next oldest version I have at hand, 5.8.6.  I doubt it is worth
teaching the buildfarm scripts to autoconfigure this, but could we
do something like

my $pflags = "PROVE_FLAGS='" . ($ENV{PROVE_FLAGS} || "--timer") . "'";

to allow overriding this choice from the buildfarm config?

FYI, I plan to keep the TAP tests enabled on prairiedog for HEAD,
but probably not for the back branches after this run cycle
finishes, because it's just too-darn-slow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-31 Thread Stephen Frost

Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> AFAICT, pg_dump has no notion that it needs to be careful about the order
> >> in which permissions are granted.  I did
> 
> > I'm afraid that's correct, though I believe that's always been the case.
> > I spent some time looking into this today and from what I've gathered so
> > far, there's essentially an implicit (or at least, I couldn't find any
> > explicit reference to it) ordering in ACLs whereby a role which was
> > given a GRANT OPTION always appears in the ACL list before an ACL entry
> > where that role is GRANT'ing that option to another role, and this is
> > what pg_dump was (again, implicitly, it seems) depending on to get this
> > correct in prior versions.
> 
> Yeah, I suspected that was what made it work before.  I think the ordering
> just comes from the fact that new ACLs are added to the end, and you can't
> add an entry as a non-owner unless there's a grant WGO there already.

Right.

> I suspect it would be easier to modify the backend code that munges ACL
> lists so that it takes care to preserve that property, if we ever find
> there are cases where it does not.  It seems plausible to me that
> pg_dump is not the only code that depends on that ordering.

Hm, well, if we're alright with that then I think the attached would
address the pg_dump issue, and I believe this would work as this code is
only run for 9.6+ servers anyway.

This needs more cleanup, testing, and comments explaining why we're
doing this (and then perhaps comments, somewhere.. in the backend ACL
code that explains that the ordering needs to be preserved), but the
basic idea seems sound to me and the case you presented does work with
this patch (for me, at least) whereas what's in master didn't.

Thoughts?

Thanks!

Stephen
diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
new file mode 100644
index dfc6118..8686ed0
*** a/src/bin/pg_dump/dumputils.c
--- b/src/bin/pg_dump/dumputils.c
*** buildACLQueries(PQExpBuffer acl_subquery
*** 723,742 
  	 * these are run the initial privileges will be in place, even in a binary
  	 * upgrade situation (see below).
  	 */
! 	printfPQExpBuffer(acl_subquery, "(SELECT pg_catalog.array_agg(acl) FROM "
! 	  "(SELECT pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) AS acl "
! 	  "EXCEPT "
! 	  "SELECT pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s as foo)",
  	  acl_column,
  	  obj_kind,
  	  acl_owner,
  	  obj_kind,
  	  acl_owner);
  
! 	printfPQExpBuffer(racl_subquery, "(SELECT pg_catalog.array_agg(acl) FROM "
! 	  "(SELECT pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) AS acl "
! 	  "EXCEPT "
! 	  "SELECT pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s as foo)",
  	  obj_kind,
  	  acl_owner,
  	  acl_column,
--- 723,742 
  	 * these are run the initial privileges will be in place, even in a binary
  	 * upgrade situation (see below).
  	 */
! 	printfPQExpBuffer(acl_subquery, "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
! 	  "(SELECT acl, row_n FROM pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) WITH ORDINALITY AS perm(acl,row_n) "
! 	  "WHERE NOT EXISTS ( "
! 	  "SELECT 1 FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) AS init(init_acl) WHERE acl = init_acl)) as foo)",
  	  acl_column,
  	  obj_kind,
  	  acl_owner,
  	  obj_kind,
  	  acl_owner);
  
! 	printfPQExpBuffer(racl_subquery, "(SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM "
! 	  "(SELECT acl, row_n FROM pg_catalog.unnest(coalesce(pip.initprivs,pg_catalog.acldefault(%s,%s))) WITH ORDINALITY AS initp(acl,row_n) "
! 	  "WHERE NOT EXISTS ( "
! 	  "SELECT 1 FROM pg_catalog.unnest(coalesce(%s,pg_catalog.acldefault(%s,%s))) AS permp(orig_acl) WHERE acl = orig_acl)) as foo)",
  	  obj_kind,
  	  acl_owner,
  	  acl_column,
*** buildACLQueries(PQExpBuffer acl_subquery
*** 761,779 
  	{
  		printfPQExpBuffer(init_acl_subquery,
  		  "CASE WHEN privtype = 'e' THEN "
! 		  "(SELECT pg_catalog.array_agg(acl) FROM "
! 		  "(SELECT pg_catalog.unnest(pip.initprivs) AS acl "
! 		  "EXCEPT "
! 		  "SELECT pg_catalog.unnest(pg_catalog.acldefault(%s,%s))) as foo) END",
  		  obj_kind,
  		  acl_owner);
  
  		printfPQExpBuffer(init_racl_subquery,
  		  "CASE WHEN privtype = 'e' THEN "
  		  "(SELECT pg_catalog.array_agg(acl) FROM "
! 		  "(SELECT pg_catalog.unnest(pg_catalog.acldefault(%s,%s)) AS acl "
! 		  "EXCEPT "
! 		  "SELECT pg_catalog.unnest(pip.initprivs)) as foo) END",
  		  obj_kind,
  		  acl_owner);
  	}
--- 761,779 
  	{
  		printfPQExpBuffer(init_acl_subquery,
  		  "CASE WHEN privtype = 'e' THEN "
! 		  "(SELECT

Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-31 Thread Peter Geoghegan


Robert Haas  wrote:

On Mon, Jul 31, 2017 at 1:54 PM, Peter Geoghegan  wrote:

That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.


Yeah, that might possibly be a good approach.


I also wonder if it's worth teaching lazy_scan_heap() to keep around a
list of TIDs that can at least have their LP_DEAD bit set within their
index page, for use during subsequent index vacuuming. Doing at least
this much for TIDs from heap pages that are skipped due to some other
session concurrently holding a pin on the heap page ("pinskipped_pages"
pages) could help some cases, and seems doable. VACUUM does not need an
extra interlock against another VACUUM (such as a buffer pin) here, of
course.

I wouldn't expect this to help very much on many workloads, including
Alik's Zipfian workload, but it might be useful to have a real guarantee
about how long it can be, in VACUUM cycles, before a dead index tuple at
least has its LP_DEAD bit set. LP_DEAD will only be set when an index
scan goes to the heap, and it's not hard to imagine workloads where no
index scan ever does that with dead tuples whose heap TIDs were killed
only very recently.

Unlike with heap pruning, setting the LP_DEAD bit of all dead index
tuples on a leaf page is pretty much as good as a full VACUUM of the
page. The only thing that makes it not quite as good is that you cannot
assume that it's safe to kill the heap TIDs afterwards.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitions fail

2017-07-31 Thread David G. Johnston

On Tue, Jul 25, 2017 at 11:29 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> > I'm curious what the other limitations are...
>
> When I first wrote that documentation line (I am assuming you're asking
> about "although these have some limitations that normal tables do not"), I
> was thinking about the fact that the core system does not enforce
> (locally) any constraints defined on foreign tables.  Since we allow
> inserting data into partitions directly, it is imperative that we enforce
> the "partition constraint" along with the traditional constraints such as
> NOT NULL and CHECK constraints, which we can do for local table partitions
> but not for foreign table ones.
>
> Anyway, attached patch documents all these limitations about foreign table
> partitions more prominently.
>

The revised patch down-thread looks good.  Thanks.

I indeed was referring to the paragraph you quoted.

I would probably just   s/For example/In particular/   and call it good -
or maybe also tell the user that all the limitations are listed in the
notes section for create foreign table (though my first thoughts are all
quite wordy).

David J.

Re: [HACKERS] building libpq.a static library

Peter Eisentraut  writes:
> On 7/12/17 11:11, Tom Lane wrote:
>> FWIW, we used to have support for building static libpq, but
>> we got rid of it a long time ago.  I couldn't find the exact
>> spot in some desultory trawling of the commit history.

> We still build and install static libraries.

Hm, I'd taken Jeroen's complaint at face value, but that sure
explains why I couldn't find where it'd been removed ;-)

The answer may then be that he's working with a vendor-packaged version
and the vendor chose to enforce their distro policy about not shipping
static libraries by patching that build step out of libpq's Makefile.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Hash take II

Hi,

On 2017-07-26 20:12:56 +1200, Thomas Munro wrote:
> Here is a new version of my parallel-aware hash join patchset.

Yay!

Working on reviewing this. Will send separate emails for individual
patch reviews.


> 2.  Simplified costing.  There is now just one control knob
> "parallel_synchronization_cost", which I charge for each time the
> participants will wait for each other at a barrier, to be set high
> enough to dissuade the planner from using Parallel Hash for tiny hash
> tables that would be faster in a parallel-oblivious hash join.
> Earlier ideas about modelling the cost of shared memory access didn't
> work out.

Hm. You say, "didn't work out" - could you expand a bit on that? I'm
quite doubtful that justaccounting for barriers will be good enough.


> I'll report on performance separately.

Looking forward to that ;)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] building libpq.a static library

On 7/12/17 11:11, Tom Lane wrote:
> FWIW, we used to have support for building static libpq, but
> we got rid of it a long time ago.  I couldn't find the exact
> spot in some desultory trawling of the commit history.

We still build and install static libraries.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

Heikki Linnakangas  writes:
> I agree with Tom that we don't really want abbreviated SSL handshakes, 
> or other similar optimizations, to take place. PostgreSQL connections 
> are quite long-lived, so we have little to gain. But it makes the attack 
> surface larger. There have been vulnerabilities related to SSL 
> renegotiation, resumption, abbreviated handshakes, and all that.

> I think we should actually call SSL_CTX_set_session_cache_mode(ctx, 
> SSL_SESS_CACHE_OFF), to disable session caching altogether. I'm not sure 
> if we still need to call SSL_CTX_set_session_cache_mode() if we do that.

AIUI (and I just learned about this stuff yesterday, so I might be wrong)
session caching and session tickets are two independent mechanisms for
SSL session reuse.

I have no objection to explicitly disabling session caching, but I think
it won't have any real effect, because no backend process could ever have
any entries in its session cache anyway.  Maybe it'd result in a more
apropos error message, don't know.

But we need to disable session tickets separately from that.  What's
happening right now in Shay's case, I believe, is that the client is
asking for a session ticket and getting one.  The ticket contains enough
data to re-establish the same SSL context with a successor backend;
but it does not contain any data that would allow restoration of
relevant backend state.  We could imagine "resuming" the session with
virgin backend state, but I think that violates the spirit if not the
letter of RFC 5077.  In any case, implementing it with those semantics
would tie our hands if anyone ever wanted to provide something closer
to true session restoration.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

On Wed, Jul 26, 2017 at 9:50 PM, Amit Langote
 wrote:
> At least patch 0001 does address a bug.  Not sure if we can say that 0002
> addresses a bug; it implements a feature that might be a
> nice-to-have-in-PG-10.

I think 0001 is actually several fixes that should be separated:

- Cosmetic fixes, like renaming childrels to attachRel_children,
adding a period after "Grab a work queue entry", and replacing the if
(skip_validate) / if (!skip_validate) blocks with if (skip_validate) {
... } else { ... }.

- Taking AccessExclusiveLock initially to avoid a lock upgrade hazard.

- Rejiggering things so that we apply map_partition_varattnos() to the
partConstraint in all relevant places.

Regarding the XXX, we currently require AccessExclusiveLock for the
addition of a CHECK constraint, so I think it's best to just do the
same thing here.  We can optimize later, but it's probably not easy to
come up with something that is safe and correct in all cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix a typo in pg_upgrade/info.c

On 7/13/17 03:22, Masahiko Sawada wrote:
> Hi,
> 
> Attached patch for $subject.
> 
> s/reporing/reporting/g

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Another comment typo in execMain.c

On 7/6/17 03:23, Etsuro Fujita wrote:
> Here is a comment in ExecFindPartition() in execMain.c:
> 
>  /*
>   * First check the root table's partition constraint, if any.  No 
> point in
>   * routing the tuple it if it doesn't belong in the root table itself.
>   */
> 
> I think that in the second sentence "it" just before "if" is a typo. 
> Attached is a small patch for fixing that.

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POC: Sharing record typmods between backends

Hi,

diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index 9fd7b4e019b..97c0125a4ba 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -337,17 +337,75 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
 {
Assert(tupdesc->tdrefcount > 0);
 
-   ResourceOwnerForgetTupleDesc(CurrentResourceOwner, tupdesc);
+   if (CurrentResourceOwner != NULL)
+   ResourceOwnerForgetTupleDesc(CurrentResourceOwner, tupdesc);
if (--tupdesc->tdrefcount == 0)
FreeTupleDesc(tupdesc);
 }

What's this about? CurrentResourceOwner should always be valid here, no?
If so, why did that change? I don't think it's good to detach this from
the resowner infrastructure...


 /*
- * Compare two TupleDesc structures for logical equality
+ * Compare two TupleDescs' attributes for logical equality
  *
  * Note: we deliberately do not check the attrelid and tdtypmod fields.
  * This allows typcache.c to use this routine to see if a cached record type
  * matches a requested type, and is harmless for relcache.c's uses.
+ */
+bool
+equalTupleDescAttrs(Form_pg_attribute attr1, Form_pg_attribute attr2)
+{

comment not really accurate, this routine afaik isn't used by
typcache.c?


/*
- * Magic numbers for parallel state sharing.  Higher-level code should use
- * smaller values, leaving these very large ones for use by this module.
+ * Magic numbers for per-context parallel state sharing.  Higher-level code
+ * should use smaller values, leaving these very large ones for use by this
+ * module.
  */
 #define PARALLEL_KEY_FIXED 
UINT64CONST(0x0001)
 #define PARALLEL_KEY_ERROR_QUEUE   
UINT64CONST(0x0002)
@@ -63,6 +74,16 @@
 #define PARALLEL_KEY_ACTIVE_SNAPSHOT   UINT64CONST(0x0007)
 #define PARALLEL_KEY_TRANSACTION_STATE UINT64CONST(0x0008)
 #define PARALLEL_KEY_ENTRYPOINT
UINT64CONST(0x0009)
+#define PARALLEL_KEY_SESSION_DSM   
UINT64CONST(0x000A)
+
+/* Magic number for per-session DSM TOC. */
+#define PARALLEL_SESSION_MAGIC 0xabb0fbc9
+
+/*
+ * Magic numbers for parallel state sharing in the per-session DSM area.
+ */
+#define PARALLEL_KEY_SESSION_DSA   
UINT64CONST(0x0001)
+#define PARALLEL_KEY_RECORD_TYPMOD_REGISTRYUINT64CONST(0x0002)

Not this patch's fault, but this infrastructure really isn't great. We
should really replace it with a shmem.h style infrastructure, using a
dht hashtable as backing...


+/* The current per-session DSM segment, if attached. */
+static dsm_segment *current_session_segment = NULL;
+

I think it'd be better if we had a proper 'SessionState' and
'BackendSessionState' infrastructure that then contains the dsm segment
etc. I think we'll otherwise just end up with a bunch of parallel
infrastructures.



+/*
+ * A mechanism for sharing record typmods between backends.
+ */
+struct SharedRecordTypmodRegistry
+{
+   dht_hash_table_handle atts_index_handle;
+   dht_hash_table_handle typmod_index_handle;
+   pg_atomic_uint32 next_typmod;
+};
+

I think the code needs to explain better how these are intended to be
used. IIUC, atts_index is used to find typmods by "identity", and
typmod_index by the typmod, right? And we need both to avoid
all workers generating different tupledescs, right?  Kinda guessable by
reading typecache.c, but that shouldn't be needed.


+/*
+ * A flattened/serialized representation of a TupleDesc for use in shared
+ * memory.  Can be converted to and from regular TupleDesc format.  Doesn't
+ * support constraints and doesn't store the actual type OID, because this is
+ * only for use with RECORD types as created by CreateTupleDesc().  These are
+ * arranged into a linked list, in the hash table entry corresponding to the
+ * OIDs of the first 16 attributes, so we'd expect to get more than one entry
+ * in the list when named and other properties differ.
+ */
+typedef struct SerializedTupleDesc
+{
+   dsa_pointer next;   /* next with the same same 
attribute OIDs */
+   int natts;  /* number of attributes 
in the tuple */
+   int32   typmod; /* typmod for tuple type */
+   boolhasoid; /* tuple has oid attribute in 
its header */
+
+   /*
+* The attributes follow.  We only ever access the first
+* ATTRIBUTE_FIXED_PART_SIZE bytes of each element, like the code in
+* tupdesc.c.
+*/
+   FormData_pg_attribute attributes[FLEXIBLE_ARRAY_MEMBER];
+} SerializedTupleDesc;

Not a fan of a separate tupledesc representation, that's just going to
lead to divergence over time. I think we should rather change the normal
tupledesc representation to be

Re: [HACKERS] Inconsistencies in from_char_parse_int_len()

Douglas Doole  writes:
> I was playing with TO_TIMESTAMP() and I noticed a weird result:
> postgres=# select to_timestamp('20170-07-24 21:59:57.12345678', '-mm-dd
> hh24:mi:ss.us');
>   to_timestamp
> 
>  20170-07-24 22:00:09.345678+00
> (1 row)

FWIW, we already tightened that up in v10:

regression=#  select to_timestamp('20170-07-24 21:59:57.12345678', '-mm-dd 
hh24:mi:ss.us');
ERROR:  date/time field value out of range: "20170-07-24 21:59:57.12345678"

There may well be some discrepancies left.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

Peter Eisentraut  writes:
> On 7/31/17 15:38, Tom Lane wrote:
>> Really?  That seems pretty broken, independently of how many variables
>> are affected.  But the ones you'd be most likely to do that with are
>> using AC_PATH_PROG already, I think.  Having lesser-used program variables
>> behave inconsistently doesn't seem like much of a win.

> Well, if we're fiddling around here, I would change them all to
> AC_CHECK_PROG if possible.  Especially the PYTHON one annoys me all the
> time.  CC is another one I set occasionally.

I will object really really strongly to that, as it is 180 degrees from
where I think we need to go, and will make things a lot worse than before
on the documentation aspect that I was concerned about to begin with.

If we need to fix things so that AC_PATH_PROG will honor a non-path
input value, then let's do that.  But let's not make the build system
shakier/less reproducible than it is already.

I suggest that we could inject logic like this:

  if VARIABLE-is-set-and-value-isn't-already-absolute; then
VARIABLE=`which $VARIABLE 2>/dev/null`
  fi

in front of the existing logic for AC_PATH_PROG(VARIABLE,...).
Maybe "which" isn't the best tool for the job, not sure.

Another idea, which would probably require replacing _AC_PATH_PROG
rather than just putting a wrapper around it, would be to let it
perform its normal path walk but using the given word instead of
$ac_word.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PG 10 release notes

2017-07-31 Thread Thomas Munro

On Tue, Apr 25, 2017 at 1:31 PM, Bruce Momjian  wrote:
> I have committed the first draft of the Postgres 10 release notes.  They
> are current as of two days ago, and I will keep them current.  Please
> give me any feedback you have.

Hi Bruce,

"Add AFTER trigger transition tables to record changed rows (Kevin Grittner)"

Any chance I could ask for a secondary author credit here?  While I
started out as a reviewer and I understand that we don't list those, I
finished up writing quite a lot of lines of committed code for this
(see commits 1add0b15, 8c55244a, c46c0e52, 501ed02c, f32d57fd,
9e6104c6, 29fd3d9d, 304007d9, 5ebeb579) and was already listed as a
co-author by Kevin in the original commits (59702716, 18ce3a4a).  Of
course I wish I'd identified and fixed all of those things *before*
the original commits, but that's how it played out...

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Inconsistencies in from_char_parse_int_len()

2017-07-31 Thread Douglas Doole

I was playing with TO_TIMESTAMP() and I noticed a weird result:

postgres=# select to_timestamp('20170-07-24 21:59:57.12345678', '-mm-dd
hh24:mi:ss.us');
  to_timestamp

 20170-07-24 22:00:09.345678+00
(1 row)

Even though the "us" token is supposed to be restricted to 00-99 it
looks like the microseconds was calculated as 12.345678.

Digging into the code, I found inconsistencies in
from_char_parse_int_len(). From formatting.c:

!/*
! * Read a single integer from the source string, into the int pointed
to by
! * 'dest'. If 'dest' is NULL, the result is discarded.
! *
! * In fixed-width mode (the node does not have the FM suffix), consume
at most
! * 'len' characters.  However, any leading whitespace isn't counted in
'len'.
! *

!static int
!from_char_parse_int_len(int *dest, char **src, const int len,
FormatNode *node)
!{

!if (S_FM(node->suffix) || is_next_separator(node))
!{
!/*
! * This node is in Fill Mode, or the next node is known to be a
! * non-digit value, so we just slurp as many characters as we
can get.
! */
!errno = 0;
!result = strtol(init, src, 10);
!}
!else
!{
!/*
! * We need to pull exactly the number of characters given in
'len' out
! * of the string, and convert those.
! */
!char   *last;
!
!if (used < len)
!ereport(ERROR,
!(errcode(ERRCODE_INVALID_DATETIME_FORMAT),

So the function prologue disagrees with the code. In the first condition
strtol() can consume more than 'len' digits. In the else, we error out if
we don't have exactly 'len' characters.

What's the proper behaviour here?

- Doug
Salesforce

Re: [HACKERS] [PATCH] Pattern based listeners for asynchronous messaging (LISTEN/NOTIFY)

On 7/31/17 16:13, Markus Sintonen wrote:
> This patch has no know backward compatibility issues with the existing
> /LISTEN///UNLISTEN/ features. This is because patch extends the existing
> syntax by accepting quoted strings which define the patterns as opposed
> to the existing SQL literals.

I don't see that in the patch.  Your patch changes the syntax LISTEN
ColId to mean a regular expression.

Even then, having LISTEN "foo" and LISTEN 'foo' mean different things
would probably be confusing.

I would think about specifying an operator somewhere in the syntax, like
you have with LISTEN SIMILAR TO.  It would even be nice if a
non-built-in operator could be used for matching names.

Documentation is missing in the patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

On 7/31/17 15:38, Tom Lane wrote:
> Peter Eisentraut  writes:
>> One major PITA with the AC_PATH_* checks is that you can only override
>> them with environment variables that are full paths; otherwise the
>> environment variables are ignored.  For example, currently, running
> 
>> ./configure PYTHON=python3
> 
>> will result in the PYTHON setting being ignored.
> 
> Really?  That seems pretty broken, independently of how many variables
> are affected.  But the ones you'd be most likely to do that with are
> using AC_PATH_PROG already, I think.  Having lesser-used program variables
> behave inconsistently doesn't seem like much of a win.

Well, if we're fiddling around here, I would change them all to
AC_CHECK_PROG if possible.  Especially the PYTHON one annoys me all the
time.  CC is another one I set occasionally.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Pattern based listeners for asynchronous messaging (LISTEN/NOTIFY)

2017-07-31 Thread Markus Sintonen

This patch adds an ability to use patterns in *LISTEN* commands. Patch uses
'*SIMILAR TO*' patterns for matching *NOTIFY* channel names (
https://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP
).

This patch is related to old discussion in
https://www.postgresql.org/message-id/52693fc5.7070...@gmail.com. This
discussion contains the reasoning behind the pattern based matching of the
channel names.

Patch allows the following.
The listener is registered with following command, for example:
*LISTEN SIMILAR TO 'test%';*
which would match and receive a message from this example notfication:
*NOTIFY test_2;*
Unlistening the above registered pattern:
*UNLISTEN 'test%';*
More examples can be seen from the added regress and isolation tests.
Note that *UNLISTEN* does not allow pattern based unlistening of the
registered listeners. It merely matches the registered pattern by simple
string comparison.

This patch has no know backward compatibility issues with the existing
*LISTEN*/*UNLISTEN* features. This is because patch extends the existing
syntax by accepting quoted strings which define the patterns as opposed to
the existing SQL literals.

The patch also extends the isolationtester by adding an ability to monitor
registered notify messages. This is used to test the existing features as
well as the new pattern based feature. (Does not affect other existing
tests)

Further considerations for this patch:
- Change pattern type, alternatives would be e.g. *LIKE* or POSIX patterns
- Remove '*SIMILAR TO*' from the command (just use quoted string in
form of *LISTEN
'test_%'*)
- Allow *UNLISTEN SIMILAR TO 'xxx'* which would unregister matching
listeners. To me this feels confusing therefore it is not in the patch.

Some notes about patch:
- Most of the changes in async.c
- Regular expression is compiled and finally stored in top memory context
- RE compilation (+error check) happens before transaction commit
- RE is compiled in current transaction memory context from where it is
copied to top memory context on transaction commit (when everything first
succeeds)
- Compiled REs in current transaction are freed on transaction abort
- Compiled REs that were not applied as listeners (duplicates) are freed on
transaction commit
- REs freed when unlistened
- No obvious performance impacts (e.g. normal channel name listening uses
just strcmp)

Thoughts?

Best regards
Markus

listen-pattern.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL - Weak DH group

2017-07-31 Thread Heikki Linnakangas


On 07/31/2017 02:27 PM, Heikki Linnakangas wrote:

Rebased patch attached, with proposed release notes included. Barring
new objections or arguments, I'll commit this (only) to v10 later today.


Ok, committed for v10. Thanks Nicolas and Damien, and everyone else 
involved!


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

On 7/31/17 15:17, Peter Eisentraut wrote:
> On 7/31/17 14:55, Tom Lane wrote:
>>> We use the "PATH" variants when we need a fully qualified name.  For
>>> example, at some point or another, we needed to substitute a fully
>>> qualified perl binary name into the headers of scripts.
>>
>>> If there is no such requirement, then we should use the non-PATH variants.
>>
>> Why?  That risks failures of various sorts, and you have not stated
>> any actual benefit of it.
> 
> What I wrote is merely a description of the current practice.  That
> practice was in turn developed out of ancient Autoconf standard
> practices.  There are arguments to be made for doing it differently.
> 
> One major PITA with the AC_PATH_* checks is that you can only override
> them with environment variables that are full paths; otherwise the
> environment variables are ignored.  For example, currently, running
> 
> ./configure PYTHON=python3
> 
> will result in the PYTHON setting being ignored.  Currently, this only
> affects a small number of variables, but if we expanded that, it would
> be a pretty significant usability change.

Plus certain special macros such as AC_PROG_CC don't have a PATH
variant, so you're always going to have some inconsistencies.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

Peter Eisentraut  writes:
> One major PITA with the AC_PATH_* checks is that you can only override
> them with environment variables that are full paths; otherwise the
> environment variables are ignored.  For example, currently, running

> ./configure PYTHON=python3

> will result in the PYTHON setting being ignored.

Really?  That seems pretty broken, independently of how many variables
are affected.  But the ones you'd be most likely to do that with are
using AC_PATH_PROG already, I think.  Having lesser-used program variables
behave inconsistently doesn't seem like much of a win.

I'd almost be inclined to say that we should override that behavior
of AC_PATH_PROG.  It is undocumented AFAICS, and it's not amazingly
well thought out, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

On 7/31/17 14:55, Tom Lane wrote:
>> We use the "PATH" variants when we need a fully qualified name.  For
>> example, at some point or another, we needed to substitute a fully
>> qualified perl binary name into the headers of scripts.
> 
>> If there is no such requirement, then we should use the non-PATH variants.
> 
> Why?  That risks failures of various sorts, and you have not stated
> any actual benefit of it.

What I wrote is merely a description of the current practice.  That
practice was in turn developed out of ancient Autoconf standard
practices.  There are arguments to be made for doing it differently.

One major PITA with the AC_PATH_* checks is that you can only override
them with environment variables that are full paths; otherwise the
environment variables are ignored.  For example, currently, running

./configure PYTHON=python3

will result in the PYTHON setting being ignored.  Currently, this only
affects a small number of variables, but if we expanded that, it would
be a pretty significant usability change.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-31 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jul 27, 2017 at 10:27 AM, Stephen Frost  wrote:
> > * Noah Misch (n...@leadboat.com) wrote:
> >> This PostgreSQL 10 open item is past due for your status update.  Kindly 
> >> send
> >> a status update within 24 hours, and include a date for your subsequent 
> >> status
> >> update.  Refer to the policy on open item ownership:
> >
> > Based on the ongoing discussion, this is really looking like it's
> > actually a fix that needs to be back-patched to 9.6 rather than a PG10
> > open item.  I don't have any issue with keeping it as an open item
> > though, just mentioning it.  I'll provide another status update on or
> > before Monday, July 31st.
> >
> > I'll get to work on the back-patch and try to draft up something to go
> > into the release notes for 9.6.4.
> 
> Whether this is going to be back-patched or not, you should do
> something about it quickly, because we're wrapping a new beta and a
> full set of back-branch releases next week.  I'm personally hoping
> that what follows beta3 will be rc1, but if we have too much churn
> after beta3 we'll end up with a beta4, which could end up slipping the
> whole release cycle.

Yes, I've been working on this and the other issues with pg_dump today.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

On Thu, Jul 27, 2017 at 10:27 AM, Stephen Frost  wrote:
> * Noah Misch (n...@leadboat.com) wrote:
>> This PostgreSQL 10 open item is past due for your status update.  Kindly send
>> a status update within 24 hours, and include a date for your subsequent 
>> status
>> update.  Refer to the policy on open item ownership:
>
> Based on the ongoing discussion, this is really looking like it's
> actually a fix that needs to be back-patched to 9.6 rather than a PG10
> open item.  I don't have any issue with keeping it as an open item
> though, just mentioning it.  I'll provide another status update on or
> before Monday, July 31st.
>
> I'll get to work on the back-patch and try to draft up something to go
> into the release notes for 9.6.4.

Whether this is going to be back-patched or not, you should do
something about it quickly, because we're wrapping a new beta and a
full set of back-branch releases next week.  I'm personally hoping
that what follows beta3 will be rc1, but if we have too much churn
after beta3 we'll end up with a beta4, which could end up slipping the
whole release cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

Peter Eisentraut  writes:
> On 7/30/17 12:50, Tom Lane wrote:
>> The reason it does that seems to be that we use AC_CHECK_PROGS
>> rather than AC_PATH_PROGS for locating "prove".  I can see no
>> particular consistency to the decisions made in configure.in
>> about which to use:

> We use the "PATH" variants when we need a fully qualified name.  For
> example, at some point or another, we needed to substitute a fully
> qualified perl binary name into the headers of scripts.

> If there is no such requirement, then we should use the non-PATH variants.

Why?  That risks failures of various sorts, and you have not stated
any actual benefit of it.

In cases where people do things like sticking non-default Perl builds
into nonstandard places, failing to record the absolute path to the
program configure saw is both a documentation fail and a clear hazard
to build reproducibility.  I think that "you can change your PATH and
get a different Perl version without reconfiguring" is an anti-feature,
because it poses a very high risk of not actually working.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

On Mon, Jul 31, 2017 at 1:54 PM, Peter Geoghegan  wrote:
> That is hard to justify. I don't think that failing to set LP_DEAD hints
> is the cost that must be paid to realize a benefit elsewhere, though. I
> don't see much problem with having both benefits consistently. It's
> actually very unlikely that VACUUM will run, and a TID will be recycled
> at exactly the wrong time. We could probably come up with a more
> discriminating way of detecting that that may have happened, at least
> for Postgres 11. We'd continue to use LSN; the slow path would be taken
> when the LSN changed, but we do not give up on setting LP_DEAD bits. I
> think we can justify going to the heap again in this slow path, if
> that's what it takes.

Yeah, that might possibly be a good approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

On 7/30/17 12:50, Tom Lane wrote:
> Noah Misch  writes:
>> On Sun, Jul 30, 2017 at 12:05:10PM -0400, Tom Lane wrote:
>>> Well, OK, but I'd still like to tweak configure so that it records
>>> an absolute path for prove rather than just setting PROVE=prove.
>>> That way you'd at least be able to tell from the configure log
>>> whether you are possibly at risk.
> 
>> That's an improvement.

I disagree with that, unless there is an actual risk.

> The reason it does that seems to be that we use AC_CHECK_PROGS
> rather than AC_PATH_PROGS for locating "prove".  I can see no
> particular consistency to the decisions made in configure.in
> about which to use:

We use the "PATH" variants when we need a fully qualified name.  For
example, at some point or another, we needed to substitute a fully
qualified perl binary name into the headers of scripts.

If there is no such requirement, then we should use the non-PATH variants.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

On Mon, Jul 31, 2017 at 1:27 PM, Alvaro Herrera
 wrote:
> Postgres-XL seems to manage this problem by using a transaction manager
> node, which is in charge of assigning snapshots.  I don't know how that
> works, but perhaps adding that concept here could be useful too.  One
> critical point to that design is that the app connects not directly to
> the underlying Postgres server but instead to some other node which is
> or connects to the node that manages the snapshots.
>
> Maybe Michael can explain in better detail how it works, and/or how (and
> if) it could be applied here.

I suspect that if you've got a central coordinator server that is the
jumping-off point for all distributed transactions, the Postgres-XL
approach is hard to beat (at least in concept, not sure about the
implementation).  That server is brokering all of the connections to
the data nodes anyway, so it might as well tell them all what
snapshots to use while it's there.  When you scale to multiple
coordinators, though, it's less clear that it's the best approach.
Now one coordinator has to be the GTM master, and that server is
likely to become a bottleneck -- plus talking to it involves extra
network hops for all the other coordinators. When you then move the
needle a bit further and imagine a system where the idea of a
coordinator doesn't even exist, and you've just got a loosely couple
distributed system where distributed transactions might arrive on any
node, all of which are also servicing local transactions, then it
seems pretty likely that the Postgres-XL approach is not the best fit.

We might want to support multiple models.  Which one to support first
is a harder question.  The thing I like least about the Postgres-XC
approach is it seems inevitable that, as Michael says, the central
server handing out XIDs and snapshots is bound to become a bottleneck.
That type of system implicitly constructs a total order of all
distributed transactions, but we don't really need a total order.  If
two transactions don't touch the same data and there's no overlapping
transaction that can notice the commit order, then we could make those
commit decisions independently on different nodes without caring which
one "happens first".  The problem is that it might take so much
bookkeeping to figure out whether that is in fact the case in a
particular instance that it's even more expensive than having a
central server that functions as a global bottleneck.

It might be worth some study not only of Postgres-XL but also of other
databases that claim to provide distributed transactional consistency
across nodes.  I've found literature on this topic from time to time
over the years, but I'm not sure what the best practices in this area
actually are. https://en.wikipedia.org/wiki/Global_serializability
claims that a technique called Commitment Ordering (CO) is teh
awesome, but I've got my doubts about whether that's really an
objective description of the state of the art.  One clue is that the
global serialiazability article says three separate times that the
technique has been widely misunderstood.  I'm not sure exactly which
Wikipedia guideline that violates, but I think Wikipedia is supposed
to summarize the views that exist on a topic in accordance with their
prevalence, not take a position on which view is correct.
https://en.wikipedia.org/wiki/Commitment_ordering contains citations
from the papers only of one guy, Yoav Raz, which is another hint that
this may not be as widely-regarded a technique as the person who wrote
these articles thinks it should be.  Anyway, it would be good to
understand what other well-regarded systems do before we choose what
we want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-31 Thread Peter Geoghegan


Robert Haas  wrote:

On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan  wrote:

I really don't know if that would be worthwhile. It would certainly fix
the regression shown in my test case, but that might not go far enough.
I strongly suspect that there are more complicated workloads where
LP_DEAD cleanup from SELECT statements matters, which is prevented by
the LSN check thing, just because there are always other sessions that
modify the page concurrently. This might be true of Alik's Zipfian test
case, for example.


I haven't studied the test case, but I think as a general principle it
makes sense to be happy when someone comes up with an algorithm that
holds a lock for a shorter period of time (and buffer pins are a type
of lock).  There are a number of places (fast-path locking, for
example, or vacuum skipping pinned heap pages) where we have
fast-paths that get taken most of the time and slow paths that get
used when concurrent activity happens; empirically, such things often
work out to a win.  I think it's disturbing that this code seems to be
taking the slow-path (which, in context, means skipping LP_DEAD
cleanup) even there is no concurrent activity.  That's hard to
justify.


That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.


But the fact that it is taking the slow-path when there *is*
concurrent activity is harder to complain about.  That might win or it
might lose; the non-concurrent case only loses.


Let's wait to see what difference it makes if Alik's zipfian
distribution pgbench test case uses unlogged tables. That may gives us a
good sense of the problem for cases with contention/concurrency.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?

Noah Misch  writes:
> On Sun, Jul 30, 2017 at 01:21:28AM -0400, Tom Lane wrote:
>> Anyway, pending some news about compatibility of the MSVC scripts,
>> I think we ought to adjust our docs to state that 5.8.3 is the
>> minimum supported Perl version.

> Works for me.

Done.  I have also reconfigured buildfarm member prairiedog to use
a non-MULTIPLICITY build of Perl 5.8.3, with the oldest Test::More
and IPC::Run versions I could lay my hands on.  Although I'd gotten
through a manual "make check-world" with this configuration in HEAD
before touching the buildfarm configuration, I see that it just fell
over in the back branches.  So there's still some more fixing to be
done, or else we'll need to change that claim again.  Will investigate
once the buildfarm run finishes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan  wrote:
> I really don't know if that would be worthwhile. It would certainly fix
> the regression shown in my test case, but that might not go far enough.
> I strongly suspect that there are more complicated workloads where
> LP_DEAD cleanup from SELECT statements matters, which is prevented by
> the LSN check thing, just because there are always other sessions that
> modify the page concurrently. This might be true of Alik's Zipfian test
> case, for example.

I haven't studied the test case, but I think as a general principle it
makes sense to be happy when someone comes up with an algorithm that
holds a lock for a shorter period of time (and buffer pins are a type
of lock).  There are a number of places (fast-path locking, for
example, or vacuum skipping pinned heap pages) where we have
fast-paths that get taken most of the time and slow paths that get
used when concurrent activity happens; empirically, such things often
work out to a win.  I think it's disturbing that this code seems to be
taking the slow-path (which, in context, means skipping LP_DEAD
cleanup) even there is no concurrent activity.  That's hard to
justify.  But the fact that it is taking the slow-path when there *is*
concurrent activity is harder to complain about.  That might win or it
might lose; the non-concurrent case only loses.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-07-31 Thread Alvaro Herrera

Robert Haas wrote:

> An alternative approach is to have some kind of other identifier,
> let's call it a distributed transaction ID (DXID) which is mapped by
> each node onto a local XID.

Postgres-XL seems to manage this problem by using a transaction manager
node, which is in charge of assigning snapshots.  I don't know how that
works, but perhaps adding that concept here could be useful too.  One
critical point to that design is that the app connects not directly to
the underlying Postgres server but instead to some other node which is
or connects to the node that manages the snapshots.

Maybe Michael can explain in better detail how it works, and/or how (and
if) it could be applied here.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-31 Thread Mark Rofail

On Mon, Jul 31, 2017 at 5:18 PM, Alvaro Herrera 
wrote:

> Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > ...  However, when you create an index, you can
> > > indicate which operator class to use, and it may not be the default
> one.
> > > If a different one is chosen at index creation time, then a query using
> > > COUNT(distinct) will do the wrong thing, because DISTINCT will select
> > > an equality type using the type's default operator class, not the
> > > equality that belongs to the operator class used to create the index.
> >
> > > That's wrong: DISTINCT should use the equality operator that
> corresponds
> > > to the index' operator class instead, not the default one.
> >
> > Uh, what?  Surely the semantics of count(distinct x) *must not* vary
> > depending on what indexes happen to be available.
>
> Err ...
>
> > I think what you meant to say is that the planner may only choose an
> > optimization of this sort when the index's opclass matches the one
> > DISTINCT will use, ie the default for the data type.


I understand the problem. I am currently researching how to resolve it.

Best Regards,
Mark Rofail

Re: [HACKERS] Increase Vacuum ring buffer.

2017-07-31 Thread Sokolov Yura


On 2017-07-27 11:53, Sokolov Yura wrote:

On 2017-07-26 20:28, Sokolov Yura wrote:

On 2017-07-26 19:46, Claudio Freire wrote:

On Wed, Jul 26, 2017 at 1:39 PM, Sokolov Yura
 wrote:

On 2017-07-24 12:41, Sokolov Yura wrote:
test_master_1/pretty.log

...

time   activity  tps  latency   stddev  min  max
11130 av+ch  198198ms374ms  7ms   1956ms
11160 av+ch  248163ms401ms  7ms   2601ms
11190 av+ch  321125ms363ms  7ms   2722ms
11220 av+ch 1155 35ms123ms  7ms   2668ms
11250 av+ch 1390 29ms 79ms  7ms   1422ms


vs


test_master_ring16_1/pretty.log
time   activity  tps  latency   stddev  min  max
11130 av+ch   26   1575ms635ms101ms   2536ms
11160 av+ch   25   1552ms648ms 58ms   2376ms
11190 av+ch   32   1275ms726ms 16ms   2493ms
11220 av+ch   23   1584ms674ms 48ms   2454ms
11250 av+ch   35   1235ms777ms 22ms   3627ms


That's a very huge change in latency for the worse

Are you sure that's the ring buffer's doing and not some methodology 
snafu?


Well, I tuned postgresql.conf so that there is no such
catastrophic slows down on master branch. (with default
settings such slowdown happens quite frequently).
bgwriter_lru_maxpages = 10 (instead of default 200) were one
of such tuning.

Probably there were some magic "border" that triggers this
behavior. Tuning postgresql.conf shifted master branch on
"good side" of this border, and faster autovacuum crossed it
to "bad side" again.

Probably, backend_flush_after = 2MB (instead of default 0) is
also part of this border. I didn't try to bench without this
option yet.

Any way, given checkpoint and autovacuum interference could be
such noticeable, checkpoint clearly should affect autovacuum
cost mechanism, imho.

With regards,


I'll run two times with default postgresql.conf (except
shared_buffers and maintence_work_mem) to find out behavior on
default setting.

Then I'll try to investigate checkpoint co-operation with
autovacuum to fix behavior with "tuned" postgresql.conf.



I've accidentally lost results of this run, so I will rerun it.

This I remembered:
- even with default settings, autovacuum runs 3 times faster:
9000s on master, 3000s with increased ring buffer.
So xlog-fsync really slows down autovacuum.
- but concurrent transactions slows down (not so extremely as in
previous test, but still significantly).
I could not draw pretty table now, cause I lost results. I'll do
it after re-run completes.

Could someone suggest, how to cooperate checkpoint with autovacuum,
to slow down autovacuum a bit during checkpoint?

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

On Fri, Jul 28, 2017 at 10:14 AM, Michael Paquier
 wrote:
> On Fri, Jul 28, 2017 at 7:28 AM, Masahiko Sawada  
> wrote:
>> That also requires to share the same XID space with all remote nodes.
>
> You are putting your finger on the main bottleneck with global
> consistency that XC and XL has because of that. And the source feeding
> the XIDs is a SPOF.
>
>> Perhaps the CSN based snapshot can make this more simple.
>
> Hm. This needs a closer look.

With or without CSNs, sharing the same XID space across all nodes is
undesirable in a loosely-coupled network.  If only a small fraction of
transactions are distributed, incurring the overhead of synchronizing
XID assignment for every transaction is not good.  Suppose node A
processes many transactions and node B only a few transactions; then,
XID advancement caused by node A forces node B to perform vacuum for
wraparound.  Not fun.  Or, if you have an OLTP workload running on A
and an OLTP workload running B that are independent of each other, and
occasional reporting queries that scan both, you'll be incurring the
overhead of keeping A and B consistent for a lot of transactions that
don't need it.  Of course, when A and B are tightly coupled and
basically all transactions are scanning both, locking the XID space
together *may* be the best approach, but even then there are notable
disadvantages - e.g. they can't both continue processing write
transactions if the connection between the two is severed.

An alternative approach is to have some kind of other identifier,
let's call it a distributed transaction ID (DXID) which is mapped by
each node onto a local XID.

Regardless of whether we share XIDs or DXIDs, we need a more complex
concept of transaction state than we have now.  Right now,
transactions are basically in-progress, committed, or aborted, but
there's also the state where the status of the transaction is known by
someone but not locally.  You can imagine something like: during the
prepare phase, all nodes set the status in clog to "prepared".  Then,
if that succeeds, the leader changes the status to "committed" or
"aborted" and tells all nodes to do the same.  Thereafter, any time
someone inquires about the status of that transaction, we go ask all
of the other nodes in the cluster; if they all think it's prepared,
then it's prepared -- but if any of them think it's committed or
aborted, then we change our local status to match and return that
status.  So once one node changes the status to committed or aborted
it can propagate through the cluster even if connectivity is lost for
a while.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-31 Thread Christoph Berg

Re: Tom Lane 2017-07-31 <30582.1501508...@sss.pgh.pa.us>
> Christoph Berg  writes:
> >>> The only interesting line in log/postmaster.log is a log_line_prefix-less
> >>> Util.c: loadable library and perl binaries are mismatched (got handshake 
> >>> key 0xd500080, needed 0xd600080)
> 
> Can we see the Perl-related output from configure, particularly the new
> lines about CFLAGS?

$ ./configure --with-perl

checking whether to build Perl modules... yes

checking for perl... /usr/bin/perl
configure: using perl 5.26.0
checking for Perl archlibexp... /usr/lib/x86_64-kfreebsd-gnu/perl/5.26
checking for Perl privlibexp... /usr/share/perl/5.26
checking for Perl useshrplib... true
checking for CFLAGS recommended by Perl... -D_REENTRANT -D_GNU_SOURCE -DDEBIAN 
-fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64
checking for CFLAGS to compile embedded Perl... -DDEBIAN
checking for flags to link embedded Perl...   -fstack-protector-strong 
-L/usr/local/lib  -L/usr/lib/x86_64-kfreebsd-gnu/perl/5.26/CORE -lperl -ldl -lm 
-lpthread -lc -lcrypt

checking for perl.h... yes
checking for libperl... yes

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

On Thu, Jul 27, 2017 at 8:25 AM, Ashutosh Bapat
 wrote:
> The remote transaction can be committed/aborted only after the fate of
> the local transaction is decided. If we commit remote transaction and
> abort local transaction, that's not good. AtEOXact* functions are
> called immediately after that decision in post-commit/abort phase. So,
> if we want to commit/abort the remote transaction immediately it has
> to be done in post-commit/abort processing. Instead if we delegate
> that to the remote transaction resolved backend (introduced by the
> patches) the delay between local commit and remote commits depends
> upon when the resolve gets a chance to run and process those
> transactions. One could argue that that delay would anyway exist when
> post-commit/abort processing fails to resolve remote transaction. But
> given the real high availability these days, in most of the cases
> remote transaction will be resolved in the post-commit/abort phase. I
> think we should optimize for most common case. Your concern is still
> valid, that we shouldn't raise an error or do anything critical in
> post-commit/abort phase. So we should device a way to send
> COMMIT/ABORT prepared messages to the remote server in asynchronous
> fashion carefully avoiding errors. Recent changes to 2PC have improved
> performance in that area to a great extent. Relying on resolver
> backend to resolve remote transactions would erode that performance
> gain.

I think there are two separate but interconnected issues here.  One is
that if we give the user a new command prompt without resolving the
remote transaction, then they might run a new query that sees their
own work as committed, which would be bad.  Or, they might commit,
wait for the acknowledgement, and then tell some other session to go
look at the data, and find it not there.  That would also be bad.  I
think the solution is likely to do something like what we did for
synchronous replication in commit
9a56dc3389b9470031e9ef8e45c95a680982e01a -- wait for the remove
transaction to be resolved (by the background process) but allow an
interrupt to escape the wait-loop.

The second issue is that having the resolver resolve transactions
might be slower than doing it in the foreground.  I don't necessarily
see a reason why that should be a big problem.  I mean, the resolver
might need to establish a separate connection, but if it keeps that
connection open for a while (say, 5 minutes) in case further
transactions arrive then it won't be an issue except on really
low-volume system which isn't really a case I think we need to worry
about very much.  Also, the hand-off to the resolver might take some
time, but that's equally true for sync rep and we're living with it
there.  Anything else is presumably just the resolver itself being
inefficient which seems like something that can simply be fixed.

FWIW, I don't think the present resolver implementation is likely to
be what we want.  IIRC, it's just calling an SQL function which
doesn't seem like a good approach.  Ideally we should stick an entry
into a shared memory queue and then ping the resolver via SetLatch,
and it can directly invoke an FDW method on the data from the shared
memory queue.  It should be possible to set things up so that a user
who wishes to do so can run multiple copies of the resolver thread at
the same time, which would be a good way to keep latency down if the
system is very busy with distributed transactions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)

On Fri, Jul 28, 2017 at 10:35 AM, Andreas Joseph Krogh
 wrote:
> I'm reading https://www.postgresql.org/docs/10/static/pgupgrade.html to try
> to understand how to upgrade standby-servers using pg_upgrade with pg10.
>
> The text in step 10 sais:
> "You will not be running pg_upgrade on the standby servers, but rather
> rsync", which to me sounds like rsync, in step 10-f, should be issued on the
> standy servers. Is this the case? If so I don't understand how the standby's
> data is upgraded and what "remote_dir" is. If rsync is supposed to be issued
> on the primary then I think it should be explicitly mentioned, and step 10-f
> should provide a clarer example with more detailed values for the
> directory-structures involved.
>
> I really think section 10 needs improvement as I'm certainly not comfortable
> upgrading standbys following the existing procedure.

Yeah, I don't understand it either, and I have never been convinced
that there's any safe way to do it other than recloning the standbys
from the upgraded master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [GOSC' 17][Performance report] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-31 Thread Mengxing Liu

Hi, all. I wrote a performance report to conclude what I've done so far.
https://docs.google.com/document/d/16A6rfJnQSTkd0SHK-2XzDiLj7aZ5nC189jGPcfVdhMQ/edit?usp=sharing



Three patch are attached. I used the benchmark below to test the performance.
https://github.com/liumx10/pg-bench
It requires golang (>= 1.6) if you want to reproduce the code. 


NOTE:
1. The reason why hash table or skip list didn't improve the performance is that
maintaining the conflict list became slower even though conflict tracking was 
faster.
So far, skip list is the most promising way. But the code is a little tricky.


BTW, if there is a case in which inserting an conflict element is rare but 
conflict checking is common,
my patch may be better. 


2. Reducing contention on global lock has a better performance in this 
evaluation.
But two weeks ago when I used another machine, it has a worse performance. 
It's hard to explain why... 


You can reply directly if you have any questions or comments. 

--

Sincerely


Mengxing Liu








HTAB-for-conflict-tracking.patch
Description: Binary data


skip-list-for-conflict-tracking.patch
Description: Binary data


reduce-contention-on-FinishedListLock.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Update comments in nodeModifyTable.c

On Fri, Jul 28, 2017 at 8:12 AM, Etsuro Fujita
 wrote:
> On 2017/07/26 22:39, Robert Haas wrote:
>> So the first part of the change weakens the assertion that a 'ctid' or
>> 'wholerow' attribute will always be present by saying that an FDW may
>> instead have other attributes sufficient to identify the row.
>
> That's right.
>
>> But
>> then the additional sentence says that there will be a 'wholerow'
>> attribute after all.  So this whole change seems to me to be going
>> around in a circle.
>
> What I mean by the additional one is: if the result table that is a foreign
> table has a row-level UPDATE/DELETE trigger, a 'wholerow' will also be
> present.  So, if the result table didn't have the trigger, there wouldn't be
> 'whole-row', so in that case it could be possible that there would be only
> attributes other than 'ctid' and 'wholerow'.

I think maybe something like this would be clearer, then:

 /*
  * Initialize the junk filter(s) if needed.  INSERT queries need a filter
  * if there are any junk attrs in the tlist.  UPDATE and DELETE always
- * need a filter, since there's always a junk 'ctid' or 'wholerow'
- * attribute present --- no need to look first.
+ * need a filter, since there's always at least one junk attribute present
+ * --- no need to look first.  Typically, this will be a 'ctid'
+ * attribute, but in the case of a foreign data wrapper it might be a
+ * 'wholerow' attribute or some other set of junk attributes sufficient to
+ * identify the remote row.
  *
  * If there are multiple result relations, each one needs its own junk
  * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Persistent wait event sets and socket changes

Hi Craig,

On 2017-07-31 14:08:58 +0800, Craig Ringer wrote:
> Hi all
> 
> I've been looking into the wait event set interface added in 9.6 with an
> eye to using it in an extension that maintains a set of non-blocking libpq
> connections to other PostgreSQL instances.
> 
> In the process I've been surprised to find that there does not appear to be
> any interface to remove a socket once added to the wait event set, or
> replace it with a new socket.
> 
> ModifyWaitEvent(...) doesn't take a pgsocket. There doesn't seem to be a
> way to remove an event from the set either.

Yea, and what's even more annoying, you can't "disable" waiting for
readyness events on a socket, we've an assert that we're waiting for
something.


> Discussion in
> https://www.postgresql.org/message-id/flat/20160114143931.GG10941%40awork2.anarazel.de
> talks about a proposed SetSocketToWaitOn(...)/AddSocketToWaitSet and
> RemoveSocketFromWaitSet etc. But it looks like it petered out and went
> nowhere, apparently mainly due to not being needed by any current core
> users.

I think it just needs somebody to push this forward.


> I'd like to add such interfaces at some point, but for now can work around
> it by destroying and re-creating the wait event set when the fd-set
> changes. So I'm posting mostly to confirm that it's not supposed to work,
> and ask if anyone thinks I should submit a comment patch to latch.c
> documenting it.

It doesn't work, right. I'm not sure I see the point of adding comments
explaining that a nonexistant interface doesn't exist?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-31 Thread Alvaro Herrera

Tom Lane wrote:
> Alvaro Herrera  writes:
> > ...  However, when you create an index, you can
> > indicate which operator class to use, and it may not be the default one.
> > If a different one is chosen at index creation time, then a query using
> > COUNT(distinct) will do the wrong thing, because DISTINCT will select
> > an equality type using the type's default operator class, not the
> > equality that belongs to the operator class used to create the index.
> 
> > That's wrong: DISTINCT should use the equality operator that corresponds
> > to the index' operator class instead, not the default one.
> 
> Uh, what?  Surely the semantics of count(distinct x) *must not* vary
> depending on what indexes happen to be available.

Err ...

> I think what you meant to say is that the planner may only choose an
> optimization of this sort when the index's opclass matches the one
> DISTINCT will use, ie the default for the data type.

Um, yeah, absolutely.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl extension fails on Windows

Christoph Berg  writes:
>>> The only interesting line in log/postmaster.log is a log_line_prefix-less
>>> Util.c: loadable library and perl binaries are mismatched (got handshake 
>>> key 0xd500080, needed 0xd600080)

Can we see the Perl-related output from configure, particularly the new
lines about CFLAGS?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10 beta docs: different replication solutions

2017-07-31 Thread Merlin Moncure

On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer  wrote:
>
> We don't seem to describe logical replication on
>
> https://www.postgresql.org/docs/10/static/different-replication-solutions.html
>
> The attached patch adds a section.

This is a good catch.  Two quick observations:

1) Super pedantic point. I don't like the 'repl.' abbreviation in the
'most common implementation' both for the existing hs/sr and for the
newly added logical.

2) This lingo:
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow
in multiple
+ directions. For more information on logical replication, see
.

Is good, but I would revise it just a bit to emphasize the
subscription nature of logical replication to link the concepts
expressed strongly in the main section.  For example:

Logical replication allows the data changes [remove: "from individual
tables to be replicated"] to be published to subscriber nodes.  Data
can flow in any direction between nodes on a per-table basis; there is
no concept of a master server.  Conflict resolution must be handled
completely by the application.  For more information on...

what do you think?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-07-31 Thread Peter Moser


On 06.04.2017 01:24, Andres Freund wrote:

Unfortunately I don't think this patch has received sufficient design
and implementation to consider merging it into v10.  As code freeze is
in two days, I think we'll have to move this to the next commitfest.


We rebased our patch on top of commit 
393d47ed0f5b764341c7733ef60e8442d3e9bdc2

from "Mon Jul 31 11:24:51 2017 +0900".

Best regards,
Anton, Johann, Michael, Peter
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7648201..a373358 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -919,6 +919,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 			pname = sname = "Seq Scan";
 			break;
+		case T_TemporalAdjustment:
+			if(((TemporalAdjustment *) plan)->temporalCl->temporalType == TEMPORAL_TYPE_ALIGNER)
+pname = sname = "Adjustment(for ALIGN)";
+			else
+pname = sname = "Adjustment(for NORMALIZE)";
+			break;
 		case T_SampleScan:
 			pname = sname = "Sample Scan";
 			break;
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 083b20f..b0d6d15 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -29,6 +29,6 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
nodeCtescan.o nodeNamedtuplestorescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o tqueue.o spi.o \
-   nodeTableFuncscan.o
+   nodeTableFuncscan.o nodeTemporalAdjustment.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 396920c..7dd7474 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -113,6 +113,7 @@
 #include "executor/nodeValuesscan.h"
 #include "executor/nodeWindowAgg.h"
 #include "executor/nodeWorktablescan.h"
+#include "executor/nodeTemporalAdjustment.h"
 #include "nodes/nodeFuncs.h"
 #include "miscadmin.h"
 
@@ -364,6 +365,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
  estate, eflags);
 			break;
 
+		case T_TemporalAdjustment:
+			result = (PlanState *) ExecInitTemporalAdjustment((TemporalAdjustment *) node,
+ estate, eflags);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			result = NULL;		/* keep compiler quiet */
@@ -711,6 +717,10 @@ ExecEndNode(PlanState *node)
 			ExecEndLimit((LimitState *) node);
 			break;
 
+		case T_TemporalAdjustmentState:
+			ExecEndTemporalAdjustment((TemporalAdjustmentState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			break;
diff --git a/src/backend/executor/nodeTemporalAdjustment.c b/src/backend/executor/nodeTemporalAdjustment.c
new file mode 100644
index 000..ff2aa85
--- /dev/null
+++ b/src/backend/executor/nodeTemporalAdjustment.c
@@ -0,0 +1,571 @@
+#include "postgres.h"
+#include "executor/executor.h"
+#include "executor/nodeTemporalAdjustment.h"
+#include "utils/memutils.h"
+#include "access/htup_details.h"/* for heap_getattr */
+#include "utils/lsyscache.h"
+#include "nodes/print.h"		/* for print_slot */
+#include "utils/datum.h"		/* for datumCopy */
+#include "utils/rangetypes.h"
+
+/*
+ * #define TEMPORAL_DEBUG
+ * XXX PEMOSER Maybe we should use execdebug.h stuff here?
+ */
+#ifdef TEMPORAL_DEBUG
+static char*
+datumToString(Oid typeinfo, Datum attr)
+{
+	Oid			typoutput;
+	bool		typisvarlena;
+	getTypeOutputInfo(typeinfo, , );
+	return OidOutputFunctionCall(typoutput, attr);
+}
+
+#define TPGdebug(...) 	{ printf(__VA_ARGS__); printf("\n"); fflush(stdout); }
+#define TPGdebugDatum(attr, typeinfo) 	TPGdebug("%s = %s %ld\n", #attr, datumToString(typeinfo, attr), attr)
+#define TPGdebugSlot(slot) { printf("Printing Slot '%s'\n", #slot); print_slot(slot); fflush(stdout); }
+
+#else
+#define datumToString(typeinfo, attr)
+#define TPGdebug(...)
+#define TPGdebugDatum(attr, typeinfo)
+#define TPGdebugSlot(slot)
+#endif
+
+/*
+ * isLessThan
+ *		We must check if the sweepline is before a timepoint, or if a timepoint
+ *		is smaller than another. We initialize the function call info during
+ *		ExecInit phase.
+ */
+static bool
+isLessThan(Datum a, Datum b, TemporalAdjustmentState* node)
+{
+	node->ltFuncCallInfo.arg[0] = a;
+	node->ltFuncCallInfo.arg[1] = b;
+	node->ltFuncCallInfo.argnull[0] = false;
+	node->ltFuncCallInfo.argnull[1] = false;
+
+	/* Return value is never null, due to the pre-defined sub-query output */
+	return DatumGetBool(FunctionCallInvoke(>ltFuncCallInfo));
+}
+
+/*
+ * isEqual
+ *		We must check if two timepoints are equal. We initialize the function
+ *		call info during ExecInit phase.
+ */
+static bool
+isEqual(Datum a, Datum b, TemporalAdjustmentState* node)
+{
+	node->eqFuncCallInfo.arg[0] = a;
+	node->eqFuncCallInfo.arg[1] = b;
+

Re: [HACKERS] Default Partition for Range

2017-07-31 Thread Beena Emerson

On Wed, Jul 26, 2017 at 7:05 PM, Jeevan Ladhe
 wrote:
> Hi Beena,
>
> I have posted the rebased patches[1] for default list partition.
> Your patch also needs a rebase.
>
> [1]
> https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com
>

Thanks for informing.
PFA the updated patch.
I have changed the numbering of enum PartitionRangeDatumKind since I
have to include DEFAULT as well. If you have better ideas, let me
know.



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-31 Thread Ashutosh Bapat

On Mon, Jul 31, 2017 at 8:32 AM, Robert Haas  wrote:
> On Fri, Jul 14, 2017 at 3:02 AM, Ashutosh Bapat
>  wrote:
>> Here's revised patch set with only 0004 revised. That patch deals with
>> creating multi-level inheritance hierarchy from multi-level partition
>> hierarchy. The original logic of recursively calling
>> inheritance_planner()'s guts over the inheritance hierarchy required
>> that for every such recursion we flatten many lists created by that
>> code. Recursion also meant that root->append_rel_list is traversed as
>> many times as the number of partitioned partitions in the hierarchy.
>> Instead the revised version keep the iterative shape of
>> inheritance_planner() intact, thus naturally creating flat lists,
>> iterates over root->append_rel_list only once and is still easy to
>> read and maintain.
>
> 0001-0003 look basically OK to me, modulo some cosmetic stuff.  Regarding 
> 0004:
>
> +if (brel->reloptkind != RELOPT_BASEREL &&
> +brte->relkind != RELKIND_PARTITIONED_TABLE)
>
> I spent a lot of time staring at this code before I figured out what
> was going on here.  We're iterating over simple_rel_array, so the
> reloptkind must be RELOPT_OTHER_MEMBER_REL if it isn't RELOPT_BASEREL.
> But does that guarantee that rtekind is RTE_RELATION such that
> brte->relkind will be initialized to a value?  I'm not 100% sure.

Comment in RangeTblEntry says
 952 /*
 953  * Fields valid for a plain relation RTE (else zero):
 954  *
... clipped portion for RTE_NAMEDTUPLESTORE related comment

 960 Oid relid;  /* OID of the relation */
 961 charrelkind;/* relation kind (see pg_class.relkind) */

This means that relkind will be 0 when rtekind != RTE_RELATION. So,
the condition holds. But code creating an RTE somewhere which is not
in sync with this comment would create a problem. So your suggestion
makes sense.

> I
> think it would be clearer to write this test like this:
>
> Assert(IS_SIMPLE_REL(brel));
> if (brel->reloptkind == RELOPT_OTHER_MEMBER_REL &&
> (brte->rtekind != RELOPT_BASEREL ||

Do you mean (brte_>rtekind != RTE_RELATION)?

> brte->relkind != RELKIND_PARTITIONED_TABLE))
> continue;
>
> Note that the way you wrote the comment is says if it *is* another
> REL, not if it's *not* a baserel; it's good if those kinds of little
> details match between the code and the comments.

I find the existing comment and code in this part of the function
differ. The comment just above the loop on simple_rel_array[], talks
about changing something in the child, but the very next line skips
child relations and later a loop on append_rel_list makes changes to
appropriate children. I guess, it's done that way to keep the code
working even after we introduce some RELOPTKIND other than BASEREL or
OTHER_MEMBER_REL for a simple rel. But your suggestion makes more
sense. Changed it according to your suggestion.

>
> It is not clear to me what the motivation is for the API changes in
> expanded_inherited_rtentry.  They don't appear to be necessary.

expand_inherited_rtentry() creates AppendRelInfos for all the children
of a given parent and collects them in a list. The list is appended to
root->append_rel_list at the end of the function. Now that function
needs to do this recursively. This means that for a partitioned
partition table its children's AppendRelInfos will be added to
root->append_rel_list before AppendRelInfo of that partitioned
partition table. inheritance_planner() assumes that the parent's
AppendRelInfo comes before its children in append_rel_list.This
assumption allows it to be simplified greately, retaining its
iterative form. My earlier patches had recursive version of
inheritance_planner(), which is complex. I have comments in this patch
explaining this.

Adding AppendRelInfos to root->append_rel_list as and when they are
created would keep parent AppendRelInfos before those of children. But
that function throws away the AppendRelInfo it created when their are
no real children i.e. in partitioned table's case when has no leaf
partitions. So, we can't do that. Hence, I chose to change the API to
return the list of AppendRelInfos when the given RTE has real
children.

> If
> they are necessary, you need to do a more thorough job updating the
> comments.  This one, in particular:
>
>   *  If so, add entries for all the child tables to the query's
>   *  rangetable, and build AppendRelInfo nodes for all the child tables
>   *  and add them to root->append_rel_list.  If not, clear the entry's

Done.

>
> And the comments could maybe say something like "We return the list of
> appinfos rather than directly appending it to append_rel_list because
> $REASON."

Done. Please check the attached version.

>
> - * is a partitioned table.
> + * RTE simply duplicates the parent *partitioned* table.
>   */
> -if (childrte->relkind

Re: [HACKERS] PostgreSQL not setting OpenSSL session id context?

2017-07-31 Thread Heikki Linnakangas


On 07/31/2017 02:24 AM, Shay Rojansky wrote:

Just to continue the above, I can confirm that adding a simple call
to SSL_CTX_set_session_id_context() to be_tls_init() with some arbitrary
const value fixes the error for me. Attached is a patch (ideally a test
should be done for this, but that's beyond what I can invest at the moment,
let me know if it's absolutely necessary).


I agree with Tom that we don't really want abbreviated SSL handshakes, 
or other similar optimizations, to take place. PostgreSQL connections 
are quite long-lived, so we have little to gain. But it makes the attack 
surface larger. There have been vulnerabilities related to SSL 
renegotiation, resumption, abbreviated handshakes, and all that.


I think we should actually call SSL_CTX_set_session_cache_mode(ctx, 
SSL_SESS_CACHE_OFF), to disable session caching altogether. I'm not sure 
if we still need to call SSL_CTX_set_session_cache_mode() if we do that.


I know next-to-nothing about .Net; is there some easy way to download a 
.Net client application and test this?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Minor comment update in partition.c

2017-07-31 Thread Beena Emerson

The commit d363d42bb9a4399a0207bd3b371c966e22e06bd3 changed
RangeDatumContent *content to PartitionRangeDatumKind *kind but a
comment on function partition_rbound_cmp was left unedited and it
still mentions content1 instead of kind1.

PFA the patch to fix this.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fix_comment.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL - Weak DH group

2017-07-31 Thread Heikki Linnakangas


On 07/13/2017 11:07 PM, Heikki Linnakangas wrote:

On 07/13/2017 10:13 PM, Robert Haas wrote:

On Thu, Jul 13, 2017 at 1:30 PM, Tom Lane  wrote:

Heikki Linnakangas  writes:

I don't think this can be backpatched. It changes the default DH
parameters from 1024 bits to 2048 bits. That's a good thing for
security, but older clients might not support it, and would refuse to
connect or would fall back to something less secure.


Do we have any hard information about which versions of which clients
might not support that?  (In particular I'm wondering if any still exist
in the wild.)


Yeah.  If we break clients for v10 two months from release, some
drivers won't be updated by release time, and that sounds pretty
unfriendly to me.  On the other hand, if there is only a theoretical
risk of breakage and no clients that we actually know about will have
a problem with it, then the argument for waiting is weaker.  I'm not
generally very excited about changing things after beta2, which is
where are, but if this is a security issue then we might need to hold
our nose and go ahead.  I'm against it if it's likely to cause
real-world connectivity problems, though.


Googling around, I believe Java 6 is the only straggler [1]. So we would
be breaking that. Java 7 also doesn't support DH parameters > 1024 bits,
but it supports ECDHE, which is prioritized over DH ciphers, so it
doesn't matter.

Java 6 was released back in 2006. The last public release was in 2013.
It wouldn't surprise me to still see it bundled with random proprietary
software packages, though. The official PostgreSQL JDBC driver still
supports it, but there has been discussion recently on dropping support
for it, and even for Java 7. [2]

I would be OK with breaking DH with Java 6 in PostgreSQL 10, especially
since there's a simple workaround (generate a 1024-bit DH parameters
file). I would be less enthusiastic about doing that in a minor release,
although maybe that wouldn't be too bad either, if we put a prominent
notice with the workaround in the release notes.


Some more information on the situation with JDK version 6: I installed 
Debian wheezy on a VM, with a OpenJDK 6, and tested connecting to a 
patched server with the JDBC driver. It worked! Googling around, it 
seems that this was fixed in later versions of OpenJDK 6 
(https://bugs.openjdk.java.net/browse/JDK-8062834). I then downloaded 
the latest Oracle JRE binary version, 6u45, available from 
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html. 
With that, it does not work, you get errors like:


org.postgresql.util.PSQLException: SSL error: 
java.lang.RuntimeException: Could not generate DH keypair

...
Caused by: java.security.InvalidAlgorithmParameterException: Prime size 
must be multiple of 64, and can only range from 512 to 1024 (inclusive)


So, the last binary version downloadable from Oracle is affected, but 
recent versions of OpenJDK 6 work.


Rebased patch attached, with proposed release notes included. Barring 
new objections or arguments, I'll commit this (only) to v10 later today.


- Heikki
>From 93ef6ce1c203028384cf9febf3b4add715fff26f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 31 Jul 2017 13:39:01 +0300
Subject: [PATCH 1/2] Always use 2048 bit DH parameters for OpenSSL ephemeral
 DH ciphers.

1024 bits is considered weak these days, but OpenSSL always passes 1024 as
the key length to the tmp_dh callback. All the code to handle other key
lengths is, in fact, dead.

To remedy those issues:

* Only include hard-coded 2048-bit parameters.
* Set the parameters directly with SSL_CTX_set_tmp_dh(), without the
  callback
* The name of the file containing the DH parameters is now a GUC. This
  replaces the old hardcoded "dh1024.pem" filename. (The files for other
  key lengths, dh512.pem, dh2048.pem, etc. were never actually used.)

Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier.

Discussion: https://www.postgresql.org/message-id/camxbouyjooautvozn6ofzym828anrdjuccotccquxjws-l2...@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  24 +++
 src/backend/libpq/be-secure-openssl.c | 264 +-
 src/backend/libpq/be-secure.c |   1 +
 src/backend/utils/misc/guc.c  |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/libpq/libpq.h |   1 +
 6 files changed, 133 insertions(+), 169 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b45b7f7f69..c33d6a0349 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1203,6 +1203,30 @@ include_dir 'conf.d'
   
  
 
+ 
+  ssl_dh_params_file (string)
+  
+   ssl_dh_params_file configuration parameter
+  
+  
+  
+   
+Specifies the name of the file containing Diffie-Hellman

Re: [HACKERS] [PATCH v3] pg_progress() SQL function to monitor progression of long running SQL queries/utilities

2017-07-31 Thread Pavel Stehule

2017-07-31 11:09 GMT+02:00 Remi Colinet :

>
>
> 2017-07-26 15:27 GMT+02:00 Robert Haas :
>
>> On Wed, Jun 21, 2017 at 10:01 AM, Remi Colinet 
>> wrote:
>> > test=# SELECT  pid, ppid, bid, concat(repeat(' ', 3 * indent),name),
>> value,
>> > unit FROM pg_progress(0,0);
>> >   pid  | ppid | bid |  concat  |  value   |  unit
>> > ---+--+-+--+--+-
>> >  14106 |0 |   4 | status   | query running|
>> >  14106 |0 |   4 | relationship | progression  |
>> >  14106 |0 |   4 |node name | Sort |
>> >  14106 |0 |   4 |sort status   | on tapes writing |
>> >  14106 |0 |   4 |completion| 0| percent
>> >  14106 |0 |   4 |relationship  | Outer|
>> >  14106 |0 |   4 |   node name  | Seq Scan |
>> >  14106 |0 |   4 |   scan on| t_10m|
>> >  14106 |0 |   4 |   fetched| 25049| block
>> >  14106 |0 |   4 |   total  | 83334| block
>> >  14106 |0 |   4 |   completion | 30   | percent
>> > (11 rows)
>> >
>> > test=#
>>
>> Somehow I imagined that the output would look more like what EXPLAIN
>> produces.
>>
>
>
> I had initially used the same output as for the ANALYZE command:
>
> test=# PROGRESS 14611;
>   PLAN PROGRESS
>
> 
> -
>  Gather Merge
>->  Sort=> dumping tuples to tapes
>  rows r/w merge 0/0 rows r/w effective 0/1464520 0%
>  Sort Key: md5
>  ->  Parallel Seq Scan on t_10m => rows 1464520/4166700 35% blks
> 36011/83334 43%
> (5 rows)
>
> test=#
>
> But this restricts the use to "human consumers". Using a table output with
> name/value pairs, allows the use by utilities for instance, without
> parsing. This is less handy for administrators, but far better for 3rd
> party utilities. One solution is otherwise to create a PL/SQL command on
> top of pg_progress() SQL function to produce an output similar to the one
> of the ANALYZE command.
>

you can support XML, JSON output format like EXPLAIN does.

 https://www.postgresql.org/docs/current/static/sql-explain.html

Regards

pavel

>
>
>> > If the one shared memory page is not enough for the whole progress
>> report,
>> > the progress report transfert between the 2 backends is done with a
>> series
>> > of request/response. Before setting the latch, the monitored backend
>> write
>> > the size of the data dumped in shared memory and set a status to
>> indicate
>> > that more data is to be sent through the shared memory page. The
>> monitoring
>> > backends get the result and sends an other signal, and then wait for the
>> > latch again. The monitored backend does not collect a new progress
>> report
>> > but continues to dump the already collected report. And the exchange
>> goes on
>> > until the full progress report has been dumped.
>>
>> This is basically what shm_mq does.  We probably don't want to
>> reinvent that code, as it has taken a surprising amount of debugging
>> to get it fully working.
>>
>
> Yes, I had once considered this solution but then moved away as I was
> unsure of the exact need for the transfert of the progress report between
> the monitored and the monitoring backends.
> I'am going to switch to shm_mq.
>
> Thx & Rgds
>
>
>
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>

Re: [HACKERS] map_partition_varattnos() and whole-row vars