date:20180208

Is there a cache consistent interface to tables ?

2018-02-08 Thread Garym

Hi, 
This is an odd request for help. I'm looking to expose an interface so an 
external app can insert to a table while maintaining cache consistency and 
inserts be promoted via wal. 

I need to support about 100k+ inserts/sec from a sensor data stream. It simply 
won't work using sql queries.  If the call overhead is too high for single 
calls, multiple records per call is better. The data must be available for 
selects in 500ms.  I current only have 24gb ram for pg, but production will be 
56gb. 

I'm taking this approach because pgpool2 chokes, delaying past requirements. I 
initially wanted to use wal, but masters don't want wal in feeds and slaves 
have unpredictable delays of seconds before provisioning occurs. 

Suggestions are appreciated. 

Gary



Sent from my iPad

Re: Using scalar function as set-returning: bug or feature?

2018-02-08 Thread Sergei Kornilov

Hello

> select into b from my_insert('from func atx');
You missed select something into b. For example,
select ret into b from my_insert('from func atx') as ret;

Using scalar function in from is not bug.
Silent assigning NULL for variables in "into" not matches same in "select"... I 
think better would be raise warning.

regards, Sergei

Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-02-08 Thread Masahiko Sawada

On Sat, Feb 3, 2018 at 1:48 AM, Robert Haas  wrote:
> On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada  wrote:
>> Thank you for suggestion. It sounds more smarter. So it would be more
>> better if we vacuums database for anti-wraparound in ascending order
>> of relfrozenxid?
>
> Currently, we're doing it based on datfrozenxid.  I was looking for a
> small, relatively safe change to improve things, which led to my
> proposal: continue using datfrozenxid when the database isn't being
> vacuumed, but when it is, substitute the datfrozenxid that we expect
> the database *will have* after the tables currently being vacuumed are
> finished for the actual datfrozenxid.
>
> On further reflection, though, I see problems.  Suppose db1 has
> age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m
> and 400m.  db2 has age(datfrozenxid) = 500m.  Then suppose #1 starts
> vacuuming the table with age 600m.  The AV launcher sees that, with
> that table out of the way, we'll be able to bring datfrozenxid forward
> to 400m and so sends worker #2 to db2.  But actually, when the worker
> in db1 finishes vacuuming the table with age 600m, it's going to have
> to vacuum the table with age 400m before performing any relfrozenxid
> update.  So actually, the table with age 400m is holding the
> relfrozenxid at 600m just as much as if it too had a relfrozenxid of
> 600m.  In fact, any tables in the same database that need routine
> vacuuming are a problem, too: we're going to vacuum all of those
> before updating relfrozenxid, too.
>
> Maybe we should start by making the scheduling algorithm used by the
> individual workers smarter:
>
> 1. Let's teach do_autovacuum() that, when there are any tables needing
> vacuum for wraparound, either for relfrozenxid or relminmxid, it
> should vacuum only those and forget about the rest.  This seems like
> an obvious optimization to prevent us from delaying
> datfrozenxid/datminmxid updates for the sake of vacuuming tables that
> are "merely" bloated.
>
> 2. Let's have do_autovacuum() sort the tables to be vacuumed in
> ascending order by relfrozenxid, so older tables are vacuumed first.
> A zero-order idea is to put tables needing relfrozenxid vacuuming
> before tables needing relminmxid vacuuming, and sort the latter by
> ascending relminmxid, but maybe there's something smarter possible
> there.  The idea here is to vacuum tables in order of priority rather
> than in whatever order they happen to be physically mentioned in
> pg_class.
>
> 3. Let's have do_autovacuum() make an extra call to
> vac_update_datfrozenxid() whenever the next table to be vacuumed is at
> least 10 million XIDs (or MXIDs) newer than the first one it vacuumed
> either since the last call to vac_update_datfrozenxid() or, if none,
> since it started.  That way, we don't have to wait for every single
> table to get vacuumed before we can consider advancing
> relfrozenxid/relminmxid.
>
> 4. When it's performing vacuuming for wraparound, let's have AV
> workers advertise in shared memory the oldest relfrozenxid and
> relminmxid that it might exist in the database.  Given #1 and #2, this
> is pretty easy, since we start by moving through tables in increasing
> relfrozenxid order and then shift to moving through them in increasing
> relminmxid order.  When we're working through the relfrozenxid tables,
> the oldest relminmxid doesn't move, and the oldest relfrozenxid is
> that of the next table in the list.  When we're working through the
> relminmxid tables, it's the reverse.  We need a little cleverness to
> figure out what value to advertise when we're on the last table in
> each list -- it should be the next-higher value, even though that will
> be above the relevant threshold, not a sentinel value.

So I think we should includes tables as well that are not at risk of
wraparound in order to get the next-higher value (that is, the oldest
table of the non-risked tables) instead of forgetting them. And then
we skip vacuuming them.

>
> 5. With those steps in place, I think we can now adopt my previous
> idea to have the AV launcher use any advertised relfrozenxid (and, as
> I now realize, relminmxid) instead of the ones found in pg_database,
> because now we know that individual workers are definitely focused on
> getting relfrozenxid (and/or relminmxid) as soon as possible, and
> vacuuming unrelated tables won't help them do it any faster.
>
> This gets us fairly close to vacuuming tables in decreasing order of
> wraparound danger across the entire cluster.  It's not perfect.  It
> prefers to keep vacuuming tables in the same database rather than
> having a worker exit and maybe launching a new one in a different
> database -- but the alternative is not very appealing.  If we didn't
> do it that way, and if we had a million tables with XIDs that were
> closely spaced spread across different databases, we'd have to
> terminate and relaunching workers at a very high rate to get
> everything sorted out, which wou

Using scalar function as set-returning: bug or feature?

2018-02-08 Thread Konstantin Knizhnik


Hi hackers,

I wonder if the following behavior is considered to be a bug in plpgsql 
or it is expected result:


create table my_data(id serial primary key, time timestamp, type text);

create or replace function my_insert(type text) RETURNS BOOLEAN
AS
$BODY$
BEGIN
    insert into my_data (time, type) values (now(), type);
    return true;
END
$BODY$
LANGUAGE plpgsql;

create or replace function my_call_insert() RETURNS BOOLEAN
AS
$BODY$
DECLARE
  b boolean;
BEGIN
    select into b from my_insert('from func atx');
    return b;
END
$BODY$
LANGUAGE plpgsql;

select my_call_insert();
 my_call_insert


(1 row)



So, if function returning boolean is used in from list, then no error or 
warning is produced, but null value is assigned to the target variable.

If this code is rewritten in more natural way:

    b := my_insert('from func atx');
or
   select my_insert('from func atx') into b;

then function returns expected result (true).
I spent some time investigating this code under debugger and looks like 
the reason of the problem is that tuple descriptor for the row returned 
by my_insert() contains no columns (number of attributes is zero). May 
be there are some reasons for such behavior, but I find it quite 
confusing and unnatural. I prefer to report error in this case or return 
tuple with single column, containing return value of the function.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] path toward faster partition pruning

2018-02-08 Thread Amit Langote

Hi Ashutosh.

On 2018/02/09 14:09, Ashutosh Bapat wrote:
> On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas  wrote:
>> On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat
>>  wrote:
>>> While looking at the changes in partition.c I happened to look at the
>>> changes in try_partition_wise_join(). They mark partitions deemed
>>> dummy by pruning as dummy relations. If we accept those changes, we
>>> could very well change the way we handle dummy partitioned tables,
>>> which would mean that we also revert the recent commit
>>> f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
>>> haven't been reviewed yet and so not final.
>>
>> Well, if you have an opinion on those proposed changes, I'd like to hear it.
> 
> I am talking about changes after this comment
>  /*
> + * If either child_rel1 or child_rel2 is not a live partition, they'd
> + * not have been touched by set_append_rel_size.  So, its RelOptInfo
> + * would be missing some information that set_append_rel_size sets 
> for
> + * live partitions, such as the target list, child EQ members, etc.
> + * We need to make the RelOptInfo of even the dead partitions look
> + * minimally valid and as having a valid dummy path attached to it.
> + */
> 
> There are couple of problems with this change
> 1. An N way join may call try_partition_wise_join() with the same base
> relation on one side N times. The condition will be tried those many
> times.
> 
> 2. We will have to adjust or make similar changes in
> try_partition_wise_aggregate() proposed in the partition-wise
> aggregate patch. Right now it checks if the relation is dummy but it
> will have to check whether the pathlist is also NULL. Any
> partition-wise operation that we try in future will need this
> adjustment.
> 
> AFAIU, for pruned partitions, we don't set necessary properties of the
> corresponding RelOptInfo when it is pruned. If we were sure that we
> will not use that RelOptInfo anywhere in the rest of the planning,
> this would work. But that's not the case. AFAIU, current planner
> assumes that a relation which has not been eliminated before planning
> (DEAD relation), but later proved to not contribute any rows in the
> result, is marked dummy. Partition pruning breaks that assumption and
> thus may have other side effects, that we do not see right now. We
> have similar problem with dummy partitioned tables, but we have code
> in place to avoid looking at the pathlists of their children by not
> considering such a partitioned table as partitioned. May be we want to
> change that too.
> 
> Either we add refactoring patches to change the planner so that it
> doesn't assume something like that OR we make sure that the pruned
> partition's RelOptInfo have necessary properties and a dummy pathlist
> set. I will vote for second. We spend CPU cycles marking pruned
> partitions as dummy if the dummy pathlist is never used. May be we can
> avoid setting dummy pathlist if we can detect that a pruned partition
> is guaranteed not to be used, e.g when the corresponding partitioned
> relation does not participate in any join or other upper planning.

Thanks for the analysis.  I agree with all the points of concern.  so for
now, I have dropped all the changes from my patch that give rise to the
concerns.  With the new patch, changes to the existing optimizer code
beside introducing partprune.c in the util directory are pretty thin:

git diff master --stat src/backend/optimizer/
 src/backend/optimizer/path/allpaths.c  |   16 ++
 src/backend/optimizer/util/Makefile|2 +-
 src/backend/optimizer/util/clauses.c   |4 +-
 src/backend/optimizer/util/partprune.c | 1421 +++
 src/backend/optimizer/util/plancat.c   |   83 ---
 src/backend/optimizer/util/relnode.c   |8 +
 6 files changed, 1504 insertions(+), 30 deletions(-)

So, no refactoring the existing optimizer code, just replacing the
partition pruning mechanism with partprune.c functions.

> Apart from that another change that caught my eye is
> 
> Instead of going through root->append_rel_list to pick up the child
> appinfos, store them in an array called part_appinfos that stores
> partition appinfos in the same order as RelOptInfos are stored in
> part_rels, right when the latter are created.
> 
> Further, instead of going through root->pcinfo_list to get the list
> of partitioned child rels, which ends up including even the rels
> that are pruned by set_append_rel_size(), build up a list of "live"
> partitioned child rels and use the same to initialize partitioned_rels
> field of AppendPath.
> 
> That was voted down by Robert during partition-wise join
> implementation. And I agree with him. Any changes around changing that
> should change the way we handle AppendRelInfos for all relations, not
> just (declarative) partitioned relations.

I removed part_appinfos from the patch.  Also, I have made the changes
introducing live_partitioned_rels a separate patch,

Re: CALL stmt, ERROR: unrecognized node type: 113 bug

2018-02-08 Thread Michael Paquier

On Fri, Feb 02, 2018 at 04:07:28PM +0900, Michael Paquier wrote:
> You need to read that as "only a SubPlan can be executed after a SubLink
> has been processed by the planner", so please replace the last "latter"
> by "planner".

(I forgot to add Peter and Andrew in CC: previously, so done now.)

e4128ee7 is making is clear that SubLink are authorized when
transforming it in transformSubLink(), however I cannot think about a
use case so should we just forbid them, and this is actually untested.
So the patch attached does so.

The second problem involves a cache lookup failure for a type when
trying to use pg_get_functiondef on a procedure.  Luckily, it is
possible to make the difference between a procedure and a function by
checking if prorettype is InvalidOid or not.  There is room for a new
patch which supports pg_get_proceduredef() to generate the definition of
a procedure, with perhaps a dedicated psql shortcut, but that could
always be done later on.
--
Michael
From a33e015fa06c118c7430506ca2c42c1146deb439 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 9 Feb 2018 15:49:37 +0900
Subject: [PATCH] Fix minor issues with procedure calls

Procedures invoked with subqueries as arguments fail to treat them
correctly as those are generated as SubLink nodes which need to be
processed through the planner first to be correctly executed.  The CALL
infrastructure lacks the logic, and actually it may not make much sense
to support such cases as any application can use a proper SELECT query
to do the same, so block them.

A second problem is related to the use of pg_get_functiondef which
complains about a cache lookup failure when called on a procedure.  It
is possible to make the difference between a procedure and a function by
looking at prorettype, so block properly the case where this function is
called on a procedure.  There is room for support of a system function
which generates definitions for procedures, and which could share much
with pg_get_functiondef, but this is left as future work.

Bug report by Pavel Stehule, patch by Michael Paquier.

Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfwz3zmgkm5...@mail.gmail.com
---
 src/backend/parser/parse_expr.c| 4 +++-
 src/backend/utils/adt/ruleutils.c  | 5 +
 src/test/regress/expected/create_procedure.out | 8 
 src/test/regress/sql/create_procedure.sql  | 8 
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index d45926f27f..031f1b72fb 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -1818,7 +1818,6 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 		case EXPR_KIND_RETURNING:
 		case EXPR_KIND_VALUES:
 		case EXPR_KIND_VALUES_SINGLE:
-		case EXPR_KIND_CALL:
 			/* okay */
 			break;
 		case EXPR_KIND_CHECK_CONSTRAINT:
@@ -1847,6 +1846,9 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 		case EXPR_KIND_PARTITION_EXPRESSION:
 			err = _("cannot use subquery in partition key expression");
 			break;
+		case EXPR_KIND_CALL:
+			err = _("cannot use subquery in CALL arguments");
+			break;
 
 			/*
 			 * There is intentionally no default: case here, so that the
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 28767a129a..dc3d3c7752 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2472,6 +2472,11 @@ pg_get_functiondef(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("\"%s\" is an aggregate function", name)));
 
+	if (!OidIsValid(proc->prorettype))
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is a procedure", name)));
+
 	/*
 	 * We always qualify the function name, to ensure the right function gets
 	 * replaced.
diff --git a/src/test/regress/expected/create_procedure.out b/src/test/regress/expected/create_procedure.out
index ccad5c87d5..41e0921b33 100644
--- a/src/test/regress/expected/create_procedure.out
+++ b/src/test/regress/expected/create_procedure.out
@@ -94,6 +94,14 @@ ALTER ROUTINE testfunc1a RENAME TO testfunc1;
 ALTER ROUTINE ptest1(text) RENAME TO ptest1a;
 ALTER ROUTINE ptest1a RENAME TO ptest1;
 DROP ROUTINE testfunc1(int);
+-- subqueries
+CALL ptest2((SELECT 5));
+ERROR:  cannot use subquery in CALL arguments
+LINE 1: CALL ptest2((SELECT 5));
+^
+-- Function definition
+SELECT pg_get_functiondef('ptest2'::regproc);
+ERROR:  "ptest2" is a procedure
 -- cleanup
 DROP PROCEDURE ptest1;
 DROP PROCEDURE ptest2;
diff --git a/src/test/regress/sql/create_procedure.sql b/src/test/regress/sql/create_procedure.sql
index 8c47b7e9ef..a140fef928 100644
--- a/src/test/regress/sql/create_procedure.sql
+++ b/src/test/regress/sql/create_procedure.sql
@@ -72,6 +72,14 @@ ALTER ROUTINE ptest1a RENAME TO ptest1;
 DROP ROUTINE testfunc1(int);
 
 
+-- subqueries
+CALL ptest2((SELECT 5));
+
+
+-- Fu

Re: Creation of wiki page for open items of v11

2018-02-08 Thread Ashutosh Bapat

On Fri, Feb 9, 2018 at 9:14 AM, Michael Paquier  wrote:
> Hi all,
>
> I begin seeing bugs related to new features of v11 popping up around,
> like this one for procedures:
> https://www.postgresql.org/message-id/CAFj8pRDxOwPPzpA8i%2BAQeDQFj7bhVw-dR2%3D%3DrfWZ3zMGkm568Q%40mail.gmail.com
>
> Are there any objections in creating a wiki page to track all those bugs
> and issues?  We don't want to lose track of all that.

+1.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-08 Thread Ashutosh Bapat

On Fri, Feb 9, 2018 at 11:26 AM, Amit Langote
 wrote:
> On 2018/02/09 14:31, Ashutosh Bapat wrote:
>>> I also noticed that a later patch adds partsupfunc to PartitionScheme,
>>> which the pruning patch needs too.  So, perhaps would be nice to take out
>>> that portion of the patch.  That is, the changes to PartitionScheme struct
>>> definition and those to find_partition_scheme().
>>
>> I am not sure whether a patch with just that change and without any
>> changes to use that member will be acceptable. So leaving this aside.
>
> I asked, because with everything that I have now changed in the partition
> pruning patch, one would need to pass these FmgrInfo pointers down to
> partition bound searching functions from the optimizer.  If the changes to
> add partsupfunc to the optimizer were taken out from your main patch, the
> pruning patch could just start using it.  For now, I'm making those
> changes part of the pruning patch.

That's fine. Someone's patch will be committed first and the other
will just take out those changes. But I am open to separate those
changes into other patch if a committer feels so.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Proposal: partition pruning by secondary attributes

2018-02-08 Thread Ashutosh Bapat

On Thu, Feb 8, 2018 at 4:51 PM, Ildar Musin  wrote:
>
> The idea is to store min and max values of secondary attributes (like
> 'id' in the example above) for each partition somewhere in catalog and
> use it for partition pruning along with partitioning key.

Every insertion and update of secondary attribute will touch catalog
and update if required. That will increase the contention on catalog.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-08 Thread Amit Langote

On 2018/02/09 14:31, Ashutosh Bapat wrote:
>> I also noticed that a later patch adds partsupfunc to PartitionScheme,
>> which the pruning patch needs too.  So, perhaps would be nice to take out
>> that portion of the patch.  That is, the changes to PartitionScheme struct
>> definition and those to find_partition_scheme().
> 
> I am not sure whether a patch with just that change and without any
> changes to use that member will be acceptable. So leaving this aside.

I asked, because with everything that I have now changed in the partition
pruning patch, one would need to pass these FmgrInfo pointers down to
partition bound searching functions from the optimizer.  If the changes to
add partsupfunc to the optimizer were taken out from your main patch, the
pruning patch could just start using it.  For now, I'm making those
changes part of the pruning patch.

Thanks,
Amit

Re: non-bulk inserts and tuple routing

2018-02-08 Thread Amit Langote

Fujita-san,

Thanks a lot for the review.

I had mistakenly tagged these patches v24, but they were actually supposed
to be v5.  So the attached updated patch is tagged v6.

On 2018/02/07 19:36, Etsuro Fujita wrote:
>> (2018/02/05 14:34), Amit Langote wrote:
>>> The code in tupconv_map_for_subplan() currently assumes that it can rely
>>> on all leaf partitions having been initialized.
>
> On reflection I noticed this analysis is not 100% correct; I think what
> that function actually assumes is that all sublplans' partitions have
> already been initialized, not all leaf partitions.

Yes, you're right.

>>> Since we're breaking that
>>> assumption with this proposal, that needed to be changed. So the patch
>>> contained some refactoring to make it not rely on that assumption.
>
> I don't think we really need this refactoring because since that as in
> another patch you posted, we could initialize all subplans' partitions in
> ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be
> called without any changes to that function because of what I said above.

What my previous approach failed to consider is that in the update case,
we'd already have ResultRelInfo's for some of the leaf partitions
initialized, which could be saved into proute->partitions array right away.

Updated patch does things that way, so all the changes I had proposed to
tupconv_map_for_subplan are rendered unnecessary.

> Here are comments for the other patch (patch
> v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):
>
> o On changes to ExecSetupPartitionTupleRouting:
>
> * The comment below wouldn't be correct; no ExecInitResultRelInfo in the
> patch.
>
> -   proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
> -  sizeof(ResultRelInfo *));
> +   /*
> +* Actual ResultRelInfo's and TupleConversionMap's are allocated in
> +* ExecInitResultRelInfo().
> +*/
> +   proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
> +   sizeof(ResultRelInfo
*));

I removed the comment altogether, as the comments elsewhere make the point
clear.

> * The patch removes this from the initialization step for a subplan's
> partition, but I think it would be better to keep this here because I
> think it's a good thing to put the initialization stuff together into one
> place.
>
> -   /*
> -* This is required in order to we convert the partition's
> -* tuple to be compatible with the root partitioned table's
> -* tuple descriptor.  When generating the per-subplan result
> -* rels, this was not set.
> -*/
> -   leaf_part_rri->ri_PartitionRoot = rel;

It wasn't needed here with the previous approach, because we didn't touch
any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it
back in the updated patch.

> * I think it would be better to keep this comment here.
>
> -   /* Remember the subplan offset for this ResultRelInfo */

Fixed.

> * Why is this removed from that initialization?
>
> -   proute->partitions[i] = leaf_part_rri;

Because of the old approach.  Now it's back in.

> o On changes to ExecInitPartitionInfo:
>
> * I don't understand the step starting from this, but I'm wondering if
> that step can be removed by keeping the above setup of proute->partitions
> for the subplan's partition in ExecSetupPartitionTupleRouting.
>
> +   /*
> +* If we are doing tuple routing for update, try to reuse the
> +* per-subplan resultrel for this partition that ExecInitModifyTable()
> +* might already have created.
> +*/
> +   if (mtstate && mtstate->operation == CMD_UPDATE)

Done, as mentioned above.

On 2018/02/08 19:16, Etsuro Fujita wrote:
> Here are some minor comments:
> 
> o On changes to ExecInsert
> 
> * This might be just my taste, but I think it would be better to (1)
> change ExecInitPartitionInfo so that it creates and returns a
> newly-initialized ResultRelInfo for an initialized partition, and (2)
> rewrite this bit:
> 
> +   /* Initialize partition info, if not done already. */
> +   ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
> + leaf_part_index);
> +
>     /*
>  * Save the old ResultRelInfo and switch to the one corresponding to
>  * the selected partition.
>  */
>     saved_resultRelInfo = resultRelInfo;
>     resultRelInfo = proute->partitions[leaf_part_index];
> +   Assert(resultRelInfo != NULL);
> 
> to something like this (I would say the same thing to the copy.c changes):
> 
>     /*
>  * Save the old ResultRelInfo and switch to the one corresponding to
>  * the selected partition.
>  */
>     saved_resultRelInfo = resultRelInfo;
>     resultRelInfo = proute->partitions[leaf_part_index];
>     if (resultRelInfo == NU

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-02-08 Thread Ashutosh Bapat

On Thu, Feb 8, 2018 at 10:41 AM, Amit Langote
 wrote:
> On 2018/02/08 11:55, Amit Langote wrote:
>> Hi Ashutosh.
>>
>> On 2018/02/07 13:51, Ashutosh Bapat wrote:
>>> Here's a new patchset with following changes
>>>
>>> 1. Rebased on the latest head taking care of partition bound
>>> comparison function changes
>>
>> I was about to make these changes myself while revising the fast pruning
>> patch.  Instead, I decided to take a look at your patch and try to use it
>> in my tree.
>
> I also noticed that a later patch adds partsupfunc to PartitionScheme,
> which the pruning patch needs too.  So, perhaps would be nice to take out
> that portion of the patch.  That is, the changes to PartitionScheme struct
> definition and those to find_partition_scheme().

I am not sure whether a patch with just that change and without any
changes to use that member will be acceptable. So leaving this aside.

>
> Regarding the latter, wouldn't be nice to have a comment before the code
> that does the copying about why we don't compare the partsupfunc field to
> decide if we have a match or not.  I understand it's because the
> partsupfunc array contains pointers, not OIDs.  But maybe, that's too
> obvious to warrant a comment.

It's because partsupfuncs should point to the information of the same
function when partopfamily matches and partopcintype matches. I would
have added an assertion for that with a comment, but with the pointer
that would be risky. Or we can just assert that the oids match.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: ALTER TABLE ADD COLUMN fast default

2018-02-08 Thread Andrew Dunstan

On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
 wrote:
> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>  wrote:
>> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>>  wrote:
>>> Yeah, thanks. revised patch attached
>>
>> FYI the identity regression test started failing recently with this
>> patch applied (maybe due to commit
>> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>>
>
> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>


Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fast_default-v8.patch
Description: Binary data

Re: [HACKERS] path toward faster partition pruning

2018-02-08 Thread Ashutosh Bapat

On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas  wrote:
> On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat
>  wrote:
>> While looking at the changes in partition.c I happened to look at the
>> changes in try_partition_wise_join(). They mark partitions deemed
>> dummy by pruning as dummy relations. If we accept those changes, we
>> could very well change the way we handle dummy partitioned tables,
>> which would mean that we also revert the recent commit
>> f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
>> haven't been reviewed yet and so not final.
>
> Well, if you have an opinion on those proposed changes, I'd like to hear it.

I am talking about changes after this comment
 /*
+ * If either child_rel1 or child_rel2 is not a live partition, they'd
+ * not have been touched by set_append_rel_size.  So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */

There are couple of problems with this change
1. An N way join may call try_partition_wise_join() with the same base
relation on one side N times. The condition will be tried those many
times.

2. We will have to adjust or make similar changes in
try_partition_wise_aggregate() proposed in the partition-wise
aggregate patch. Right now it checks if the relation is dummy but it
will have to check whether the pathlist is also NULL. Any
partition-wise operation that we try in future will need this
adjustment.

AFAIU, for pruned partitions, we don't set necessary properties of the
corresponding RelOptInfo when it is pruned. If we were sure that we
will not use that RelOptInfo anywhere in the rest of the planning,
this would work. But that's not the case. AFAIU, current planner
assumes that a relation which has not been eliminated before planning
(DEAD relation), but later proved to not contribute any rows in the
result, is marked dummy. Partition pruning breaks that assumption and
thus may have other side effects, that we do not see right now. We
have similar problem with dummy partitioned tables, but we have code
in place to avoid looking at the pathlists of their children by not
considering such a partitioned table as partitioned. May be we want to
change that too.

Either we add refactoring patches to change the planner so that it
doesn't assume something like that OR we make sure that the pruned
partition's RelOptInfo have necessary properties and a dummy pathlist
set. I will vote for second. We spend CPU cycles marking pruned
partitions as dummy if the dummy pathlist is never used. May be we can
avoid setting dummy pathlist if we can detect that a pruned partition
is guaranteed not to be used, e.g when the corresponding partitioned
relation does not participate in any join or other upper planning.

Apart from that another change that caught my eye is

Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.

Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.

That was voted down by Robert during partition-wise join
implementation. And I agree with him. Any changes around changing that
should change the way we handle AppendRelInfos for all relations, not
just (declarative) partitioned relations.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-02-08 Thread Masahiko Sawada

On Fri, Feb 9, 2018 at 12:45 AM, Claudio Freire  wrote:
> On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada  wrote:
>> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire  
>> wrote:
>>> I can look into doing 3, that *might* get rid of the need to do that
>>> initial FSM vacuum, but all other intermediate vacuums are still
>>> needed.
>>
>> Understood. So how about that this patch focuses only make FSM vacuum
>> more frequently and leaves the initial FSM vacuum and the handling
>> cancellation cases? The rest can be a separate patch.
>
> Ok.
>
> Attached split patches. All but the initial FSM pass is in 1, the
> initial FSM pass as in the original patch is in 2.
>
> I will post an alternative patch for 2 when/if I have one. In the
> meanwhile, we can talk about 1.

Thank you for updating the patch!

+/* Tree pruning for partial vacuums */
+if (threshold)
+{
+child_avail = fsm_get_avail(page, slot);
+if (child_avail >= threshold)
+continue;
+}

I think slots in a non-leaf page might not have a correct value
because fsm_set_avail doesn't propagate up beyond pages. So do we need
to check the root of child page rather than a slot in the non-lead
page? I might be missing something though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: ldapi support

2018-02-08 Thread Thomas Munro

On Fri, Feb 9, 2018 at 4:05 PM, Peter Eisentraut
 wrote:
> After the LDAP code was switched to use ldap_initialize() as part of the
> ldaps support, ldapi (LDAP over Unix-domain sockets) also works.  I
> noticed an old bug report (#13625) that asked for it.  So I suggest this
> patch to document this and add some tests.
>
> One flaw is that this only works when using the URL syntax.  Providing a
> separate option would require coding URL escaping, since ultimately an
> URL must be composed and passed to ldap_initialize().  But since
> OpenLDAP apparently now considers URLs to be the preferred form for
> connection parameters, I'm comfortable just sticking to that format.

Nice.  The test doesn't actually succeed in reloading the pg_hba.conf
file though:

2018-02-09 16:41:15.886 NZDT [24472] LOG:  received SIGHUP, reloading
configuration files
2018-02-09 16:41:15.893 NZDT [24472] LOG:  unsupported LDAP URL scheme: ldapi
2018-02-09 16:41:15.893 NZDT [24472] LOG:  pg_hba.conf was not reloaded

I think hba.c needs to learn to consider "ldapi" to be acceptable
(after it parses the URL).  Then I think when
InitializeLDAPConnection() reconstitutes the URL with psprintf, it'll
probably need to avoid sticking :port on the end.

The fact that we take the URL to pieces and then stick it back
together again may seem a bit odd, but it is required by the
documentation (ldap_initialize() wants a URL "containing only the
schema, the host, and the port fields").

I see there is another scheme called "cldap" (which seems to be
something like LDAP over UDP).  I wonder if anyone cares about that.

-- 
Thomas Munro
http://www.enterprisedb.com

Creation of wiki page for open items of v11

2018-02-08 Thread Michael Paquier

Hi all,

I begin seeing bugs related to new features of v11 popping up around,
like this one for procedures:
https://www.postgresql.org/message-id/CAFj8pRDxOwPPzpA8i%2BAQeDQFj7bhVw-dR2%3D%3DrfWZ3zMGkm568Q%40mail.gmail.com

Are there any objections in creating a wiki page to track all those bugs
and issues?  We don't want to lose track of all that.

Thanks,
--
Michael


signature.asc
Description: PGP signature

ldapi support

2018-02-08 Thread Peter Eisentraut

After the LDAP code was switched to use ldap_initialize() as part of the
ldaps support, ldapi (LDAP over Unix-domain sockets) also works.  I
noticed an old bug report (#13625) that asked for it.  So I suggest this
patch to document this and add some tests.

One flaw is that this only works when using the URL syntax.  Providing a
separate option would require coding URL escaping, since ultimately an
URL must be composed and passed to ldap_initialize().  But since
OpenLDAP apparently now considers URLs to be the preferred form for
connection parameters, I'm comfortable just sticking to that format.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 4b24fa8aaf68327f773327269b23b6b129df858c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Feb 2018 10:21:43 -0500
Subject: [PATCH] Document support for LDAP over Unix-domain sockets (ldapi)

After the LDAP code was switched to use ldap_initialize() as part of the
ldaps support, ldapi also works, so we might as well document it and add
some tests.

Bug: #13625
---
 doc/src/sgml/client-auth.sgml | 10 ++
 src/test/ldap/t/001_auth.pl   | 25 +++--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 53832d08e2..8ed95bef4e 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1618,6 +1618,7 @@ LDAP Authentication
  other LDAP options in a more compact and standard form.  The format is
 
 
ldap[s]://host[:port]/basedn[?[attribute][?[scope][?[filter
+ldapi://path/basedn[?[attribute][?[scope][?[filter
 
  scope must be one
  of base, one, 
sub,
@@ -1640,6 +1641,15 @@ LDAP Authentication
  ldapurl.
 
 
+
+ The URL scheme ldapi makes the LDAP connection
+ over a Unix-domain socket.  The path must
+ be URL-encoded, for example
+ ldapi://%2Fusr%2Flocal%2Fvar%2Fldapi.  This
+ connection method can only specified as a URL, not via separate
+ options.
+
+
 
  For non-anonymous binds, ldapbinddn
  and ldapbindpasswd must be specified as separate
diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index 5508da459f..71045a5866 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -2,7 +2,7 @@
 use warnings;
 use TestLib;
 use PostgresNode;
-use Test::More tests => 19;
+use Test::More tests => 20;
 
 my ($slapd, $ldap_bin_dir, $ldap_schema_dir);
 
@@ -32,6 +32,16 @@
 
 $ENV{PATH} = "$ldap_bin_dir:$ENV{PATH}" if $ldap_bin_dir;
 
+sub url_encode {
+   my $string = shift;
+   $string =~ s/([^a-zA-Z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
+   $string =~ tr/ /+/;
+   return $string;
+}
+
+# for Unix-domain socket
+my $tempdir_short = TestLib::tempdir_short;
+
 my $ldap_datadir = "${TestLib::tmp_check}/openldap-data";
 my $slapd_certs = "${TestLib::tmp_check}/slapd-certs";
 my $slapd_conf = "${TestLib::tmp_check}/slapd.conf";
@@ -41,7 +51,9 @@
 my $ldap_server = 'localhost';
 my $ldap_port = int(rand() * 16384) + 49152;
 my $ldaps_port = $ldap_port + 1;
+my $ldapi_socket = "$tempdir_short/ldapi";
 my $ldap_url = "ldap://$ldap_server:$ldap_port";;
+my $ldapi_url = "ldapi://" . url_encode($ldapi_socket);
 my $ldaps_url = "ldaps://$ldap_server:$ldaps_port";
 my $ldap_basedn = 'dc=example,dc=net';
 my $ldap_rootdn = 'cn=Manager,dc=example,dc=net';
@@ -86,7 +98,7 @@
 system_or_bail "openssl", "req", "-new", "-nodes", "-keyout", 
"$slapd_certs/server.key", "-out", "$slapd_certs/server.csr", "-subj", 
"/cn=server";
 system_or_bail "openssl", "x509", "-req", "-in", "$slapd_certs/server.csr", 
"-CA", "$slapd_certs/ca.crt", "-CAkey", "$slapd_certs/ca.key", 
"-CAcreateserial", "-out", "$slapd_certs/server.crt";
 
-system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldaps_url";
+system_or_bail $slapd, '-f', $slapd_conf, '-h', "$ldap_url $ldapi_url 
$ldaps_url";
 
 END
 {
@@ -162,6 +174,15 @@ sub test_access
 $ENV{"PGPASSWORD"} = 'secret1';
 test_access($node, 'test1', 0, 'search+bind with LDAP URL authentication 
succeeds');
 
+note "ldapi";
+
+unlink($node->data_dir . '/pg_hba.conf');
+$node->append_conf('pg_hba.conf', qq{local all all ldap 
ldapurl="$ldapi_url/$ldap_basedn?uid?sub"});
+$node->reload;
+
+$ENV{"PGPASSWORD"} = 'secret1';
+test_access($node, 'test1', 0, 'authentication using ldapi URL succeeds');
+
 note "search filters";
 
 unlink($node->data_dir . '/pg_hba.conf');

base-commit: b3a101eff0fd3747bebf547b1769e28f820f4515
-- 
2.16.1

Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-08 Thread Etsuro Fujita


(2018/02/09 4:32), Robert Haas wrote:

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane  wrote:

According to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01

there's still an intermittent issue.  I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one.  I speculate that the plan change
is a result of autovacuum kicking in partway through the run.


Will look into this.


Hmm.  Maybe inserting an ANALYZE command in the right place would fix it?


VACUUM to the right tables in the right place might better fix it? 
Another idea would be to modify test cases added by that commit so that 
they don't modify the existing tables to not break the existing test cases?


Best regards,
Etsuro Fujita

Re: [HACKERS] MERGE SQL Statement for PG11

2018-02-08 Thread Peter Geoghegan

On Wed, Feb 7, 2018 at 7:51 PM, Pavan Deolasee  wrote:
> I understand getting EPQ semantics right is very important. Can you please
> (once again) summarise your thoughts on what you think is the *most*
> appropriate behaviour? I can then think how much efforts might be involved
> in that. If the efforts are disproportionately high, we can discuss if
> settling for some not-so-nice semantics, like we apparently did for
> partition key updates.

I personally believe that the existing EPQ semantics are already
not-so-nice. They're what we know, though, and we haven't actually had
any real world complaints, AFAIK.

My concern is mostly just that MERGE manages to behave in a way that
actually "provides a single SQL statement that can conditionally
INSERT, UPDATE or DELETE rows, a task that would otherwise require
multiple procedural language statements", as the docs put it. As long
as MERGE manages to do something as close to that high level
description as possible in READ COMMITTED mode (with our current
semantics for multiple statements in RC taken as the baseline), then
I'll probably be happy.

Some novel new behavior -- "EPQ with a twist"-- is clearly necessary.
I feel a bit uneasy about it because anything that anybody suggests is
likely to be at least a bit arbitrary (EPQ itself is kind of
arbitrary). We only get to make a decision on how "EPQ with a twist"
will work once, and that should be a decision that is made following
careful deliberation. Ambiguity is much more likely to kill a patch
than a specific technical defect, at least in my experience. Somebody
can usually just fix a technical defect.

> I am sorry, I know you and Simon have probably done that a few times already
> and I should rather study those proposals first. So it's okay if you don't
> want to repeat those; I will work on them next week once I am back from
> holidays.

Unfortunately, I didn't get very far with Simon on this. I didn't
really start talking about this until recently, though, so it's not
like you missed much. The first time I asked Simon about this was
January 23rd, and I first proposed something about 10 days ago.
Something very tentative.

(I did ask some questions about EPQ, and even WHEN ... AND quals much
earlier, but that was in the specific context of a debate about
MERGE's use of ON CONFLICT's speculative insertion mechanism. I
consider this to be a totally different discussion, that ended before
Simon even posted his V1 patch, and isn't worth spending your time on
now.)

> TBH I did not consider partitioning any less complex and it was indeed very
> complex, requiring at least 3 reworks by me. And from what I understood, it
> would have been a blocker too. So is subquery handling and RLS. That's why I
> focused on addressing those items while you and Simon were still debating
> EPQ semantics.

Sorry if I came across as dismissive of that effort. That was
certainly not my intention. I am pleasantly surprised that you've
managed to move a number of things forward rather quickly.

I'll rephrase: while it would probably have been a blocker in theory
(I didn't actually weigh in on that), I doubted that it would actually
end up doing so in practice (and it now looks like I was right to
doubt that, since you got it done). It was a theoretical blocker, as
opposed to an open item that could drag on indefinitely despite
everyone's best efforts. Obviously details matter, and obviously there
are a lot of details to get right outside of RC semantics, but it
seems wise to focus on the big risk that is EPQ/RC conflict handling.

The only other thing that comes close to that risk is the risk that
we'll get stuck on RLS. Though even the RLS discussion may actually
end up being blocked on this crucial question of EPQ/RC conflict
handling. Did you know that the RLS docs [1] have a specific
discussion of the implications of EPQ for users of RLS, and that it
mentions doing things like using SELECT ... FOR SHARE to work around
the problem? It has a whole example of a scenario that users actually
kind of need to know about, at least in theory. RC conflict handling
semantics could bleed into a number of other things.

I'll need to think some more about RC conflict handling (deciding what
"EPQ with a twist" actually means), since I haven't focused on MERGE
recently. Bear with me.

[1] https://www.postgresql.org/docs/current/static/ddl-rowsecurity.html
-- 
Peter Geoghegan

Re: PostgreSQL 2018-02-08 Security Update Release

2018-02-08 Thread Tatsuo Ishii

> Greetings Tatsuo,
> 
> * Tatsuo Ishii (is...@sraoss.co.jp) wrote:
>> In
>> https://www.postgresql.org/about/news/1829/
>> 
>> URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error.
>> I am the only one who are getting the error?
> 
> Unfortunately, we don't have any control over when RedHat updates their
> system for the CVEs.

That means the URL on our page is correct, but their system not yet
prepare the contens.

Thanks for the clarification!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: PostgreSQL 2018-02-08 Security Update Release

2018-02-08 Thread Stephen Frost

Greetings Tatsuo,

* Tatsuo Ishii (is...@sraoss.co.jp) wrote:
> In
> https://www.postgresql.org/about/news/1829/
> 
> URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error.
> I am the only one who are getting the error?

Unfortunately, we don't have any control over when RedHat updates their
system for the CVEs.

Thanks!

Stephen


signature.asc
Description: PGP signature

Re: PostgreSQL 2018-02-08 Security Update Release

2018-02-08 Thread Tatsuo Ishii

In
https://www.postgresql.org/about/news/1829/

URL links to both CVE-2018-1052 and CVE-2018-1053 give me a 404 error.
I am the only one who are getting the error?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Stephen Frost 
Subject: PostgreSQL 2018-02-08 Security Update Release
Date: Thu, 8 Feb 2018 08:59:05 -0500
Message-ID: <20180208135905.gq2...@tamriel.snowman.net>

> 2018-02-08 Security Update Release
> ==
> 
> The PostgreSQL Global Development Group has released an update to all 
> supported
> versions of our database system, including 10.2, 9.6.7, 9.5.11, 9.4.16, 
> 9.3.21.
> This release fixes two security issues. This release also fixes issues with
> VACUUM, GIN indexes, and hash indexes that could lead to data corruption, as
> well as fixes for using parallel queries and logical replication.
> 
> All users using the affected versions of PostgreSQL should update as soon as
> possible. Please see the notes on "Updating" below for any post-update steps
> that may be required.
> 
> Please note that PostgreSQL changed its versioning scheme with the release of
> version 10.0, so updating to version 10.2 from 10.0 or 10.1 is considered a
> minor update.
> 
> Security Issues
> ---
> 
> Two security vulnerabilities have been fixed by this release:
> 
> * CVE-2018-1052: Fix the processing of partition keys containing multiple
> expressions
> * CVE-2018-1053: Ensure that all temporary files made with "pg_upgrade" are
> non-world-readable
> 
> Bug Fixes and Improvements
> --
> 
> This update fixes over 60 bugs reported in the last few months. Some of these
> issues affect only version 10, but many affect all supported versions:
> 
> * Fix crash and potential disclosure of backend memory when processing 
> partition
> keys containing multiple expressions
> * Fix potential disclosure of temporary files containing database passwords
> created by pg_upgrade by not allowing these files to be world-accessible
> * Fix cases where VACUUM would not remove dead rows if they were updated while
> "key-share" locked, leading to potential data corruption
> * Fix for GIN indexes to prevent bloat by ensuring the pending-insertions list
> is cleaned up by VACUUM
> * Fix potential index corruption with hash indexes due to failure to mark
> metapages as dirty
> * Fix several potential crash scenarios for parallel queries, including when a
> bitmap heap scan cannot allocate memory
> * Fix several potential hang-ups in parallel queries, including when a 
> parallel
> worker fails to start
> * Fix collection of EXPLAIN statistics from parallel workers
> * Prevent fake deadlock failures when multiple sessions are running
> CREATE INDEX CONCURRENTLY
> * Fix for trigger behavior when using logical replication
> * Several fixes for "walsender" functionality to improve stability as well as
> visibility into the replication process
> * Fix logical decoding to correctly clean up disk files for crashed 
> transactions
> * Several fixes for identity columns, including disallowing identity columns 
> on
> tables derived from composite types and partitions
> * Fix handling of list partitioning constraints for partition keys of boolean
> and array types
> * Fix incorrectly generated plans for UPDATE and DELETE queries when a table 
> has
> a mix of inherited regular and foreign child tables
> * Fix incorrect query results from cases involving GROUPING SETS when used 
> with
> flattened subqueries
> * Fix UNION/INTERSECT/EXCEPT over zero columns, e.g. "SELECT UNION SELECT;"
> * Several fixes for subqueries within a LATERAL subquery
> * Several improvements for query planning estimation
> * Allow a client that supports SCRAM channel binding, such as a future version
> of PostgreSQL or libpq, to connect to a PostgreSQL 10 server
> * Fix sample INSTR() functions used to help transition from Oracle(r) PL/SQL 
> to
> PostgreSQL PL/pgSQL to correctly match Oracle functional behavior
> * Fix pg_dump to make permissions (ACL), security label, and comment entries
> reliably identifiable in archive outputs
> * Modify behavior for contrib/cube's "cube ~> int" operator to make it
> compatible with KNN search. This is a backwards incompatible change and any
> expression indexes or materialized views using this operator will need to be
> reindexed and refreshed, respectively.
> * Several fixes in contrib/postgres_fdw to prevent query planner errors
> * Added modern examples of auto-start scripts for PostgreSQL on macOS in the
> contrib/start-scripts/macos directory
> * Several fixes for Windows, including postmaster startup and compatibility 
> with
> libperl
> * Spinlock fixes and support for Motorola 68K and 88K architectures
> 
> This update also contains tzdata release 2018c, with updates for DST law 
> changes
> in Brazil, Sao Tome and Principe, plus historical corrections for Bolivia,
> Japan,

Re: Fix a typo in walsender.c

2018-02-08 Thread atorikoshi


On 2018/02/09 4:40, Robert Haas wrote:

On Thu, Feb 8, 2018 at 1:32 AM, atorikoshi
 wrote:

Attached a minor patch for variable name in comment:


Committed.



Thank you!

Re: update tuple routing and triggers

2018-02-08 Thread Amit Langote

On 2018/02/09 4:31, Robert Haas wrote:
> On Wed, Feb 7, 2018 at 8:26 PM, Amit Langote
>  wrote:
>> OK, done in the attached.
> 
> Committed.  Thanks.

Thank you.  Sorry, missed renaming leafrel that you did yourself.

Regards,
Amit

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2018-02-08 Thread Alvaro Herrera

Claudio Freire wrote:

> I don't like looping, though, seems overly cumbersome. What's worse?
> maintaining that fragile weird loop that might break (by causing
> unexpected output), or a slight slowdown of the test suite?
>
> I don't know how long it might take on slow machines, but in my
> machine, which isn't a great machine, while the vacuum test isn't fast
> indeed, it's just a tiny fraction of what a simple "make check" takes.
> So it's not a huge slowdown in any case.

Well, what about a machine running tests under valgrind, or the weird
cache-clobbering infuriatingly slow code?  Or buildfarm members running
on really slow hardware?  These days, a few people have spent a lot of
time trying to reduce the total test time, and it'd be bad to lose back
the improvements for no good reason.

I grant you that the looping I proposed is more complicated, but I don't
see any reason why it would break.

Another argument against the LOCK pg_class idea is that it causes an
unnecessary contention point across the whole parallel test group --
with possible weird side effects.  How about a deadlock?

Other than the wait loop I proposed, I think we can make a couple of
very simple improvements to this test case to avoid a slowdown:

1. the DELETE takes about 1/4th of the time and removes about the same
number of rows as the one using the IN clause:
  delete from vactst where random() < 3.0 / 4;

2. Use a new temp table rather than vactst.  Everything is then faster.

3. Figure out the minimum size for the table that triggers the behavior
   you want.  Right now you use 400k tuples -- maybe 100k are sufficient?
   Don't know.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: JIT compiling with LLVM v10.0

2018-02-08 Thread Thomas Munro

On Fri, Feb 9, 2018 at 3:14 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> $ ./configure --prefix=/build/postgres-jit/ --with-llvm
> --enable-debug --enable-depend --enable-cassert

> /usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This
> file requires compiler and library support for the ISO C++ 2011
> standard. This support must be enabled with the -std=c++11 or
> -std=gnu++11 compiler options.

Did you try passing CXXFLAGS="-std=c++11" to configure?

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Fix a typo in walsender.c

2018-02-08 Thread Robert Haas

On Thu, Feb 8, 2018 at 1:32 AM, atorikoshi
 wrote:
> Attached a minor patch for variable name in comment:

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-08 Thread Robert Haas

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane  wrote:
> Etsuro Fujita  writes:
>> (2018/02/08 10:40), Robert Haas wrote:
>>> Uggh, I missed the fact that they were doing that.  It's probably
>>> actually useful test coverage, but it's not surprising that it isn't
>>> stable.
>
>> That was my purpose, but I agree with the instability.  Thanks again,
>> Robert!
>
> According to
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01
>
> there's still an intermittent issue.  I ran "make installcheck" in
> contrib/postgres_fdw in a loop, and got a similar failure on the
> 47th try --- my result duplicates the second plan change shown by
> rhinoceros, but not the first one.  I speculate that the plan change
> is a result of autovacuum kicking in partway through the run.

Hmm.  Maybe inserting an ANALYZE command in the right place would fix it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: update tuple routing and triggers

2018-02-08 Thread Robert Haas

On Wed, Feb 7, 2018 at 8:26 PM, Amit Langote
 wrote:
> OK, done in the attached.

Committed.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: it's a feature, but it feels like a bug

2018-02-08 Thread Rafal Pietrak



W dniu 08.02.2018 o 05:51, David Fetter pisze:
> On Wed, Feb 07, 2018 at 10:26:50PM -0500, Tom Lane wrote:
>> Rafal Pietrak  writes:
[-]
> 
> CREATE TABLE foo(b BOOLEAN, i INTEGER NOT NULL, t TEXT NOT NULL) PARTITION BY 
> LIST (b);
> CREATE TABLE foo_true PARTITION OF foo (PRIMARY KEY(i, t)) FOR VALUES IN 
> ('true');
> CREATE TABLE bar(foo_i INTEGER NOT NULL, foo_t TEXT NOT NULL, FOREIGN 
> KEY(foo_i, foo_t) REFERENCES foo_true);

This is exactly my current setup.

It creates other problems with manageing my dataset, so I'm looking for
other ways to lay down the schema.

thenx,

-R

Re: it's a feature, but it feels like a bug

2018-02-08 Thread Rafal Pietrak

W dniu 08.02.2018 o 04:26, Tom Lane pisze:
> Rafal Pietrak  writes:
[]
> 
>> And it is sort of "couterintuitive" - as you can see, there is a UNIQUE
>> index for test(a,b) target; admitedly partial, but  why should that
>> matter?
> 
> Because the index fails to guarantee uniqueness of (a,b) in rows where d
> isn't true.  There could be many duplicates in such rows, possibly even of
> (a,b) pairs that also appear --- though only once --- in rows where d is
> true.
> 
> If there were a way to say that the FK is only allowed to reference rows
> where d is true, then this index could support an FK like that.  But
> there's no way to express such a thing in SQL.

I sort of knew/expected that.

But I'd like to approach the sources anyway.

> 
> Personally I'd think about separating your rows-where-d-is-true into
> their own table, which could have a normal PK index.  You could still
> create a union view over that table and the one with the other rows
> to satisfy whatever queries want to think the two kinds of rows
> are the same thing.  But I'd offer that if one set of rows has (a,b)
> as a PK and the other does not, they are not really the same kind
> of thing.

Actually, they are. the explanation of my schema would be lengthy, but
in showt, lets say, I'm talking of a mail-hub, where : A=mbox-owner-id,
B=message-UNIQUE-id, C=the-other-entity-id, D=flas-inbox-outbox'; the
table contains every message anyone send or received. only sender
assigns ID to a message. So:
all outgoing messages have unique (A,B), and D=true
all received messages have unique (B,C), and D=false
those messages are parsed, digested, and they update columns of their
respective rows.
... the tricky part is, that some of them must form explicit lists. This
is column (E). This is why I need to have an FK (E,A) --> (B,A).

Currently, to use FK in this dataset I have the main table split into:
inbox, and outbox. Unfortunately this fires back as the entire schema
effectively has to have twice the number of relations, and FK
interlinking it growing almost as O(2) with tables. At the point that I
am, this is already unmanagable.

So I'm quite desperate to "do it some other way". Like patching postgresql.

I was thinking, that: an attempt to "alter table add constraint ..
foreign key..." could:
a) identify if the target table has ANY sort of UNIQUE index covering
provided list of columns (even if it is a partial index)
b) if that index is only partial, locate the condition and use it during
insert/update/etc and retrieval of target row.
c) if that index is functional index, locate that function and use it
during insert/update/etc.

So I'd appreciate some guidence which part of the sources I should study
first.

regards,

-R

Re: Proposal: partition pruning by secondary attributes

2018-02-08 Thread Andres Freund

On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote:
> Ildar Musin wrote:
> 
> > The idea is to store min and max values of secondary attributes (like
> > 'id' in the example above) for each partition somewhere in catalog and
> > use it for partition pruning along with partitioning key. You can think
> > of it as somewhat like BRIN index but for partitions.
> 
> What is the problem with having a BRIN index?

Building plans to scan the individual partitions, lock them, open the
relevant files, etc is often going to be significantly more expensive
than pruning at plan time.

But there also seems to be a number of fairly nasty locking issues with
this proposal, leaving the amount of required code aside.

Greetings,

Andres Freund

Re: [HACKERS] path toward faster partition pruning

2018-02-08 Thread Alvaro Herrera

Robert Haas wrote:
> On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat
>  wrote:

> > partition.c seems to have two kinds of functions 1. that build and
> > manage relcache, creates quals from bounds etc. which are metadata
> > management kind 2. partition bound comparison functions, and other
> > optimizer related functions. May be we should divide the file that
> > way. The first category code remains in catalog/ as it is today. The
> > second catagory functions move to optimizer/.
> 
> It would be sensible to separate functions that build and manage data
> in the relcache from other functions.  I think we should consider
> moving the existing functions of that type from partition.c to
> src/backend/utils/cache/partcache.c.

FWIW I've been thinking that perhaps we need some other separation of
code better than statu quo.  The current partition.c file includes stuff
for several modules and ISTM all these new patches are making more and
more of a mess.  So +1 to the general idea of splitting things up.
Maybe partcache.c is not ambitious enough, but it seems a good first
step.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Query running for very long time (server hanged) with parallel append

2018-02-08 Thread Robert Haas

On Wed, Feb 7, 2018 at 2:11 AM, Amit Khandekar  wrote:
> Attached is the patch accordingly.

OK, I see.  That makes sense; committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Proposal: partition pruning by secondary attributes

2018-02-08 Thread Alvaro Herrera

Ildar Musin wrote:

> The idea is to store min and max values of secondary attributes (like
> 'id' in the example above) for each partition somewhere in catalog and
> use it for partition pruning along with partitioning key. You can think
> of it as somewhat like BRIN index but for partitions.

What is the problem with having a BRIN index?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] A design for amcheck heapam verification

2018-02-08 Thread Peter Geoghegan

On Thu, Feb 8, 2018 at 6:05 AM, Andrey Borodin  wrote:
> I do not see a reason behind hashing the seed.

It made some sense when I was XOR'ing it to mix. A uniform
distribution of bits seemed desirable then, since random() won't use
the most significant bit -- it generates random numbers in the range
of 0 to 2^31-1. It does seem unnecessary now.

> Also, I'd like to reformulate this paragraph. I understand what you want to 
> say, but the sentence is incorrect.
> + * The Bloom filter behaves non-deterministically when caller passes a random
> + * seed value.  This ensures that the same false positives will not occur 
> from
> + * one run to the next, which is useful to some callers.
> Bloom filter behaves deterministically, but differently. This does not 
> ensures any thing, but probably will give something with hight probability.

I agree that that's unclear. I should probably cut it down, and say
something like "caller can pass a random seed to make it unlikely that
the same false positives will occur from one run to the next".

-- 
Peter Geoghegan

Re: New gist vacuum.

2018-02-08 Thread David Steele

Hi Andrey,

On 2/7/18 10:46 AM, Andrey Borodin wrote:
>> 7 февр. 2018 г., в 18:39, David Steele  написал(а):
>>
>> Hi Andrey,
>>
>> On 1/21/18 5:34 AM, Andrey Borodin wrote:
>>> Hello, Alexander!
 16 янв. 2018 г., в 21:42, Andrey Borodin  написал(а):
 Please find README patch attached.
>>>
>>> Here's v2 version. Same code, but x2 comments. Also fixed important typo in 
>>> readme BFS->DFS. Feel free to ping me any time with questions.
>>
>> This patch has not gotten review and does not seem like a good fit for
>> the last PG11 CF so I am marking it Returned with Feedback.
> 
> Why do you think this patch does not seem good fit for CF?

Apologies for the brevity.  I had about 40 patches to go through yesterday.

The reason it does not seem a good fit is that it's a new, possibly
invasive patch that has not gotten any review in the last three CFs
since it was reintroduced.  I'm not sure why that's the case and I have
no opinion about the patch itself, but there it is.

We try to avoid new patches in the last CF that could be destabilizing
and this patch appears to be in that category.  I know it has been
around for a while, but the lack of review makes it "new" in the context
of the last CF for PG11.

> I've been talking with Alexander just yesterday at PgConf.Russia, and he was 
> going to provide review.

Great!  I'd suggest you submit this patch for the CF after 2018-03.

However, that's completely up to you.

Regards,
-- 
-David
da...@pgmasters.net

Re: SSL test names

2018-02-08 Thread Peter Eisentraut

On 2/7/18 23:18, Michael Paquier wrote:
> You need to update the comment on top of test_connect_ok in
> ServerSetup.pm.

done and committed

> Wouldn't it be better to use the expected result
> as an argument and merge test_connect_ok and test_connect_fails?

That doesn't seem to be the general style, and I think it's more
readable the way it is now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Proposal: generic WAL compression

2018-02-08 Thread Markus Nullmeier

On 07/02/18 19:42, Stephen Frost wrote:
>> really useful for my "OUZO" project (a fork of the RUM access method).
> 
> Glad to hear that you're continuing to work on it.

Yes, it will be available on Github eventually.

>> One general comment I can already make is that enabling compression
>> should be made optional, which appears to be a small and easy addition
>> to the generic WAL API.
> 
> The question would be- is there a way for us to determine when it should
> be enabled or not enabled, or do we have to ask the user to tell us?

My suggestion would be adding something like

   #define GENERIC_XLOG_MOVE_DETECT_ON   0x0002  /* search for moved
parts of page image */
   #define GENERIC_XLOG_MOVE_DETECT_OFF  0x0004  /* do not search for
moved parts of page image */

This would be used for the 'flags' argument of GenericXLogRegisterBuffer(),
allowing code using generic WAL (such as custom access methods) to
control this kind of compression on a per buffer basis, if need be.
Then it would be up to the author of any extension relying on generic
WAL to decide if and how the user will be given control of WAL
compression.

However,

> Asking the user is something we'd really not do unless we absolutely
> have to.  Much better is if we can figure out when it's best to enable
> or disable (or use/not use) based on some criteria and do so
> automatically.

this is a point I did not think of before :-)  Probably a logical
choice would be having "GENERIC_XLOG_MOVE_DETECT" default to true
for a 'wal_level' value of 'replica' (the default setting nowadays)
or higher, and default to false for 'minimal'.
In this way, generic WAL will be automatically compressed when it is
potentially stored for some longer time, or consuming network bandwidth.

-- 
Markus Nullmeierhttp://www.g-vo.org
German Astrophysical Virtual Observatory (GAVO)

Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-08 Thread Tom Lane

Etsuro Fujita  writes:
> (2018/02/08 10:40), Robert Haas wrote:
>> Uggh, I missed the fact that they were doing that.  It's probably
>> actually useful test coverage, but it's not surprising that it isn't
>> stable.

> That was my purpose, but I agree with the instability.  Thanks again, 
> Robert!

According to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01

there's still an intermittent issue.  I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one.  I speculate that the plan change
is a result of autovacuum kicking in partway through the run.

regards, tom lane

Re: In logical replication concurrent update of partition key creates a duplicate record on standby.

2018-02-08 Thread amul sul

On Thu, Feb 8, 2018 at 5:55 PM, Amit Kapila  wrote:
> On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila  wrote:
>> On Wed, Feb 7, 2018 at 3:42 PM, amul sul  wrote:
>>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar  
>>> wrote:
 On 7 February 2018 at 13:53, amul sul  wrote:
> Hi,
>
> If an update of partition key involves tuple movement from one partition 
> to
> another partition then there will be a separate delete on one partition 
> and
> insert on the other partition made.
>
> In the logical replication if an update performed on the master and 
> standby at
> the same moment, then replication worker tries to replicate delete + 
> insert
> operation on standby. While replying master changes on standby for the 
> delete
> operation worker will log "concurrent update, retrying" message (because 
> the
> update on standby has already deleted) and move forward to reply the next
> insert operation. Standby update also did the same delete+insert is as 
> part of
> the update of partition key in a result there will be two records 
> inserted on
> standby.

 A quick thinking on how to resolve this makes me wonder if we can
 manage to pass some information through logical decoding that the
 delete is part of a partition key update. This is analogous to how we
 set some information locally in the tuple by setting
 tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.

>>>
>>> +1,
>>>
>>
>> I also mentioned the same thing in the other thread [1], but I think
>> that alone won't solve the dual record problem as you are seeing.  I
>> think we need to do something for next insert as you are suggesting.
>>
>
> Can you please once check what was the behavior before Update Tuple
> routing patch (Commit-id: 2f178441) went in?
>

Before this commit such update will be failed with following error:

postgres=#  update foo set a=2, b='node1_update' where a=1;
ERROR:  new row for relation "foo1" violates partition constraint
DETAIL:  Failing row contains (2, node1_update).

Regards,
Amul

Re: In logical replication concurrent update of partition key creates a duplicate record on standby.

2018-02-08 Thread amul sul

On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila  wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul  wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar  
>> wrote:
>>> On 7 February 2018 at 13:53, amul sul  wrote:
 Hi,

 If an update of partition key involves tuple movement from one partition to
 another partition then there will be a separate delete on one partition and
 insert on the other partition made.

 In the logical replication if an update performed on the master and 
 standby at
 the same moment, then replication worker tries to replicate delete + insert
 operation on standby. While replying master changes on standby for the 
 delete
 operation worker will log "concurrent update, retrying" message (because 
 the
 update on standby has already deleted) and move forward to reply the next
 insert operation. Standby update also did the same delete+insert is as 
 part of
 the update of partition key in a result there will be two records inserted 
 on
 standby.
>>>
>>> A quick thinking on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding that the
>>> delete is part of a partition key update. This is analogous to how we
>>> set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>
>> +1,
>>
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the dual record problem as you are seeing.  I
> think we need to do something for next insert as you are suggesting.
>
>> also if  worker failed to reply delete operation on standby then
>> we need to decide what will be the next step, should we skip follow
>> insert operation or error out or something else.
>>
>
> That would be tricky, do you see any simple way of doing either of those.
>

Not really, like ExecUpdate for an update of partition key if delete is failed
then the further insert will be skipped, but you are correct, it might be more
tricky than I can think -- there is no guarantee that the next insert operation
which replication worker trying to replicate is part of the update of partition
key mechanism.  How can one identify that an insert operation on one relation is
related to previously deleting operation on some other relation?

Regards,
Amul

Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-02-08 Thread Claudio Freire

On Thu, Feb 8, 2018 at 1:36 AM, Masahiko Sawada  wrote:
> On Tue, Feb 6, 2018 at 9:51 PM, Claudio Freire  wrote:
>> I can look into doing 3, that *might* get rid of the need to do that
>> initial FSM vacuum, but all other intermediate vacuums are still
>> needed.
>
> Understood. So how about that this patch focuses only make FSM vacuum
> more frequently and leaves the initial FSM vacuum and the handling
> cancellation cases? The rest can be a separate patch.

Ok.

Attached split patches. All but the initial FSM pass is in 1, the
initial FSM pass as in the original patch is in 2.

I will post an alternative patch for 2 when/if I have one. In the
meanwhile, we can talk about 1.
From 0a980794c90973383e5a63f64839616a79542260 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Thu, 8 Feb 2018 12:39:15 -0300
Subject: [PATCH 2/2] Vacuum: do a partial FSM vacuum at the beginning

A partial FSM vacuum right at the start of the vacuum process
helps ameliorate issues with cancelled autovacuums never updating
the FSM.
---
 src/backend/commands/vacuumlazy.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 9ad0f34..2965204 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -103,6 +103,19 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)

+/*
+ * If autovacuums get regularly cancelled, the FSM may never be
+ * vacuumed. To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible. To avoid useless
+ * redundant work, however, we avoid recursing into branches
+ * that already have a set amount of free space, we only try
+ * to discover unmarked free space. This value controls how
+ * much free space is enough free space in that context. The
+ * value is in the terms of the FSM map.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD	64
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -250,6 +263,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;

+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled, and that's likely in
+	 * highly contended tables, we may need to finish up. If we do
+	 * it now, we make the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around.
+	 * Don't bother checking branches that already have usable space,
+	 * though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
-- 
1.8.4.5

From fdb2e0ae486fc6160df48c6a668eafa50b32814a Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/2] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thorougly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is highest. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM
is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at
the beginning of the VACUUM to finish up any work left over from
prior canceled vacuum runs, something that is common in highly
contended tables when running only autovacuum.
---
 src/backend/access/brin/brin.c|  2 +-
 src/backend/access/brin/brin_pageops.c| 10 +++
 src/backend/commands/vacuumlazy.c | 50 +--
 src/backend/storage/freespace/freespace.c | 29 ++
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h   |  2 +-
 6 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin

Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-08 Thread Tom Lane

Michael Paquier  writes:
> In order to run tests consistently on the whole tree, I use a simple
> alias which tests also things like src/test/ssl and src/test/ldap on the
> way.

> Lately, I am getting annoyed by $subject when working on OpenSSL stuff
> as sometimes I need to test things with and without SSL support to make
> sure that a patch is rightly shaped.  However if configure is built
> without SSL support then src/test/ssl just fails.  The same applies to
> src/test/ldap.

> Could it be possible to disable them using something like the attached
> if a build is done without SSL and/or LDAP?

That seems like possibly not such a great idea.  If somebody were using
a run of src/test/ssl to check their build, they would now get no
notification if they'd forgotten to build --with-openssl.

Perhaps we could introduce some new targets, "check-if-enabled" or so,
that act like you want.

regards, tom lane

Re: JIT compiling with LLVM v10.0

2018-02-08 Thread Andres Freund

On 2018-02-08 15:14:42 +0100, Dmitry Dolgov wrote:
> > On 8 February 2018 at 10:29, Andreas Karlsson  wrote:
> >> On 02/07/2018 03:54 PM, Andres Freund wrote:
> >>
> >> I've pushed v10.0. The big (and pretty painful to make) change is that
> >> now all the LLVM specific code lives in src/backend/jit/llvm, which is
> >> built as a shared library which is loaded on demand.
> >
> >
> > It does not seem to be possible build without LLVM anymore.

Yea, wrong header included. Will fix.


> Maybe I'm doing something wrong, but I also see some issues during compilation
> even with llvm included (with gcc 5.4.0 and 7.1.0). Is it expected, do I need
> to use another version to check it out?
> 
> $ git rev-parse HEAD
> e24cac5951575cf86f138080acec663a0a05983e
> 
> $ ./configure --prefix=/build/postgres-jit/ --with-llvm
> --enable-debug --enable-depend --enable-cassert

Seems you need to provide a decent C++ compiler (via CXX=... to
configure). Will test that it actually works with a recent clang.

Greetings,

Andres Freund

Re: non-bulk inserts and tuple routing

2018-02-08 Thread Robert Haas

On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita
 wrote:
> if (resultRelInfo == NULL);
> {
> /* Initialize partition info. */
> resultRelInfo = ExecInitPartitionInfo(mtstate,
>   saved_resultRelInfo,
>   proute,
>   estate,
>   leaf_part_index);
> }

I'm pretty sure that code has one semicolon too many.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2018-02-08 Thread Robert Haas

On Thu, Feb 8, 2018 at 9:04 AM, Amit Kapila  wrote:
> I guess workers need to wait till leader become active and processes
> the error message.

So if you kill a worker, it doesn't die but sits there waiting for
someone to run a command in the leader?  That sounds terrible.

>> Also, if you're OK with not being able to do anything except fetch
>> from the cursor until you reach the end, then couldn't you just issue
>> the query without the cursor and use PQsetSingleRowMode?
>
> I think there is a lot of cursor usage via plpgsql in which it could be 
> useful.

I don't see how.  If you can't do anything but fetch from the cursor
while the cursor is open, then you can't go start doing things in
plpgsql code either.  If you allow plpgsql code to run during that
time, then it can do arbitrary stuff which breaks the design
assumptions around parallel mode, like calling parallel-unsafe
functions or trying to do transaction control or running DML.

The whole point here is that the parallel-mode stuff is only designed
to cleanly error out in cases that we think we can hit during the
course of executing a query.  In other cases, it may elog(), Assert(),
crash, give wrong answers, etc.  As soon as you let parallel-mode
survive beyond the context of a query, you've opened a huge can of
worms which you will not be easily able to shut.  If we restrict what
you can do outside of a query to a very narrow set of operations, then
it might survive, but "execute arbitrary plpgsql code" is not a narrow
set of operations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: JIT compiling with LLVM v10.0

2018-02-08 Thread Dmitry Dolgov

> On 8 February 2018 at 10:29, Andreas Karlsson  wrote:
>> On 02/07/2018 03:54 PM, Andres Freund wrote:
>>
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.
>
>
> It does not seem to be possible build without LLVM anymore.

Maybe I'm doing something wrong, but I also see some issues during compilation
even with llvm included (with gcc 5.4.0 and 7.1.0). Is it expected, do I need
to use another version to check it out?

$ git rev-parse HEAD
e24cac5951575cf86f138080acec663a0a05983e

$ ./configure --prefix=/build/postgres-jit/ --with-llvm
--enable-debug --enable-depend --enable-cassert

In file included from llvmjit_error.cpp:22:0:
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:47:36:
warning: identifier 'nullptr' is a keyword in C++11 [-Wc++0x-compat]
void *user_data = nullptr);
^
In file included from /usr/include/c++/5/cinttypes:35:0,
 from /usr/lib/llvm-5.0/include/llvm/Support/DataTypes.h:39,
 from /usr/lib/llvm-5.0/include/llvm-c/Types.h:17,
 from ../../../../src/include/jit/llvmjit.h:13,
 from llvmjit_error.cpp:24:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This
file requires compiler and library support for the ISO C++ 2011
standard. This support must be enabled with the -std=c++11 or
-std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
In file included from llvmjit_error.cpp:22:0:
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:47:54:
error: 'nullptr' was not declared in this scope
void *user_data = nullptr);
  ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:57:56:
error: 'nullptr' was not declared in this scope
  void *user_data = nullptr) {
^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:98:56:
error: 'nullptr' was not declared in this scope
  void *user_data = nullptr);
^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:121:45:
error: 'nullptr' was not declared in this scope
 llvm_unreachable_internal(const char *msg = nullptr, const char
*file = nullptr,
 ^
/usr/lib/llvm-5.0/include/llvm/Support/ErrorHandling.h:121:73:
error: 'nullptr' was not declared in this scope
 llvm_unreachable_internal(const char *msg = nullptr, const char
*file = nullptr,
 ^
../../../../src/Makefile.global:838: recipe for target
'llvmjit_error.o' failed
make[2]: *** [llvmjit_error.o] Error 1
make[2]: Leaving directory '/postgres/src/backend/jit/llvm'
Makefile:42: recipe for target 'all-backend/jit/llvm-recurse' failed
make[1]: *** [all-backend/jit/llvm-recurse] Error 2
make[1]: Leaving directory '/postgres/src'
GNUmakefile:11: recipe for target 'all-src-recurse' failed
make: *** [all-src-recurse] Error 2

Re: [HACKERS] A design for amcheck heapam verification

2018-02-08 Thread Andrey Borodin

Hi, Peter!

> 8 февр. 2018 г., в 4:56, Peter Geoghegan  написал(а):
> 
> * Faster modulo operations.
> 
> * Removed sdbmhash().

Thanks! I definitely like how Bloom filter is implemented now.

I could not understand meaning of this, but apparently this will not harm
+   /*
+* Caller will probably use signed 32-bit pseudo-random number, so hash
+* caller's value to get 64-bit seed value
+*/
+   filter->seed = DatumGetUInt64(hash_uint32_extended(seed, 0));
I do not see a reason behind hashing the seed.

Also, I'd like to reformulate this paragraph. I understand what you want to 
say, but the sentence is incorrect.
+ * The Bloom filter behaves non-deterministically when caller passes a random
+ * seed value.  This ensures that the same false positives will not occur from
+ * one run to the next, which is useful to some callers.
Bloom filter behaves deterministically, but differently. This does not ensures 
any thing, but probably will give something with hight probability.

Thanks!

Best regards, Andrey Borodin.

Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2018-02-08 Thread Amit Kapila

On Wed, Feb 7, 2018 at 9:47 PM, Robert Haas  wrote:
> On Mon, Jan 22, 2018 at 7:05 AM, Amit Kapila  wrote:
>> On error, workers should be terminated.  What kind of problem do you
>> have in mind?
>
> Hmm.  Yeah, I guess that makes sense.  If the only thing you can do is
> fetch from the cursor -- and you have to make sure to lock down
> protocol messages as well as SQL commands -- and if any error kills
> the workers -- then I guess this might be workable.  However, I think
> there are still problems.  Say the worker hits an error while the
> leader is idle; how do we report the error?
>

I guess workers need to wait till leader become active and processes
the error message.

>  It's a protocol version
> for the leader to send an unsolicited ErrorResponse, which is why we
> have to use FATAL rather than ERROR in various places for recovery
> conflicts, query cancellation, etc.
>



> Also, if you're OK with not being able to do anything except fetch
> from the cursor until you reach the end, then couldn't you just issue
> the query without the cursor and use PQsetSingleRowMode?
>

I think there is a lot of cursor usage via plpgsql in which it could be useful.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

autovacuum: change priority of the vacuumed tables

2018-02-08 Thread Ildus Kurbangaliev

Hi,

Attached patch adds 'autovacuum_table_priority' to the current list of
automatic vacuuming settings. It's used in sorting of vacuumed tables in
autovacuum worker before actual vacuum.

The idea is to give possibility to the users to prioritize their tables
in autovacuum process.

-- 
---
Regards,
Ildus Kurbangaliev
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c45979dee4..b7383a7136 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6250,6 +6250,21 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_table_priority (integer)
+  
+   autovacuum_table_priority configuration parameter
+  
+  
+  
+   
+Specifies the priority of the table in automatic
+VACUUM operations. 0 by default, bigger value gives
+more priority.
+   
+  
+ 
+
 

 
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 46276ceff1..2476dfe943 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -291,6 +291,15 @@ static relopt_int intRelOpts[] =
 		},
 		-1, -1, INT_MAX
 	},
+	{
+		{
+			"autovacuum_table_priority",
+			"Sets the priority of the table in autovacuum process",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		},
+		0, INT_MIN, INT_MAX
+	},
 	{
 		{
 			"toast_tuple_target",
@@ -1353,6 +1362,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, multixact_freeze_table_age)},
 		{"log_autovacuum_min_duration", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, log_min_duration)},
+		{"autovacuum_table_priority", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, priority)},
 		{"toast_tuple_target", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, toast_tuple_target)},
 		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 702f8d8188..174de4a08c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -174,15 +174,22 @@ typedef struct avw_dbase
 	PgStat_StatDBEntry *adw_entry;
 } avw_dbase;
 
-/* struct to keep track of tables to vacuum and/or analyze, in 1st pass */
+/* struct for vacuumed or analyzed relation */
 typedef struct av_relation
+{
+	Oid			ar_relid;
+	int			ar_priority;	/* bigger- more important, used for sorting */
+} av_relation;
+
+/* struct to keep track of toast tables to vacuum and/or analyze, in 1st pass */
+typedef struct av_toastrelation
 {
 	Oid			ar_toastrelid;	/* hash key - must be first */
 	Oid			ar_relid;
 	bool		ar_hasrelopts;
 	AutoVacOpts ar_reloptions;	/* copy of AutoVacOpts from the main table's
  * reloptions, or NULL if none */
-} av_relation;
+} av_toastrelation;
 
 /* struct to keep track of tables to vacuum and/or analyze, after rechecking */
 typedef struct autovac_table
@@ -1919,6 +1926,16 @@ get_database_list(void)
 	return dblist;
 }
 
+/* qsort comparator for av_relation, using priority */
+static int
+autovac_comparator(const void *a, const void *b)
+{
+	av_relation	   *r1 = (av_relation *) lfirst(*(ListCell **) a);
+	av_relation	   *r2 = (av_relation *) lfirst(*(ListCell **) b);
+
+	return r2->ar_priority - r1->ar_priority;
+}
+
 /*
  * Process a database table-by-table
  *
@@ -1932,7 +1949,7 @@ do_autovacuum(void)
 	HeapTuple	tuple;
 	HeapScanDesc relScan;
 	Form_pg_database dbForm;
-	List	   *table_oids = NIL;
+	List	   *optables = NIL;
 	List	   *orphan_oids = NIL;
 	HASHCTL		ctl;
 	HTAB	   *table_toast_map;
@@ -2021,7 +2038,7 @@ do_autovacuum(void)
 	/* create hash table for toast <-> main relid mapping */
 	MemSet(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(Oid);
-	ctl.entrysize = sizeof(av_relation);
+	ctl.entrysize = sizeof(av_toastrelation);
 
 	table_toast_map = hash_create("TOAST to main relid map",
   100,
@@ -2101,9 +2118,14 @@ do_autovacuum(void)
   effective_multixact_freeze_max_age,
   &dovacuum, &doanalyze, &wraparound);
 
-		/* Relations that need work are added to table_oids */
+		/* Relations that need work are added to optables */
 		if (dovacuum || doanalyze)
-			table_oids = lappend_oid(table_oids, relid);
+		{
+			av_relation *rel = (av_relation *) palloc(sizeof(av_relation));
+			rel->ar_relid = relid;
+			rel->ar_priority = relopts != NULL ? relopts->priority : 0;
+			optables = lappend(optables, rel);
+		}
 
 		/*
 		 * Remember TOAST associations for the second pass.  Note: we must do
@@ -2112,7 +2134,7 @@ do_autovacuum(void)
 		 */
 		if (OidIsValid(classForm->reltoastrelid))
 		{
-			av_relation *hentry;
+			av_toastrelation *hentry;
 			bool		found;
 
 			hentry = hash_search(table_toast_map,
@@ -2168,7 +2190,7 @@ do_autovacuum(void)
 		relopts = extract_autovac_opts(tuple, pg_cl

Re: PDF Builds on borka (Debian/stretch) broken - 9.6

2018-02-08 Thread Stephen Frost

Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2/7/18 17:21, Stephen Frost wrote:
> > Looks like Nov 15 (which I believe is when the stretch upgrade was done)
> > it was upgraded:
> > 
> > 2017-11-15 17:38:55 upgrade openjade:amd64 1.4devel1-21.1 1.4devel1-21.3+b1
> > 
> > That doesn't look like a terribly large version bump to me tho..
> 
> You probably had the openjade1.3 package installed, and maybe it got
> deleted or the alternative uninstalled during the upgrade.

Thanks for that, yes, that got uninstalled but I've now re-installed it
and the PDF builds for 9.6-9.3 are working and will be included in
today's release.

Thanks again!

Stephen


signature.asc
Description: PGP signature

Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-08 Thread Jeevan Chalke

Hi,

In this attached version, I have rebased my changes over new design of
partially_grouped_rel. The preparatory changes of adding
partially_grouped_rel are in 0001.

Also to minimize finalization code duplication, I have refactored them into
two separate functions, finalize_sorted_partial_agg_path() and
finalize_hashed_partial_agg_path(). I need to create these two functions as
current path creation order in like,
Sort Agg Path
Sort Agg Path - Parallel Aware (Finalization needed here)
Hash Agg Path
Hash Agg Path - Parallel Aware (Finalization needed here)
And if we club those finalizations together, then path creation order will
be changed and it may result in the existing plan changes.
Let me know if that's OK, I will merge them together as they are distinct
anyways. These changes are part of 0002.

0003 - 0006 are refactoring patches as before.

0007 is the main patch per new design. I have removed
create_partition_agg_paths() altogether as finalization code is reused.
Also, renamed preferFullAgg with forcePartialAgg as we forcefully needed a
partial path from nested level if the parent is doing a partial
aggregation. add_single_path_to_append_rel() is no more exists and also
there is no need to pass OtherUpperPathExtraData to
add_paths_to_append_rel().

0008 - 0009, testcase and postgres_fdw changes.

Please have a look at new changes and let me know if I missed any.

Thanks

On Fri, Feb 2, 2018 at 7:29 PM, Robert Haas  wrote:

> On Fri, Feb 2, 2018 at 8:25 AM, Jeevan Chalke
>  wrote:
> >> The problem is that create_partition_agg_paths() is doing *exactly*
> >> same thing that add_paths_to_grouping_rel() is already doing inside
> >> the blocks that say if (grouped_rel->partial_pathlist).  We don't need
> >> two copies of that code.  Both of those places except to take a
> >> partial path that has been partially aggregated and produce a
> >> non-partial path that is fully aggregated.  We do not need or want two
> >> copies of that code.
> >
> > OK. Got it.
> >
> > Will try to find a common place for them and will also check how it goes
> > with your suggested design change.
> >
> >> Here's another way to look at it.  We have four kinds of things.
> >>
> >> 1. Partially aggregated partial paths
> >> 2. Partially aggregated non-partial paths
> >> 3. Fully aggregated partial paths
> >> 4. Fully aggregated non-partial paths
>
> So in the new scheme I'm proposing, you've got a partially_grouped_rel
> and a grouped_rel.  So all paths of type #1 go into
> partially_grouped_rel->partial_pathlist, paths of type #2 go into
> partially_grouped_rel->pathlist, type #3 (if we have any) goes into
> grouped_rel->partial_pathlist, and type #4 goes into
> grouped_rel->pathlist.
>
> > add_paths_to_append_rel() -> generate_mergeappend_paths() does not
> consider
> > partial_pathlist. Thus we will never see MergeAppend over parallel scan
> > given by partial_pathlist. And thus plan like:
> > -> Gather Merge
> >   -> MergeAppend
> > is not possible with current HEAD.
> >
> > Are you suggesting we should implement that here? I think that itself is
> a
> > separate task.
>
> Oh, I didn't realize that wasn't working already.  I agree that it's a
> separate task from this patch, but it's really too bad that it doesn't
> already work.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

partition-wise-agg-v13.tar.gz
Description: GNU Zip compressed data

Re: SSL test names

2018-02-08 Thread Daniel Gustafsson

> On 07 Feb 2018, at 17:54, Peter Eisentraut  
> wrote:

> I have found the old way very confusing while working with several
> SSL-related patches recently.

Agreed.  I had similar, but way uglier, hacks in my Secure Transport branch.
+1 on something like this.

cheers ./daniel

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-02-08 Thread Robert Haas

On Tue, Feb 6, 2018 at 3:53 PM, Robert Haas  wrote:
> On Tue, Feb 6, 2018 at 2:11 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Unfortunately valgrind does not work at all on my laptop -- the server
>>> appears to start, but as soon as you try to connect, the whole thing
>>> dies with an error claiming that the startup process has failed.  So I
>>> can't easily test this at the moment.  I'll try to get it working,
>>> here or elsewhere, but thought I'd send the above reply first.
>>
>> Do you want somebody who does have a working valgrind installation
>> (ie me) to take responsibility for pushing this patch?
>
> I committed it before seeing this.  It probably would've been better
> if you had done it, but I assume Peter tested it, so let's see what
> the BF thinks.

skink and lousyjack seem happy now, so I think it worked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] logical decoding of two-phase transactions

2018-02-08 Thread Stas Kelvich

Hi!

Thanks for working on this patch.

Reading through patch I’ve noticed that you deleted call to SnapBuildCommitTxn()
in DecodePrepare(). As you correctly spotted upthread there was unnecessary
code that marked transaction as running after decoding of prepare. However call
marking it as committed before decoding of prepare IMHO is still needed as
SnapBuildCommitTxn does some useful thing like setting base snapshot for parent
transactions which were skipped because of SnapBuildXactNeedsSkip().

E.g. current code will crash in assert for following transaction:

BEGIN;
SAVEPOINT one;
CREATE TABLE test_prepared_savepoints (a int);
PREPARE TRANSACTION 'x';
COMMIT PREPARED 'x';
:get_with2pc_nofilter
:get_with2pc_nofilter  <- second call will crash decoder

With following backtrace:

   frame #3: 0x00010dc47b40 
postgres`ExceptionalCondition(conditionName="!(txn->ninvalidations == 0)", 
errorType="FailedAssertion", fileName="reorderbuffer.c", lineNumber=1944) at 
assert.c:54
frame #4: 0x00010d9ff4dc 
postgres`ReorderBufferForget(rb=0x7fe1ab832318, xid=816, lsn=35096144) at 
reorderbuffer.c:1944
frame #5: 0x00010d9f055c postgres`DecodePrepare(ctx=0x7fe1ab81b918, 
buf=0x7ffee2650408, parsed=0x7ffee2650088) at decode.c:703
frame #6: 0x00010d9ef718 postgres`DecodeXactOp(ctx=0x7fe1ab81b918, 
buf=0x7ffee2650408) at decode.c:310

That can be fixed by calling SnapBuildCommitTxn() in DecodePrepare()
which I believe is safe because during normal work prepared transaction
holds relation locks until commit/abort and in between nobody can access
altered relations (or just I don’t know such situations — that was the reason
why i had marked that xids as running in previous versions).

> On 6 Feb 2018, at 15:20, Nikhil Sontakke  wrote:
> 
> Hi all,
> 
>> 
>> PFA, patch which applies cleanly against latest git head. I also
>> removed unwanted newlines and took care of the cleanup TODO about
>> making ReorderBufferTXN structure using a txn_flags field instead of
>> separate booleans for various statuses like has_catalog_changes,
>> is_subxact, is_serialized etc. The patch uses this txn_flags field for
>> the newer prepare related info as well.
>> 
>> "make check-world" passes ok, including the additional regular and tap
>> tests that we have added as part of this patch.
>> 
> 
> PFA, latest version of this patch.
> 
> This latest version takes care of the abort-while-decoding issue along
> with additional test cases and documentation changes.
> 
> 

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: In logical replication concurrent update of partition key creates a duplicate record on standby.

2018-02-08 Thread Amit Kapila

On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila  wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul  wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar  
>> wrote:
>>> On 7 February 2018 at 13:53, amul sul  wrote:
 Hi,

 If an update of partition key involves tuple movement from one partition to
 another partition then there will be a separate delete on one partition and
 insert on the other partition made.

 In the logical replication if an update performed on the master and 
 standby at
 the same moment, then replication worker tries to replicate delete + insert
 operation on standby. While replying master changes on standby for the 
 delete
 operation worker will log "concurrent update, retrying" message (because 
 the
 update on standby has already deleted) and move forward to reply the next
 insert operation. Standby update also did the same delete+insert is as 
 part of
 the update of partition key in a result there will be two records inserted 
 on
 standby.
>>>
>>> A quick thinking on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding that the
>>> delete is part of a partition key update. This is analogous to how we
>>> set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>
>> +1,
>>
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the dual record problem as you are seeing.  I
> think we need to do something for next insert as you are suggesting.
>

Can you please once check what was the behavior before Update Tuple
routing patch (Commit-id: 2f178441) went in?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Proposal: partition pruning by secondary attributes

2018-02-08 Thread Ildar Musin


Hello, hackers!

Sorry if this have already been discussed. I've had this idea some time
ago and then successfully forgot about it until pgconf.ru, where I had a
conversation with one of postgres users. His situation could be
described as this: they have a table with id, timestamp and some other
attributes, which is partitioned by (let's say) timestamp column. In
different contexts they may want to filter the table either by id or by
timestamp attribute (but not both). In this case pruning will only work
for timestamp column.

The idea is to store min and max values of secondary attributes (like
'id' in the example above) for each partition somewhere in catalog and
use it for partition pruning along with partitioning key. You can think
of it as somewhat like BRIN index but for partitions. And it will have
similar limitations. For example, we may benefit if secondary attribute
values are monotonically increase or decrease, but would be unhelpful if
they are scattered, or if table wasn't partitioned by range.

I wanted to ask community's opinion would it be worth considering.
Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

Re: non-bulk inserts and tuple routing

2018-02-08 Thread Etsuro Fujita


(2018/02/07 19:36), Etsuro Fujita wrote:

(2018/02/05 19:43), Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

Here is the updated version that contains two patches as described
above.


Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1) 
change ExecInitPartitionInfo so that it creates and returns a 
newly-initialized ResultRelInfo for an initialized partition, and (2) 
rewrite this bit:


+   /* Initialize partition info, if not done already. */
+   ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+ leaf_part_index);
+
/*
 * Save the old ResultRelInfo and switch to the one 
corresponding to

 * the selected partition.
 */
saved_resultRelInfo = resultRelInfo;
resultRelInfo = proute->partitions[leaf_part_index];
+   Assert(resultRelInfo != NULL);

to something like this (I would say the same thing to the copy.c changes):

/*
 * Save the old ResultRelInfo and switch to the one corresponding to
 * the selected partition.
 */
saved_resultRelInfo = resultRelInfo;
resultRelInfo = proute->partitions[leaf_part_index];
if (resultRelInfo == NULL);
{
/* Initialize partition info. */
resultRelInfo = ExecInitPartitionInfo(mtstate,
  saved_resultRelInfo,
  proute,
  estate,
  leaf_part_index);
}

This would make ExecInitPartitionInfo more simple because it can assume 
that the given partition has not been initialized yet.


o On changes to execPartition.h

* Please add a brief decsription about partition_oids to the comments 
for this struct.


@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
PartitionDispatch *partition_dispatch_info;
int num_dispatch;
+   Oid*partition_oids;

That's it.

Best regards,
Etsuro Fujita

Re: JIT compiling with LLVM v10.0

2018-02-08 Thread Andreas Karlsson


On 02/07/2018 03:54 PM, Andres Freund wrote:

I've pushed v10.0. The big (and pretty painful to make) change is that
now all the LLVM specific code lives in src/backend/jit/llvm, which is
built as a shared library which is loaded on demand.


It does not seem to be possible build without LLVM anymore.

Error:

In file included from planner.c:32:0:
../../../../src/include/jit/llvmjit.h:13:10: fatal error: 
llvm-c/Types.h: No such file or directory

 #include 
  ^~~~

Options:

./configure --prefix=/home/andreas/dev/postgresql-inst 
--enable-tap-tests --enable-cassert --enable-debug


I also noticed the following typo:

diff --git a/configure.in b/configure.in
index b035966c0a..b89c4a138a 100644
--- a/configure.in
+++ b/configure.in
@@ -499,7 +499,7 @@ fi
 if test "$enable_coverage" = yes; then
   if test "$GCC" = yes; then
 CFLAGS="$CFLAGS -fprofile-arcs -ftest-coverage"
-CFLAGS="$CXXFLAGS -fprofile-arcs -ftest-coverage"
+CXXFLAGS="$CXXFLAGS -fprofile-arcs -ftest-coverage"
   else
 AC_MSG_ERROR([--enable-coverage is supported only when using GCC])
   fi

Andreas

Re: [HACKERS] More stats about skipped vacuums

2018-02-08 Thread Kyotaro HORIGUCHI

At Thu, 08 Feb 2018 18:04:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180208.180415.112312013.horiguchi.kyot...@lab.ntt.co.jp>
> > > I suggest we remove support for dynamic_shared_memory_type = none first,
> > > and see if we get any complaints.  If we don't, then future patches can
> > > rely on it being present.
> > 
> > If we remove it in v11, it'd still be maybe a year from now before we'd
> > have much confidence from that alone that nobody cares.  I think the lack
> > of complaints about it in 9.6 and 10 is a more useful data point.
> 
> So that means that we are assumed to be able to rely on the
> existence of DSM at the present since over a year we had no
> complain despite the fact that DSM is silently turned on? And
> apart from that we are ready to remove 'none' from the options of
> dynamic_shared_memory_type right now?

I found the follwoing commit related to this.

| commit d41ab71712a4457ed39d5471b23949872ac91def
| Author: Robert Haas 
| Date:   Wed Oct 16 09:41:03 2013 -0400
| 
| initdb: Suppress dynamic shared memory when probing for max_connections.
| 
| This might not be the right long-term solution here, but it will
| hopefully turn the buildfarm green again.
| 
| Oversight noted by Andres Freund

The discussion is found here.

https://www.postgresql.org/message-id/ca+tgmoyhiigrcvssjhmbsebmof2zx_9_9rwd75cwvu99yrd...@mail.gmail.com

I suppose that the problem has not been resolved yet..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] More stats about skipped vacuums

2018-02-08 Thread Kyotaro HORIGUCHI

Hello,

At Wed, 07 Feb 2018 16:59:20 -0500, Tom Lane  wrote in 
<3246.1518040...@sss.pgh.pa.us>
> Robert Haas  writes:
> > It seems to me that there was a thread where Tom proposed removing
> > support for dynamic_shared_memory_type = none.
> 
> I think you're recalling <32138.1502675...@sss.pgh.pa.us>, wherein
> I pointed out that
> 
> >>> Whether that's worth the trouble is debatable.  The current code
> >>> in initdb believes that every platform has some type of DSM support
> >>> (see choose_dsm_implementation).  Nobody's complained about that,
> >>> and it certainly works on every buildfarm animal.  So for all we know,
> >>> dynamic_shared_memory_type = none is broken already.
> 
> (That was in fact in the same thread Kyotaro-san just linked to about
> reimplementing the stats collector.)
> 
> It's still true that we've no reason to believe there are any supported
> platforms that haven't got some sort of DSM.  Performance might be a
> different question, of course ... but it's hard to believe that
> transferring stats through DSM wouldn't be better than writing them
> out to files.

Good to hear. Thanks.

> > I suggest we remove support for dynamic_shared_memory_type = none first,
> > and see if we get any complaints.  If we don't, then future patches can
> > rely on it being present.
> 
> If we remove it in v11, it'd still be maybe a year from now before we'd
> have much confidence from that alone that nobody cares.  I think the lack
> of complaints about it in 9.6 and 10 is a more useful data point.

So that means that we are assumed to be able to rely on the
existence of DSM at the present since over a year we had no
complain despite the fact that DSM is silently turned on? And
apart from that we are ready to remove 'none' from the options of
dynamic_shared_memory_type right now?

If I may rely on DSM, fallback stuff would not be required.


>   regards, tom lane

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

63 matches

Mail list logo