Re: [HACKERS] Runtime Partition Pruning

2018-04-25 Thread David Rowley
On 25 April 2018 at 05:10, Alvaro Herrera  wrote:
> Pushed.  Thanks!

Thanks!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-25 Thread David Rowley
On 25 April 2018 at 06:21, Alvaro Herrera  wrote:
> Andres Freund wrote:
>
>> What I wonder, after skimming this change, is where the relevant
>> expression context is reset?  That's not really related to this change
>> but the wider thread, I just noticed it while looking at this.
>
> Do you mean ResetExprContext?  We use the plan's context, so it should
> occur together with the plan reset itself, i.e. nodeAppend.c should do
> it somewhere.
>
> ... Hmm, it appears we don't do it anywhere there actually.

It's not immediately obvious to me why this is required.

All the expressions that are initialized here must live the entire
length of the executor run, and since they're allocated in the
ExecutorState context they'll be destroyed in FreeExecutorState().

If we do need to call ResetExprContext for these, then we'd just need
to invent a teardown for ExecSetupPartitionPruneState which would free
off the memory allocated and call ResetExprContext for all non-NULL
ExprStates in each context->exprstates. This function would need to be
called from the node's End function for any node that's set up a
PartitionPruneState.

Do we really need to do this given that the memory context these are
allocated in will be released a moment later anyway?

Just to ensure I'm not dreaming, I ran the following and couldn't see
the backend's memory consumption move.

create table lp (a int, value int) partition by list(a);
create table lp_1 partition of lp for values in(1);
create table lp_2 partition of lp for values in(2);
create function lp_value(p_a int) returns int as $$ select value from
lp where a = p_a; $$ language sql;
insert into lp values(1,10),(2,20);
select sum(lp_value(x)) from generate_Series(1,2) x,
generate_series(1,1000);

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Alvaro Herrera
Andres Freund wrote:

> What I wonder, after skimming this change, is where the relevant
> expression context is reset?  That's not really related to this change
> but the wider thread, I just noticed it while looking at this.

Do you mean ResetExprContext?  We use the plan's context, so it should
occur together with the plan reset itself, i.e. nodeAppend.c should do
it somewhere.

... Hmm, it appears we don't do it anywhere there actually.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Andres Freund
On 2018-04-19 12:04:35 +1200, David Rowley wrote:
> On 19 April 2018 at 03:13, Robert Haas  wrote:
> > 10% is more than a "slight" improvement, I'd say!  It's certainly got
> > to be worth avoiding the repeated calls to ExecInitExpr, whatever we
> > do about the memory contexts.

Yea, that seems important. Good that that got in.

What I wonder, after skimming this change, is where the relevant
expression context is reset?  That's not really related to this change
but the wider thread, I just noticed it while looking at this.

Greetings,

Andres Freund



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Alvaro Herrera
Pushed.  Thanks!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Alvaro Herrera
Amit Langote wrote:

> That's neat!  Definitely agree that we should call ExecInitExpr just once
> here.  The patch looks good too, except the long line.

How about this as a small tweak?  Determine the array index using a
macro, which serves as documentation.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 135740cd17e8c05f36a3b91b3af29515df6a2d3b Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com" 
Date: Thu, 19 Apr 2018 11:51:16 +1200
Subject: [PATCH v3] Initialize ExprStates once in run-time partition pruning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Instead of doing ExecInitExpr every time a Param needs to be evaluated
in run-time partition pruning, do it once during run-time pruning
set-up and cache the exprstate in PartitionPruneContext, saving a lot of
work.

Author: David Rowley
Reviewed-by: Amit Langote, Álvaro Herrera
Discussion: 
https://postgr.es/m/CAKJS1f8-x+q-90QAPDu_okhQBV4DPEtPz8CJ=m0940gyt4d...@mail.gmail.com
---
 src/backend/executor/execPartition.c | 35 +++
 src/backend/partitioning/partprune.c | 27 +--
 src/include/partitioning/partprune.h |  9 +
 3 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index f7bbb804aa..f7418f64b1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1442,7 +1442,9 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
PartitionDesc partdesc;
Relationrel;
PartitionKey partkey;
+   ListCell   *lc2;
int partnatts;
+   int n_steps;
 
pprune->present_parts = bms_copy(pinfo->present_parts);
pprune->subnode_map = palloc(sizeof(int) * pinfo->nparts);
@@ -1465,6 +1467,7 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
 
partkey = RelationGetPartitionKey(rel);
partdesc = RelationGetPartitionDesc(rel);
+   n_steps = list_length(pinfo->pruning_steps);
 
context->strategy = partkey->strategy;
context->partnatts = partnatts = partkey->partnatts;
@@ -1476,6 +1479,38 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
context->boundinfo = partition_bounds_copy(partdesc->boundinfo, 
partkey);
context->planstate = planstate;
context->safeparams = NULL; /* empty for now */
+   context->exprstates = palloc0(sizeof(ExprState *) * n_steps * 
partnatts);
+
+   /* Initialize expression states for each expression */
+   foreach(lc2, pinfo->pruning_steps)
+   {
+   PartitionPruneStepOp *step = (PartitionPruneStepOp *) 
lfirst(lc2);
+   ListCell   *lc3;
+   int keyno;
+
+   /* not needed for other step kinds */
+   if (!IsA(step, PartitionPruneStepOp))
+   continue;
+
+   Assert(list_length(step->exprs) <= partnatts);
+
+   keyno = 0;
+   foreach(lc3, step->exprs)
+   {
+   Expr   *expr = (Expr *) lfirst(lc3);
+   int stateidx;
+
+   /* not needed for Consts */
+   if (!IsA(expr, Const))
+   {
+   stateidx = PruneCxtStateIdx(partnatts,
+   
step->step.step_id, keyno);
+   context->exprstates[stateidx] =
+   ExecInitExpr(expr, 
context->planstate);
+   }
+   keyno++;
+   }
+   }
 
pprune->pruning_steps = pinfo->pruning_steps;
pprune->extparams = bms_copy(pinfo->extparams);
diff --git a/src/backend/partitioning/partprune.c 
b/src/backend/partitioning/partprune.c
index f8844ef2eb..62159477c1 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -169,7 +169,7 @@ static PruneStepResult 
*perform_pruning_combine_step(PartitionPruneContext *cont
 static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
   Expr *partkey, Expr 
**outconst);
 static bool partkey_datum_from_expr(PartitionPruneContext 

Re: [HACKERS] Runtime Partition Pruning

2018-04-23 Thread Alvaro Herrera
Anybody wanna argue against pushing this patch now?  I'm inclined to
push it on the grounds of being closure for already committed work, but
there are possible arguments about this being new development after
feature freeze.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-18 Thread Amit Langote
Hi David.

On 2018/04/19 9:04, David Rowley wrote:
> On 19 April 2018 at 03:13, Robert Haas  wrote:
>> On Mon, Apr 16, 2018 at 10:46 PM, David Rowley
>>  wrote:
>>> The patch does happen to improve performance slightly, but that is
>>> most likely due to the caching of the ExprStates rather than the
>>> change of memory management. It's not really possible to do that with
>>> the reset unless we stored the executor's memory context in
>>> PartitionPruneContext and did a context switch back inside
>>> partkey_datum_from_expr before calling ExecInitExpr.
>>
>> 10% is more than a "slight" improvement, I'd say!  It's certainly got
>> to be worth avoiding the repeated calls to ExecInitExpr, whatever we
>> do about the memory contexts.
> 
> I've attached a patch which does just this. On benchmarking again with
> this single change performance has improved 15% over master.
> 
> Also, out of curiosity, I also checked what this performed like before
> the run-time pruning patch was committed (5c0675215). Taking the
> average of the times below, it seems without this patch the
> performance of this case has improved about 356% and about 410% with
> this patch. So, I agree, it might be worth considering.
> 
> create table p (a int, value int) partition by hash (a);
> select 'create table p'||x|| ' partition of p for values with (modulus
> 10, remainder '||x||');' from generate_series(0,9) x;
> \gexec
> create table t1 (a int);
> 
> insert into p select x,x from generate_Series(1,1000) x;
> insert into t1 select x from generate_series(1,1000) x;
> 
> create index on p(a);
> 
> set enable_hashjoin = 0;
> set enable_mergejoin = 0;
> explain analyze select count(*) from t1 inner join p on t1.a=p.a;
> 
> -- Unpatched
> Execution Time: 20413.975 ms
> Execution Time: 20232.050 ms
> Execution Time: 20229.116 ms
> 
> -- Patched
> Execution Time: 17758.111 ms
> Execution Time: 17645.151 ms
> Execution Time: 17492.260 ms
> 
> -- 5c0675215e153ba1297fd494b34af2fdebd645d1
> Execution Time: 72875.161 ms
> Execution Time: 71817.757 ms
> Execution Time: 72411.730 ms

That's neat!  Definitely agree that we should call ExecInitExpr just once
here.  The patch looks good too, except the long line.  Maybe:

@@ -1514,13 +1514,15 @@ ExecSetupPartitionPruneState(PlanState *planstate,
List *partitionpruneinfo)
 foreach(lc3, step->exprs)
 {
 Expr   *expr = (Expr *) lfirst(lc3);
+int step_id = step->step.step_id;

 /*
  * partkey_datum_from_expr does not need an expression state
  * to evaluate a Const.
  */
 if (!IsA(expr, Const))
-context->exprstates[step->step.step_id * partnatts +
keyno] = ExecInitExpr(expr, context->planstate);
+context->exprstates[step_id * partnatts + keyno] =
+ExecInitExpr(expr, context->planstate);

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-18 Thread David Rowley
On 19 April 2018 at 12:04, David Rowley  wrote:
> insert into p select x,x from generate_Series(1,1000) x;
> insert into t1 select x from generate_series(1,1000) x;

Correction. These were meant to read:

insert into p select x,x from generate_Series(1,1000) x;
insert into t1 select x from generate_series(1,1000) x;

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-18 Thread David Rowley
On 19 April 2018 at 03:13, Robert Haas  wrote:
> On Mon, Apr 16, 2018 at 10:46 PM, David Rowley
>  wrote:
>> I did go and start working on a patch to test how possible this would
>> be and came up with the attached. I've left a stray
>> MemoryContextStatsDetail call in there which does indicate that
>> something is not being freed. I'm just not sure what it is yet.
>>
>> The patch does happen to improve performance slightly, but that is
>> most likely due to the caching of the ExprStates rather than the
>> change of memory management. It's not really possible to do that with
>> the reset unless we stored the executor's memory context in
>> PartitionPruneContext and did a context switch back inside
>> partkey_datum_from_expr before calling ExecInitExpr.
>
> 10% is more than a "slight" improvement, I'd say!  It's certainly got
> to be worth avoiding the repeated calls to ExecInitExpr, whatever we
> do about the memory contexts.

I've attached a patch which does just this. On benchmarking again with
this single change performance has improved 15% over master.

Also, out of curiosity, I also checked what this performed like before
the run-time pruning patch was committed (5c0675215). Taking the
average of the times below, it seems without this patch the
performance of this case has improved about 356% and about 410% with
this patch. So, I agree, it might be worth considering.

create table p (a int, value int) partition by hash (a);
select 'create table p'||x|| ' partition of p for values with (modulus
10, remainder '||x||');' from generate_series(0,9) x;
\gexec
create table t1 (a int);

insert into p select x,x from generate_Series(1,1000) x;
insert into t1 select x from generate_series(1,1000) x;

create index on p(a);

set enable_hashjoin = 0;
set enable_mergejoin = 0;
explain analyze select count(*) from t1 inner join p on t1.a=p.a;

-- Unpatched
Execution Time: 20413.975 ms
Execution Time: 20232.050 ms
Execution Time: 20229.116 ms

-- Patched
Execution Time: 17758.111 ms
Execution Time: 17645.151 ms
Execution Time: 17492.260 ms

-- 5c0675215e153ba1297fd494b34af2fdebd645d1
Execution Time: 72875.161 ms
Execution Time: 71817.757 ms
Execution Time: 72411.730 ms

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Initialize-expr-states-once-in-run-time-partition-pr.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-18 Thread Robert Haas
On Mon, Apr 16, 2018 at 10:46 PM, David Rowley
 wrote:
> I did go and start working on a patch to test how possible this would
> be and came up with the attached. I've left a stray
> MemoryContextStatsDetail call in there which does indicate that
> something is not being freed. I'm just not sure what it is yet.
>
> The patch does happen to improve performance slightly, but that is
> most likely due to the caching of the ExprStates rather than the
> change of memory management. It's not really possible to do that with
> the reset unless we stored the executor's memory context in
> PartitionPruneContext and did a context switch back inside
> partkey_datum_from_expr before calling ExecInitExpr.

10% is more than a "slight" improvement, I'd say!  It's certainly got
to be worth avoiding the repeated calls to ExecInitExpr, whatever we
do about the memory contexts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread David Rowley
On 17 April 2018 at 14:33, Alvaro Herrera  wrote:
> David Rowley wrote:
>
>> For a while, during my review of the faster partition pruning patch I
>> was asking Amit to add pfree() calls in various places for this exact
>> reason, but in the end, I gave up and decided it was easier to just
>> create a new memory context to call the planner function from. I've
>> now forgotten the exact reason why I finally decided it was too much
>> trouble.  The pruning code now works using your step logic so perhaps
>> that reason no longer applies, although, on a quick scan of the
>> pruning code now, it seems to require that get_matching_partitions
>> performs a deep pfree of each PruneStepResult. However, there is still
>> partkey_datum_from_expr which performs ExecInitExpr, although perhaps
>> that can just be done once, and the result stashed in the
>> PartitionPruneContext.
>
> I think trying to replace a well-placed MemoryContextReset (or Delete)
> with a bunch of individual pfrees is pointless.

I agree. I think I'd sleep better at night with the context reset in
there rather than hoping we've managed to pfree everything.

I did go and start working on a patch to test how possible this would
be and came up with the attached. I've left a stray
MemoryContextStatsDetail call in there which does indicate that
something is not being freed. I'm just not sure what it is yet.

The patch does happen to improve performance slightly, but that is
most likely due to the caching of the ExprStates rather than the
change of memory management. It's not really possible to do that with
the reset unless we stored the executor's memory context in
PartitionPruneContext and did a context switch back inside
partkey_datum_from_expr before calling ExecInitExpr.

My test case was as follows:

create table p (a int, value int) partition by hash (a);
select 'create table p'||x|| ' partition of p for values with (modulus
10, remainder '||x||');' from generate_series(0,9) x;
\gexec
create table t1 (a int);

insert into p select x,x from generate_Series(1,1000) x;
insert into t1 select x from generate_series(1,1000) x;

create index on p(a);

set enable_hashjoin = 0;
set enable_mergejoin = 0;
explain analyze select count(*) from t1 inner join p on t1.a=p.a;


-- Unpatched
Execution Time: 19725.981 ms
Execution Time: 19533.655 ms
Execution Time: 19542.854 ms

-- Patched
Execution Time: 17389.537 ms
Execution Time: 17603.802 ms
Execution Time: 17618.670 ms

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


recycle_mem_part_prune.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread Alvaro Herrera
David Rowley wrote:

> For a while, during my review of the faster partition pruning patch I
> was asking Amit to add pfree() calls in various places for this exact
> reason, but in the end, I gave up and decided it was easier to just
> create a new memory context to call the planner function from. I've
> now forgotten the exact reason why I finally decided it was too much
> trouble.  The pruning code now works using your step logic so perhaps
> that reason no longer applies, although, on a quick scan of the
> pruning code now, it seems to require that get_matching_partitions
> performs a deep pfree of each PruneStepResult. However, there is still
> partkey_datum_from_expr which performs ExecInitExpr, although perhaps
> that can just be done once, and the result stashed in the
> PartitionPruneContext.

I think trying to replace a well-placed MemoryContextReset (or Delete)
with a bunch of individual pfrees is pointless.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread David Rowley
On 14 April 2018 at 05:04, Robert Haas  wrote:
> But I wonder why it's the executor's job to clean up after the
> planner, instead of adjusting the relevant planner functions to avoid
> leaking memory?

It might be possible, but it might also be risky and difficult.

For a while, during my review of the faster partition pruning patch I
was asking Amit to add pfree() calls in various places for this exact
reason, but in the end, I gave up and decided it was easier to just
create a new memory context to call the planner function from. I've
now forgotten the exact reason why I finally decided it was too much
trouble.  The pruning code now works using your step logic so perhaps
that reason no longer applies, although, on a quick scan of the
pruning code now, it seems to require that get_matching_partitions
performs a deep pfree of each PruneStepResult. However, there is still
partkey_datum_from_expr which performs ExecInitExpr, although perhaps
that can just be done once, and the result stashed in the
PartitionPruneContext.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-13 Thread Robert Haas
On Thu, Apr 12, 2018 at 6:01 PM, David Rowley
 wrote:
> On 13 April 2018 at 04:57, Robert Haas  wrote:
>> BTW, looking at ExecSetupPartitionPruneState:
>>
>> /*
>>  * Create a sub memory context which we'll use when making calls to 
>> the
>>  * query planner's function to determine which partitions will
>> match.  The
>>  * planner is not too careful about freeing memory, so we'll ensure 
>> we
>>  * call the function in this context to avoid any memory leaking in 
>> the
>>  * executor's memory context.
>>  */
>>
>> This is a sloppy cut-and-paste job, not only because somebody changed
>> one copy of the word "planner" to "executor" and left the others
>> untouched, but also because the rationale isn't really correct for the
>> executor anyway, which has memory contexts all over the place and
>> frees them all the time.  I don't know whether the context is not
>> needed at all or whether the context is needed but the rationale is
>> different, but I don't buy that explanation.
>
> The comment is written exactly as intended. Unsure which of the
> "planner"s you think should be "executor".
>
> The context is needed. I can easily produce an OOM without it.

Oh, crap.  You know, I totally misread what that comment was trying to
say.  Sorry.

But I wonder why it's the executor's job to clean up after the
planner, instead of adjusting the relevant planner functions to avoid
leaking memory?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-13 Thread Amit Langote
On 2018/04/13 14:48, Amit Langote wrote:
> On 2018/04/13 14:38, Amit Langote wrote:
>> About the specific relation_open(.., NoLock) under question, I think there
>> might be a way to address this by opening the tables with the appropriate
>> lock mode in partitioned_rels list in ExecLockNonleafAppendTables
> 
> That may have sounded a bit confusing:
> 
> I meant to say: "by opening the tables in partitioned_rels list with the
> appropriate lock mode in ExecLockNonleafAppendTables"
> 
>> Attached a PoC patch.
>
> There were a couple of unnecessary hunks, which removed in the attached.

Sorry, still a couple more were unnecessary.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 11139f743d..e3b3911945 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1244,7 +1244,8 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
  * PartitionPruneInfo.
  */
 PartitionPruneState *
-ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+Relation 
*partitioned_rels)
 {
PartitionPruningData *prunedata;
PartitionPruneState *prunestate;
@@ -1303,10 +1304,11 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
pprune->subpart_map = pinfo->subpart_map;
 
/*
-* Grab some info from the table's relcache; lock was already 
obtained
-* by ExecLockNonLeafAppendTables.
+* ExecLockNonLeafAppendTables already opened the relation for 
us.
 */
-   rel = relation_open(pinfo->reloid, NoLock);
+   Assert(partitioned_rels[i] != NULL);
+   rel = partitioned_rels[i];
+   Assert(RelationGetRelid(rel) == pinfo->reloid);
 
partkey = RelationGetPartitionKey(rel);
partdesc = RelationGetPartitionDesc(rel);
@@ -1336,8 +1338,6 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
prunestate->execparams = bms_add_members(prunestate->execparams,

 pinfo->execparams);
 
-   relation_close(rel, NoLock);
-
i++;
}
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..58a0961eb1 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -858,22 +858,49 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 /*
  * ExecLockNonLeafAppendTables
  *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list.  These are the non-leaf tables in
+ * the partition tree controlled by a given Append or MergeAppend node.
  */
 void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
+ExecLockNonLeafAppendTables(PlanState *planstate,
+   EState *estate,
+   List *partitioned_rels)
 {
PlannedStmt *stmt = estate->es_plannedstmt;
ListCell   *lc;
+   int i;
 
+   /*
+* If we're going to be performing pruning, allocate space for Relation
+* pointers to be used later when setting up partition pruning state in
+* ExecSetupPartitionPruneState.
+*/
+   if (IsA(planstate, AppendState))
+   {
+   AppendState *appendstate = (AppendState *) planstate;
+   Append *node = (Append *) planstate->plan;
+
+   if (node->part_prune_infos != NIL)
+   {
+   Assert(list_length(node->part_prune_infos) ==
+  list_length(partitioned_rels));
+   appendstate->partitioned_rels = (Relation *)
+   
palloc(sizeof(Relation) *
+  
list_length(node->part_prune_infos));
+   appendstate->num_partitioned_rels =
+  
list_length(node->part_prune_infos);
+   }
+   }
+
+   i = 0;
foreach(lc, partitioned_rels)
{
ListCell   *l;
Index   rti = lfirst_int(lc);
boolis_result_rel = false;
Oid relid = getrelid(rti, 

Re: [HACKERS] Runtime Partition Pruning

2018-04-12 Thread Amit Langote
On 2018/04/13 14:38, Amit Langote wrote:
> About the specific relation_open(.., NoLock) under question, I think there
> might be a way to address this by opening the tables with the appropriate
> lock mode in partitioned_rels list in ExecLockNonleafAppendTables

That may have sounded a bit confusing:

I meant to say: "by opening the tables in partitioned_rels list with the
appropriate lock mode in ExecLockNonleafAppendTables"

> Attached a PoC patch.
There were a couple of unnecessary hunks, which removed in the attached.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 11139f743d..e3b3911945 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1244,7 +1244,8 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
  * PartitionPruneInfo.
  */
 PartitionPruneState *
-ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+Relation 
*partitioned_rels)
 {
PartitionPruningData *prunedata;
PartitionPruneState *prunestate;
@@ -1303,10 +1304,11 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
pprune->subpart_map = pinfo->subpart_map;
 
/*
-* Grab some info from the table's relcache; lock was already 
obtained
-* by ExecLockNonLeafAppendTables.
+* ExecLockNonLeafAppendTables already opened the relation for 
us.
 */
-   rel = relation_open(pinfo->reloid, NoLock);
+   Assert(partitioned_rels[i] != NULL);
+   rel = partitioned_rels[i];
+   Assert(RelationGetRelid(rel) == pinfo->reloid);
 
partkey = RelationGetPartitionKey(rel);
partdesc = RelationGetPartitionDesc(rel);
@@ -1336,8 +1338,6 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
prunestate->execparams = bms_add_members(prunestate->execparams,

 pinfo->execparams);
 
-   relation_close(rel, NoLock);
-
i++;
}
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..58a0961eb1 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -858,22 +858,49 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 /*
  * ExecLockNonLeafAppendTables
  *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list.  These are the non-leaf tables in
+ * the partition tree controlled by a given Append or MergeAppend node.
  */
 void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
+ExecLockNonLeafAppendTables(PlanState *planstate,
+   EState *estate,
+   List *partitioned_rels)
 {
PlannedStmt *stmt = estate->es_plannedstmt;
ListCell   *lc;
+   int i;
 
+   /*
+* If we're going to be performing pruning, allocate space for Relation
+* pointers to be used later when setting up partition pruning state in
+* ExecSetupPartitionPruneState.
+*/
+   if (IsA(planstate, AppendState))
+   {
+   AppendState *appendstate = (AppendState *) planstate;
+   Append *node = (Append *) planstate->plan;
+
+   if (node->part_prune_infos != NIL)
+   {
+   Assert(list_length(node->part_prune_infos) ==
+  list_length(partitioned_rels));
+   appendstate->partitioned_rels = (Relation *)
+   
palloc(sizeof(Relation) *
+  
list_length(node->part_prune_infos));
+   appendstate->num_partitioned_rels =
+  
list_length(node->part_prune_infos);
+   }
+   }
+
+   i = 0;
foreach(lc, partitioned_rels)
{
ListCell   *l;
Index   rti = lfirst_int(lc);
boolis_result_rel = false;
Oid relid = getrelid(rti, 
estate->es_range_table);
+   int lockmode;
 
/* If this is a result relation, 

Re: [HACKERS] Runtime Partition Pruning

2018-04-12 Thread Amit Langote
On 2018/04/13 1:57, Robert Haas wrote:
>> It might be possible to do something better in each module by keeping
>> an array indexed by RTI which have each entry NULL initially then on
>> first relation_open set the element in the array to that pointer.
> 
> I'm not sure that makes a lot of sense in the planner, but in the
> executor it might be a good idea.   See also
> https://www.postgresql.org/message-id/CA%2BTgmoYKToP4-adCFFRNrO21OGuH%3Dphx-fiB1dYoqksNYX6YHQ%40mail.gmail.com
> for related discussion.  I think that a coding pattern where we rely
> on relation_open(..., NoLock) is inherently dangerous -- it's too easy
> to be wrong about whether the lock is sure to have been taken.  It
> would be much better to open the relation once and hold onto the
> pointer, not just for performance reasons, but for robustness.

About the specific relation_open(.., NoLock) under question, I think there
might be a way to address this by opening the tables with the appropriate
lock mode in partitioned_rels list in ExecLockNonleafAppendTables and
keeping the Relation pointers around until ExecEndNode.  Then instead of
ExecSetupPartitionPruneState doing relation_open/close(.., NoLock), it
just reuses the one that's passed by the caller.

Attached a PoC patch.  David, thoughts?

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 11139f743d..e3b3911945 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1244,7 +1244,8 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
  * PartitionPruneInfo.
  */
 PartitionPruneState *
-ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecSetupPartitionPruneState(PlanState *planstate, List *partitionpruneinfo,
+Relation 
*partitioned_rels)
 {
PartitionPruningData *prunedata;
PartitionPruneState *prunestate;
@@ -1303,10 +1304,11 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
pprune->subpart_map = pinfo->subpart_map;
 
/*
-* Grab some info from the table's relcache; lock was already 
obtained
-* by ExecLockNonLeafAppendTables.
+* ExecLockNonLeafAppendTables already opened the relation for 
us.
 */
-   rel = relation_open(pinfo->reloid, NoLock);
+   Assert(partitioned_rels[i] != NULL);
+   rel = partitioned_rels[i];
+   Assert(RelationGetRelid(rel) == pinfo->reloid);
 
partkey = RelationGetPartitionKey(rel);
partdesc = RelationGetPartitionDesc(rel);
@@ -1336,8 +1338,6 @@ ExecSetupPartitionPruneState(PlanState *planstate, List 
*partitionpruneinfo)
prunestate->execparams = bms_add_members(prunestate->execparams,

 pinfo->execparams);
 
-   relation_close(rel, NoLock);
-
i++;
}
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae730..58a0961eb1 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -858,22 +858,49 @@ ShutdownExprContext(ExprContext *econtext, bool isCommit)
 /*
  * ExecLockNonLeafAppendTables
  *
- * Locks, if necessary, the tables indicated by the RT indexes contained in
- * the partitioned_rels list.  These are the non-leaf tables in the partition
- * tree controlled by a given Append or MergeAppend node.
+ * Opens and/or locks, if necessary, the tables indicated by the RT indexes
+ * contained in the partitioned_rels list.  These are the non-leaf tables in
+ * the partition tree controlled by a given Append or MergeAppend node.
  */
 void
-ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
+ExecLockNonLeafAppendTables(PlanState *planstate,
+   EState *estate,
+   List *partitioned_rels)
 {
PlannedStmt *stmt = estate->es_plannedstmt;
ListCell   *lc;
+   int i;
 
+   /*
+* If we're going to be performing pruning, allocate space for Relation
+* pointers to be used later when setting up partition pruning state in
+* ExecSetupPartitionPruneState.
+*/
+   if (IsA(planstate, AppendState))
+   {
+   AppendState *appendstate = (AppendState *) planstate;
+   Append *node = (Append *) planstate->plan;
+
+   if (node->part_prune_infos != NIL)
+   {
+   Assert(list_length(node->part_prune_infos) ==
+  list_length(partitioned_rels));
+   appendstate->partitioned_rels = (Relation *)
+   

Re: [HACKERS] Runtime Partition Pruning

2018-04-12 Thread David Rowley
On 13 April 2018 at 04:57, Robert Haas  wrote:
> BTW, looking at ExecSetupPartitionPruneState:
>
> /*
>  * Create a sub memory context which we'll use when making calls to 
> the
>  * query planner's function to determine which partitions will
> match.  The
>  * planner is not too careful about freeing memory, so we'll ensure we
>  * call the function in this context to avoid any memory leaking in 
> the
>  * executor's memory context.
>  */
>
> This is a sloppy cut-and-paste job, not only because somebody changed
> one copy of the word "planner" to "executor" and left the others
> untouched, but also because the rationale isn't really correct for the
> executor anyway, which has memory contexts all over the place and
> frees them all the time.  I don't know whether the context is not
> needed at all or whether the context is needed but the rationale is
> different, but I don't buy that explanation.

The comment is written exactly as intended. Unsure which of the
"planner"s you think should be "executor".

The context is needed. I can easily produce an OOM without it.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-12 Thread Robert Haas
On Thu, Apr 12, 2018 at 12:40 AM, David Rowley
 wrote:
> I guess the problem there would be there's nothing to say that parse
> analysis will shortly be followed by a call to the planner, and a call
> to the planner does not mean the plan is about to be executed. So I
> don't think it would be possible to keep pointers to relcache entries
> between these modules, and it would be hard to determine whose
> responsibility it would be to call relation_close().

Yeah, that's definitely a non-starter.

> It might be possible to do something better in each module by keeping
> an array indexed by RTI which have each entry NULL initially then on
> first relation_open set the element in the array to that pointer.

I'm not sure that makes a lot of sense in the planner, but in the
executor it might be a good idea.   See also
https://www.postgresql.org/message-id/CA%2BTgmoYKToP4-adCFFRNrO21OGuH%3Dphx-fiB1dYoqksNYX6YHQ%40mail.gmail.com
for related discussion.  I think that a coding pattern where we rely
on relation_open(..., NoLock) is inherently dangerous -- it's too easy
to be wrong about whether the lock is sure to have been taken.  It
would be much better to open the relation once and hold onto the
pointer, not just for performance reasons, but for robustness.

BTW, looking at ExecSetupPartitionPruneState:

/*
 * Create a sub memory context which we'll use when making calls to the
 * query planner's function to determine which partitions will
match.  The
 * planner is not too careful about freeing memory, so we'll ensure we
 * call the function in this context to avoid any memory leaking in the
 * executor's memory context.
 */

This is a sloppy cut-and-paste job, not only because somebody changed
one copy of the word "planner" to "executor" and left the others
untouched, but also because the rationale isn't really correct for the
executor anyway, which has memory contexts all over the place and
frees them all the time.  I don't know whether the context is not
needed at all or whether the context is needed but the rationale is
different, but I don't buy that explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-11 Thread David Rowley
On 11 April 2018 at 09:32, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> On Mon, Apr 9, 2018 at 2:28 PM, Alvaro Herrera  
>> wrote:
>
>> >> I don't get this.  The executor surely had to (and did) open all of
>> >> the relations somewhere even before this patch.
>
>> > I was worried that this coding could be seen as breaking modularity, or
>> > trying to do excessive work.  However, after looking closer at it, it
>> > doesn't really look like it's the case.  So, nevermind.
>>
>> Well what I'm saying is that it shouldn't be necessary.  If the
>> relations are being opened already and the pointers to the relcache
>> entries are being saved someplace, you shouldn't need to re-open them
>> elsewhere to get pointers to the relcache entries.
>
> I looked a bit more into this.  It turns out that we have indeed opened
> the relation before -- first in parserOpenTable (for addRangeTableEntry),
> then in expandRTE, then in QueryRewrite, then in subquery_planner, then
> in get_relation_info.
>
> So, frankly, since each module thinks it's okay to open it every once in
> a while, I'm not sure we should be terribly stressed about doing it once
> more for partition pruning.  Particularly since communicating the
> pointer seems to be quite troublesome.

I guess the problem there would be there's nothing to say that parse
analysis will shortly be followed by a call to the planner, and a call
to the planner does not mean the plan is about to be executed. So I
don't think it would be possible to keep pointers to relcache entries
between these modules, and it would be hard to determine whose
responsibility it would be to call relation_close().

It might be possible to do something better in each module by keeping
an array indexed by RTI which have each entry NULL initially then on
first relation_open set the element in the array to that pointer.

This might mean we'd save a few relation_open calls, but I don't know
if there would be a way to somehow remove the Relation from the array
on relation_close.  Having something like this might mean we could
detect lock upgrade hazards more easily, but the whole thing is a
cache on top of a cache which does seem a bit weird.  relation_open()
should be pretty cheap if the relation is already open. It's just a
hash table lookup. What is described above just changes that to an
array lookup.  It also does nothing for index_open.

However, something like the above would simplify
ExecLockNonLeafAppendTables() a bit and get rid of the O(N^2) which
checks the partition is not a result relation.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-11 Thread Amit Langote
On 2018/04/11 6:32, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Mon, Apr 9, 2018 at 2:28 PM, Alvaro Herrera  
>> wrote:
> 
 I don't get this.  The executor surely had to (and did) open all of
 the relations somewhere even before this patch.
> 
>>> I was worried that this coding could be seen as breaking modularity, or
>>> trying to do excessive work.  However, after looking closer at it, it
>>> doesn't really look like it's the case.  So, nevermind.
>>
>> Well what I'm saying is that it shouldn't be necessary.  If the
>> relations are being opened already and the pointers to the relcache
>> entries are being saved someplace, you shouldn't need to re-open them
>> elsewhere to get pointers to the relcache entries.
> 
> I looked a bit more into this.  It turns out that we have indeed opened
> the relation before -- first in parserOpenTable (for addRangeTableEntry),
> then in expandRTE, then in QueryRewrite, then in subquery_planner, then
> in get_relation_info.
> 
> So, frankly, since each module thinks it's okay to open it every once in
> a while, I'm not sure we should be terribly stressed about doing it once
> more for partition pruning.  Particularly since communicating the
> pointer seems to be quite troublesome.

Maybe, Robert was saying somewhere in the executor itself, before
ExecInitAppend, or more precisely, ExecSetupPartitionPruneState is called.
 But that doesn't seem to be the case.

For the result relation case (insert/update/delete on a partitioned
table), we don't need to do extra relation_opens, as InitPlan opens the
needed relations when building the ResultRelInfo(s), from where they're
later accessed, for example, in ExecInitModifyTable.

However, in the Append/MergeAppend cases, we don't, at any earlier point
in the executor, open the partitioned tables.  InitPlan doesn't touch
them.  In ExecInitAppend, ExecLockNonLeafAppendTables that we call before
calling ExecSetupPartitionPruneState does not open, just locks them using
LockRelationOid.

So, relation_open on partitioned tables in ExecSetupPartitionPruneState
seem to be the first time they're opened *in the executor*.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-10 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Apr 9, 2018 at 2:28 PM, Alvaro Herrera  
> wrote:

> >> I don't get this.  The executor surely had to (and did) open all of
> >> the relations somewhere even before this patch.

> > I was worried that this coding could be seen as breaking modularity, or
> > trying to do excessive work.  However, after looking closer at it, it
> > doesn't really look like it's the case.  So, nevermind.
> 
> Well what I'm saying is that it shouldn't be necessary.  If the
> relations are being opened already and the pointers to the relcache
> entries are being saved someplace, you shouldn't need to re-open them
> elsewhere to get pointers to the relcache entries.

I looked a bit more into this.  It turns out that we have indeed opened
the relation before -- first in parserOpenTable (for addRangeTableEntry),
then in expandRTE, then in QueryRewrite, then in subquery_planner, then
in get_relation_info.

So, frankly, since each module thinks it's okay to open it every once in
a while, I'm not sure we should be terribly stressed about doing it once
more for partition pruning.  Particularly since communicating the
pointer seems to be quite troublesome.

To figure out, I used the attached patch (not intended for application)
to add a backtrace to each log message, plus a couple of accusatory
elog() calls in relation_open and ExecSetupPartitionPruneState.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5d682c8cb31f79c79c2ba4cafeb876f10b072cfc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 10 Apr 2018 18:30:29 -0300
Subject: [PATCH] errbacktrace

---
 src/backend/utils/error/elog.c  | 59 +
 src/include/postgres_ext.h  |  1 +
 src/include/utils/elog.h|  3 ++
 src/interfaces/libpq/fe-protocol3.c |  3 ++
 4 files changed, 66 insertions(+)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 16531f7a0f..e531d0009e 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -62,6 +62,7 @@
 #ifdef HAVE_SYSLOG
 #include 
 #endif
+#include 
 
 #include "access/transam.h"
 #include "access/xact.h"
@@ -493,6 +494,8 @@ errfinish(int dummy,...)
pfree(edata->hint);
if (edata->context)
pfree(edata->context);
+   if (edata->backtrace)
+   pfree(edata->backtrace);
if (edata->schema_name)
pfree(edata->schema_name);
if (edata->table_name)
@@ -811,6 +814,41 @@ errmsg(const char *fmt,...)
return 0;   /* return value does 
not matter */
 }
 
+#define BACKTRACE_FRAMES 100
+int
+errbacktrace(void)
+{
+   ErrorData   *edata = [errordata_stack_depth];
+   MemoryContext oldcontext;
+   void   *buf[BACKTRACE_FRAMES];
+   int nframes;
+   char  **strfrms;
+   StringInfoData errtrace;
+   int i;
+
+   recursion_depth++;
+   CHECK_STACK_DEPTH();
+   oldcontext = MemoryContextSwitchTo(edata->assoc_context);
+
+   nframes = backtrace(buf, BACKTRACE_FRAMES);
+   strfrms = backtrace_symbols(buf, nframes);
+   if (strfrms == NULL)
+   return 0;
+
+   initStringInfo();
+
+   /* the first frame is always errbacktrace itself, so skip it */
+   for (i = 1; i < nframes; i++)
+   appendStringInfo(, "%s\n", strfrms[i]);
+   free(strfrms);
+
+   edata->backtrace = errtrace.data;
+
+   MemoryContextSwitchTo(oldcontext);
+   recursion_depth--;
+
+   return 0;
+}
 
 /*
  * errmsg_internal --- add a primary error message text to the current error
@@ -1522,6 +1560,8 @@ CopyErrorData(void)
newedata->hint = pstrdup(newedata->hint);
if (newedata->context)
newedata->context = pstrdup(newedata->context);
+   if (newedata->backtrace)
+   newedata->backtrace = pstrdup(newedata->backtrace);
if (newedata->schema_name)
newedata->schema_name = pstrdup(newedata->schema_name);
if (newedata->table_name)
@@ -1560,6 +1600,8 @@ FreeErrorData(ErrorData *edata)
pfree(edata->hint);
if (edata->context)
pfree(edata->context);
+   if (edata->backtrace)
+   pfree(edata->backtrace);
if (edata->schema_name)
pfree(edata->schema_name);
if (edata->table_name)
@@ -1635,6 +1677,8 @@ ThrowErrorData(ErrorData *edata)
newedata->hint = pstrdup(edata->hint);
if (edata->context)
newedata->context = pstrdup(edata->context);
+   if (edata->backtrace)
+   newedata->backtrace = pstrdup(newedata->backtrace);
/* assume message_id is not available */
if (edata->schema_name)

Re: [HACKERS] Runtime Partition Pruning

2018-04-09 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Apr 9, 2018 at 2:28 PM, Alvaro Herrera  
> wrote:
> > Robert Haas wrote:

> >> I don't get this.  The executor surely had to (and did) open all of
> >> the relations somewhere even before this patch.
> >
> > I was worried that this coding could be seen as breaking modularity, or
> > trying to do excessive work.  However, after looking closer at it, it
> > doesn't really look like it's the case.  So, nevermind.
> 
> Well what I'm saying is that it shouldn't be necessary.  If the
> relations are being opened already and the pointers to the relcache
> entries are being saved someplace, you shouldn't need to re-open them
> elsewhere to get pointers to the relcache entries.

Oh, okay.  I can look into that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-09 Thread Robert Haas
On Mon, Apr 9, 2018 at 2:28 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> On Sat, Apr 7, 2018 at 5:13 PM, Alvaro Herrera  
>> wrote:
>> > I had reservations about a relation_open() in the new executor code.  It
>> > seemed a bit odd; we don't have any other relation_open in the executor
>> > anywhere.  However, setting up the pruneinfo needs some stuff from
>> > relcache that I don't see a reasonable mechanism to pass through
>> > planner.  I asked Andres about it on IM and while he didn't endorse the
>> > patch in any way, his quick opinion was that "it wasn't entirely
>> > insane".  I verified that we already hold lock on the relation.
>>
>> I don't get this.  The executor surely had to (and did) open all of
>> the relations somewhere even before this patch.
>
> Yeah.
>
> I was worried that this coding could be seen as breaking modularity, or
> trying to do excessive work.  However, after looking closer at it, it
> doesn't really look like it's the case.  So, nevermind.

Well what I'm saying is that it shouldn't be necessary.  If the
relations are being opened already and the pointers to the relcache
entries are being saved someplace, you shouldn't need to re-open them
elsewhere to get pointers to the relcache entries.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-09 Thread Alvaro Herrera
Robert Haas wrote:
> On Sat, Apr 7, 2018 at 5:13 PM, Alvaro Herrera  
> wrote:
> > I had reservations about a relation_open() in the new executor code.  It
> > seemed a bit odd; we don't have any other relation_open in the executor
> > anywhere.  However, setting up the pruneinfo needs some stuff from
> > relcache that I don't see a reasonable mechanism to pass through
> > planner.  I asked Andres about it on IM and while he didn't endorse the
> > patch in any way, his quick opinion was that "it wasn't entirely
> > insane".  I verified that we already hold lock on the relation.
> 
> I don't get this.  The executor surely had to (and did) open all of
> the relations somewhere even before this patch.

Yeah.

I was worried that this coding could be seen as breaking modularity, or
trying to do excessive work.  However, after looking closer at it, it
doesn't really look like it's the case.  So, nevermind.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-09 Thread Robert Haas
On Sat, Apr 7, 2018 at 5:13 PM, Alvaro Herrera  wrote:
> I had reservations about a relation_open() in the new executor code.  It
> seemed a bit odd; we don't have any other relation_open in the executor
> anywhere.  However, setting up the pruneinfo needs some stuff from
> relcache that I don't see a reasonable mechanism to pass through
> planner.  I asked Andres about it on IM and while he didn't endorse the
> patch in any way, his quick opinion was that "it wasn't entirely
> insane".  I verified that we already hold lock on the relation.

I don't get this.  The executor surely had to (and did) open all of
the relations somewhere even before this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread David Rowley
On 8 April 2018 at 09:13, Alvaro Herrera  wrote:
> I pushed this patch -- 0001, 0002 and 0003 only.  I did not include
> anything from 0004 and 0005; I didn't even get to the point of reading
> them, so that I could focus on the first part.

Oh great! Thank you for working on this and pushing it, especially so
during your weekend.

> While we didn't get fast pruning support for MergeAppend or the
> DELETE/UPDATE parts, I think those are valuable and recommend to
> resubmit those for PG12.

Thanks. I'll certainly be doing that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
I pushed this patch -- 0001, 0002 and 0003 only.  I did not include
anything from 0004 and 0005; I didn't even get to the point of reading
them, so that I could focus on the first part.

I did not find anything to complain about.  I made a few adjustments and
asked David to supply a paragraph for perform.sgml (the "Using EXPLAIN"
section) which is included here.  I also adopted Jesper's (actually
David's) suggestion of changing "Partitions Pruned" to "Partitions
Removed" in the EXPLAIN output.

I had reservations about a relation_open() in the new executor code.  It
seemed a bit odd; we don't have any other relation_open in the executor
anywhere.  However, setting up the pruneinfo needs some stuff from
relcache that I don't see a reasonable mechanism to pass through
planner.  I asked Andres about it on IM and while he didn't endorse the
patch in any way, his quick opinion was that "it wasn't entirely
insane".  I verified that we already hold lock on the relation.


While we didn't get fast pruning support for MergeAppend or the
DELETE/UPDATE parts, I think those are valuable and recommend to
resubmit those for PG12.

Thank you!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Jesper Pedersen

Hi,

On 04/07/2018 04:45 AM, David Rowley wrote:

Ok, so I've gone and done this.

PartitionPruning has become PartitionPruneState
PartitionRelPruning has become PartitionPruningData

I've changed pointers to PartitionPruneStates to be named prunestate,
sometimes having the node prefix; as_, ma_, in these cases prune and
state are separated with a _ which seems to be the general rule for
executor state struct members.

Generally, pointers to PartitionPruningData are now named pprune.
Hopefully, that's ok, as this was the name previously used for
PartitionPruning pointers.

I applied the patch to get rid of as_noop_scan in favour of using a
special as_whichplan value. There was already one special value
(INVALID_SUBPLAN_INDEX), so seemed better to build on that rather than
inventing something new. This also means we don't have to make the
AppendState struct and wider too, which seems like a good thing to try
to do.

I made all the fixups which I mentioned in my review earlier and also
re-removed the resultRelation parameter from make_partition_pruneinfo.
It sneaked back into v22.

v23 is attached.



Passes check-world.

Changing explain.c to "Subplans Removed" as suggested by you in [1] is a 
good idea.


[1] 
https://www.postgresql.org/message-id/CAKJS1f99JnkbOshdV_4zoJZ96DPtKeHMHv43JRL_ZdHRkkVKCA%40mail.gmail.com


Best regards,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
David Rowley wrote:
> On 8 April 2018 at 01:18, Alvaro Herrera  wrote:
> > Amit had it as "indexes" also in his original.  I wanted to avoid using
> > the "indexes" word alone, whose meaning is so overloaded.
> 
> hmm, good point.
> 
> > How about this?
> > "... which are then executed to produce a set of partitions (as indexes
> > of resultRelInfo->part_rels array) that satisfy the constraints in the
> > step".
> 
> Works for me, but with RelOptInfo rather than resultRelInfo.

Oops, sorry about that.  Pushed now, adding one line to
get_matching_partitions also.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread David Rowley
On 8 April 2018 at 01:18, Alvaro Herrera  wrote:
> Amit had it as "indexes" also in his original.  I wanted to avoid using
> the "indexes" word alone, whose meaning is so overloaded.

hmm, good point.

> How about this?
> "... which are then executed to produce a set of partitions (as indexes
> of resultRelInfo->part_rels array) that satisfy the constraints in the
> step".

Works for me, but with RelOptInfo rather than resultRelInfo.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
David Rowley wrote:

> It's not exactly wrong but:
> 
> + * are turned into a set of "pruning steps", which are then executed to
> + * produce a set of RTIs of partitions whose bounds satisfy the constraints 
> in
> + * the step.  Partitions not in the set are said to have been pruned.
> 
> It's only prune_append_rel_partitions which is only used for the
> planner's pruning needs that converts the partition indexes to RTIs.
> Would it be better to mention that the output is partition indexes?
> Maybe:
> 
> "which are then executed to produce a set of partition indexes whose
> bounds satisfy the constraints in the step. These partition indexes
> may then be translated into RTIs", or maybe even not mention the RTIs.

Amit had it as "indexes" also in his original.  I wanted to avoid using
the "indexes" word alone, whose meaning is so overloaded.  How about
this?

"... which are then executed to produce a set of partitions (as indexes
of resultRelInfo->part_rels array) that satisfy the constraints in the
step".

Maybe "the boundinfo array" instead of part_rels, which as I understand
also uses the same indexing as the other array, and partprune mostly
works based on boundinfo anyway?

Not mentioning RTIs seems fine.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread David Rowley
On 8 April 2018 at 00:23, Alvaro Herrera  wrote:
> I edited it as attached, to 1. avoid mentioning functionality that
> doesn't yet exist, and 2. avoid excessive internal detail (we want a
> high-level overview here), which from experience gets outdated pretty
> quickly.

It's not exactly wrong but:

+ * are turned into a set of "pruning steps", which are then executed to
+ * produce a set of RTIs of partitions whose bounds satisfy the constraints in
+ * the step.  Partitions not in the set are said to have been pruned.

It's only prune_append_rel_partitions which is only used for the
planner's pruning needs that converts the partition indexes to RTIs.
Would it be better to mention that the output is partition indexes?
Maybe:

"which are then executed to produce a set of partition indexes whose
bounds satisfy the constraints in the step. These partition indexes
may then be translated into RTIs", or maybe even not mention the RTIs.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-07 Thread Alvaro Herrera
Amit Langote wrote:

> See if the attached makes it any better.
> 
> Now I know we don't have the runtime pruning in yet, but since the
> proposed patch would extend its functionality I have included its
> description in the comment.

Thanks!

I edited it as attached, to 1. avoid mentioning functionality that
doesn't yet exist, and 2. avoid excessive internal detail (we want a
high-level overview here), which from experience gets outdated pretty
quickly.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 28768290d8d9c58e76e2594a14d5b1f8465ba262
Author: Alvaro Herrera 
AuthorDate: Sat Apr 7 08:44:12 2018 -0300
CommitDate: Sat Apr 7 09:20:58 2018 -0300

Add some documentation to partprune.c

Author: Amit Langote
Discussion: 
https://postgr.es/m/CA+HiwqGzq4D6z=8r0ap+xhbtfcq-4ct+t2ekqje9fpm84_j...@mail.gmail.com

diff --git a/src/backend/partitioning/partprune.c 
b/src/backend/partitioning/partprune.c
index 959ee1643d..07b8057e3f 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -1,10 +1,27 @@
 /*-
  *
  * partprune.c
- * Parses clauses attempting to match them up to partition keys of 
a
- * given relation and generates a set of "pruning steps", which 
can be
- * later "executed" either from the planner or the executor to 
determine
- * the minimum set of partitions which match the given clauses.
+ * Support for partition pruning during query planning
+ *
+ * This module implements partition pruning using the information contained in
+ * table's partition descriptor and query clauses.
+ *
+ * During planning, clauses that can be matched to the table's partition key
+ * are turned into a set of "pruning steps", which are then executed to
+ * produce a set of RTIs of partitions whose bounds satisfy the constraints in
+ * the step.  Partitions not in the set are said to have been pruned.
+ *
+ * There are two kinds of pruning steps: a "base" pruning step, which contains
+ * information extracted from one or more clauses that are matched to the
+ * (possibly multi-column) partition key, such as the expressions whose values
+ * to match against partition bounds and operator strategy to associate to
+ * each expression.  The other kind is a "combine" pruning step, which combines
+ * the outputs of some other steps using the appropriate combination method.
+ * All steps that are constructed are executed in succession such that for any
+ * "combine" step, all of the steps whose output it depends on are executed
+ * first and their ouput preserved.
+ *
+ * See gen_partprune_steps_internal() for more details on step generation.
  *
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California


Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Amit Langote
On Sat, Apr 7, 2018 at 1:58 PM, Amit Langote  wrote:
> On Sat, Apr 7, 2018 at 1:31 PM, Andres Freund  wrote:
>> I've not followed this thread/feature at all, but I don't find the
>> comments atop partprune.c even remotely sufficient. Unless there's an
>> README hidden or such hidden somewhere?
>
> Sorry there isn't a README and I agree partprune.c's header comment
> could be improved quite a bit.
>
> Just to be clear, that's the fault of the patch that was already
> committed earlier today (9fdb675fc "Faster partition pruning"), not
> this patch, which just extends partition.c's functionality to
> implement additional planner and executor support for runtime pruning.
>
> I'm drafting a patch that expands the partprune.c comment and will post 
> shortly.

See if the attached makes it any better.

Now I know we don't have the runtime pruning in yet, but since the
proposed patch would extend its functionality I have included its
description in the comment.

Thanks,
Amit


expand-partprune-header-comment.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Andres Freund
On 2018-04-07 16:58:01 +1200, David Rowley wrote:
> On 7 April 2018 at 16:31, Andres Freund  wrote:
> > I've not followed this thread/feature at all, but I don't find the
> > comments atop partprune.c even remotely sufficient. Unless there's an
> > README hidden or such hidden somewhere?
> 
> There's not a README file. The comments for partprune.c, do you want
> this to explain more about how partition pruning works or how this
> patch uses the existing code?

Primarily the first. This isn't trivial straightforward code.


> Probably if we need to explain more there about how pruning works then
> it should be a fixup patch to 9fdb675fc, no?

Yea, it's about that. Sorry for accidentally jumping on the wrong
thread.

Greetings,

Andres Freund



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Amit Langote
On Sat, Apr 7, 2018 at 1:58 PM, David Rowley
 wrote:
> Probably if we need to explain more there about how pruning works then
> it should be a fixup patch to 9fdb675fc, no?

Yes, I just replied and working on a patch.

Thanks,
Amit



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Amit Langote
On Sat, Apr 7, 2018 at 1:31 PM, Andres Freund  wrote:
> On 2018-04-07 13:26:51 +0900, Amit Langote wrote:
>> On Sat, Apr 7, 2018 at 11:26 AM, David Rowley
>>  wrote:
>> > Everything else looks fine from my point of view.
>>
>> Me too, although I still think having struct names PartitionPruning
>> and PartitionRelPruning is going to be  a bit confusing.  We should
>> think about naming the latter to something else.  I suggested
>> PartitionPruningDispatch(Data), but David doesn't seem to like it.
>> Maybe, PartitionPruneState, because it parallels the
>> PartitionPruneInfo that comes from the planner for every partitioned
>> table in the tree.
>
> I've not followed this thread/feature at all, but I don't find the
> comments atop partprune.c even remotely sufficient. Unless there's an
> README hidden or such hidden somewhere?

Sorry there isn't a README and I agree partprune.c's header comment
could be improved quite a bit.

Just to be clear, that's the fault of the patch that was already
committed earlier today (9fdb675fc "Faster partition pruning"), not
this patch, which just extends partition.c's functionality to
implement additional planner and executor support for runtime pruning.

I'm drafting a patch that expands the partprune.c comment and will post shortly.

Thanks,
Amit



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread David Rowley
On 7 April 2018 at 16:31, Andres Freund  wrote:
> I've not followed this thread/feature at all, but I don't find the
> comments atop partprune.c even remotely sufficient. Unless there's an
> README hidden or such hidden somewhere?

There's not a README file. The comments for partprune.c, do you want
this to explain more about how partition pruning works or how this
patch uses the existing code?

Probably if we need to explain more there about how pruning works then
it should be a fixup patch to 9fdb675fc, no?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread David Rowley
On 7 April 2018 at 16:26, Amit Langote  wrote:
> Maybe, PartitionPruneState, because it parallels the
> PartitionPruneInfo that comes from the planner for every partitioned
> table in the tree.

I like that.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Andres Freund
On 2018-04-07 13:26:51 +0900, Amit Langote wrote:
> On Sat, Apr 7, 2018 at 11:26 AM, David Rowley
>  wrote:
> > Everything else looks fine from my point of view.
> 
> Me too, although I still think having struct names PartitionPruning
> and PartitionRelPruning is going to be  a bit confusing.  We should
> think about naming the latter to something else.  I suggested
> PartitionPruningDispatch(Data), but David doesn't seem to like it.
> Maybe, PartitionPruneState, because it parallels the
> PartitionPruneInfo that comes from the planner for every partitioned
> table in the tree.

I've not followed this thread/feature at all, but I don't find the
comments atop partprune.c even remotely sufficient. Unless there's an
README hidden or such hidden somewhere?

Greetings,

Andres Freund



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Amit Langote
On Sat, Apr 7, 2018 at 11:26 AM, David Rowley
 wrote:
> Everything else looks fine from my point of view.

Me too, although I still think having struct names PartitionPruning
and PartitionRelPruning is going to be  a bit confusing.  We should
think about naming the latter to something else.  I suggested
PartitionPruningDispatch(Data), but David doesn't seem to like it.
Maybe, PartitionPruneState, because it parallels the
PartitionPruneInfo that comes from the planner for every partitioned
table in the tree.

Thanks,
Amit



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread David Rowley
On 7 April 2018 at 12:03, David Rowley  wrote:
> Continuing to read 0003 and 0004 now.


0003:

1. "setup" -> "set"

/* If run-time partition pruning is enabled, then setup that up now */

2. We should be able to get rid of as_noopscan and just have another
special negative value for as_whichplan.

I've attached a patch to do this.

3. I've forgotten to drop table boolvalues in the tests.

Patched attached to fix.

0004:

1. "ms_valid_subplans" -> "valid_subplans" in:

 * ms_valid_subplans for runtime pruning, valid mergeplans indexes to
 * scan.

All the other fields are not being prefixed with ms_ in these comments.

Everything else looks fine from my point of view.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


nodeAppend_get_rid_of_as_noopscan.patch
Description: Binary data


drop_table_boolvalues.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread David Rowley
On 7 April 2018 at 10:45, David Rowley  wrote:
> I'm looking over the rebased patches now.

I've made a complete read of 0001 and 0002 so far.

Your rebase looks fine.

After the complete read, I only have the following comments:

0001:

1. missing "the" before "partition key":

* Extract Params matching partition key and record if we got any.

2. Is this the property name we're going to stick with:

ExplainPropertyInteger("Subplans Pruned", NULL, nplans - nsubnodes, es);

Other ideas are: "Subplans Removed"

3. In the following comment I've used the word "hierarchy", but maybe
we need to add the word "flattened" before it.

 * PartitionPruning - Encapsulates a hierarchy of PartitionRelPruning

4. Comment mentions "after init plan", but really we can only know the
value of an exec param during actual execution. So:

* Parameters that are safe to be used for partition pruning. execparams
* are not safe to use until after init plan.

maybe better as:

* Parameters that are safe to be used for partition pruning. execparams
* are not safe to use until the executor is running.

0002:

Looks fine. But if I was committing this, to give me confidence, I'd
want to know how the left_most_one table was generated.

I used:

#include 

int main(void)
{
int i = 1;

printf("0, ");
while (i < 256)
{
printf("%d, ", 31 - __builtin_clz(i));
if ((i & 0xf) == 0xf)
putchar('\n');
i++;
}
return 0;
}

Continuing to read 0003 and 0004 now.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread David Rowley
On 7 April 2018 at 09:29, Alvaro Herrera  wrote:
> Alvaro Herrera wrote:
>> I rebased this series on top of the committed version of the other patch.
>> Here's v22, with no other changes than rebasing.  I did not include
>> 0005, though.
>
> Apologies, I forgot to "git add" one fixup for 0001.

0003 I think.

I'm looking over the rebased patches now.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-06 Thread Alvaro Herrera
Alvaro Herrera wrote:
> I rebased this series on top of the committed version of the other patch.
> Here's v22, with no other changes than rebasing.  I did not include
> 0005, though.

Apologies, I forgot to "git add" one fixup for 0001.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/plan/createplan.c 
b/src/backend/optimizer/plan/createplan.c
index 093ceaa867..aac05917ee 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -29,7 +29,6 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
-#include "optimizer/partprune.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
@@ -42,6 +41,7 @@
 #include "optimizer/var.h"
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
+#include "partitioning/partprune.h"
 #include "utils/lsyscache.h"
 
 


Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Amit Langote
Hi David,

On 2018/04/06 12:27, David Rowley wrote:
> (sending my reply in parts for concurrency)
> 
> On 6 April 2018 at 14:39, Amit Langote  wrote:
>> I think we can save some space here by not having the pointers stored
>> here.  Instead of passing the pointer itself in the recursive calls to
>> find_subplans_for_extparams_recurse, et al, just pass the entire array and
>> an offset to use for the given recursive invocation.
> 
> hmm, where would those offsets be stored?  I don't want to have to do
> any linear searching to determine the offset, which is why I just
> stored the pointer to the flat array. It seems very efficient to me to
> do this. Remember that this pruning can occur per tuple in cases like
> parameterized nested loops.
> 
> Are you worried about memory consumption? It's one pointer per
> partition. I imagine there's lots more allocated for DML on a
> partitioned table as it needs to store maps to map attribute numbers.
> 
> Or are you thinking the saving of storing an array of 32-bit int
> values is better than the array of probably 64-bit pointers? So
> requires half the space?

Yeah, just copy it from the PartitionPruneInfo like you're doing for
subnodeindex:

   memcpy(partrelprune->subpartindex, pinfo->subpartindex,
  sizeof(int) * pinfo->nparts);

Instead I see ExecSetupPartitionPruning is now doing this:

  /*
   * Setup the PartitionedRelPruning's subpartprune so that we can
   * quickly find sub-PartitionedRelPruning details for any
   * sub-partitioned tables that this partitioned table contains.
   * We need to be able to find these quickly during our recursive
   * search to find all matching subnodes.
   */
   for (j = 0; j < pinfo->nparts; j++)
   {
   int subpartidx = pinfo->subpartindex[j];

   Assert(subpartidx < list_length(partitionpruneinfo));

   if (subpartidx >= 0)
   partrelprune->subpartprune[j] = [subpartidx];
   else
   partrelprune->subpartprune[j] = NULL;
   }


With that in place, pass the index/offset instead of the pointer to the
next recursive invocation of find_subplans_*, along with the array
containing all PartitionedRelPruning's.

So, where you have in each of find_subplans_*:

  if (partrelprune->subnodeindex[i] >= 0)
  *validsubplans = bms_add_member(*validsubplans,
  partrelprune->subnodeindex[i]);
  else if (partrelprune->subpartprune[i] != NULL)
  find_subplans_for_allparams_recurse(partrelprune->subpartprune[i],
  validsubplans);

I'm proposing that you do:

  if (partrelprune->subnodeindex[i] >= 0)
  *validsubplans = bms_add_member(*validsubplans,
  partrelprune->subnodeindex[i]);
  else if (partrelprune->subpartindex[i] >= 0)
  find_subplans_for_allparams_recurse(all_partrelprunes,
  partrelprune->subpartindex[i],
  validsubplans);

And at the top of each of  find_subplans_*:

  ParitionedRelPruning *partrelprune = all_partrelprunes[offset];

where the signature is:

  static void
  find_subplans_for_allparams_recurse(
 PartitionRelPruning **all_partrelprune,
 int offset,
 Bitmapset **validsubplans)

The all_partrelprune above refers to the flat array in PartitionPruning.
On the first call from ExecFindMatchingSubPlans, et al, you'd pass 0 for
offset to start pruning with the root parent's PartitionedRelPruning.  All
the values contained in subnodeindex and subpartindex are indexes into the
global array (whole-tree that is) anyway and that fact would be more
apparent if we use this code structure, imho.

>> If you look at ExecFindPartition used for tuple routing, you may see that
>> there no recursion at all.  Similarly find_subplans_for_extparams_recurse,
>> et al might be able to avoid recursion if similar tricks are used.
> 
> That seems pretty different. That's looking for a single node in a
> tree, so just is following a single path from the root, it never has
> to go back up a level and look down any other paths.
> 
> What we need for the find_subplans_for_extparams_recurse is to find
> all nodes in the entire tree which match the given clause. Doing this
> without recursion would require some sort of stack so we can go back
> up a level and search again down another branch.  There are ways
> around this without using recursion, sure, but I don't think any of
> them will be quite as convenient and simple. The best I can think of
> is to palloc some stack manually and use some depth_level to track
> which element to use.  An actual stack seems more simple. I can't
> quite think of a good way to know in advance how many elements we'd
> need to palloc.

Hmm, yeah.  I just remembered that I had to give up suggesting this a
while back on this thread.  So, okay, you 

Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread David Rowley
(sending my reply in parts for concurrency)

On 6 April 2018 at 14:39, Amit Langote  wrote:
> I think we can save some space here by not having the pointers stored
> here.  Instead of passing the pointer itself in the recursive calls to
> find_subplans_for_extparams_recurse, et al, just pass the entire array and
> an offset to use for the given recursive invocation.

hmm, where would those offsets be stored?  I don't want to have to do
any linear searching to determine the offset, which is why I just
stored the pointer to the flat array. It seems very efficient to me to
do this. Remember that this pruning can occur per tuple in cases like
parameterized nested loops.

Are you worried about memory consumption? It's one pointer per
partition. I imagine there's lots more allocated for DML on a
partitioned table as it needs to store maps to map attribute numbers.

Or are you thinking the saving of storing an array of 32-bit int
values is better than the array of probably 64-bit pointers? So
requires half the space?

> If you look at ExecFindPartition used for tuple routing, you may see that
> there no recursion at all.  Similarly find_subplans_for_extparams_recurse,
> et al might be able to avoid recursion if similar tricks are used.

That seems pretty different. That's looking for a single node in a
tree, so just is following a single path from the root, it never has
to go back up a level and look down any other paths.

What we need for the find_subplans_for_extparams_recurse is to find
all nodes in the entire tree which match the given clause. Doing this
without recursion would require some sort of stack so we can go back
up a level and search again down another branch.  There are ways
around this without using recursion, sure, but I don't think any of
them will be quite as convenient and simple. The best I can think of
is to palloc some stack manually and use some depth_level to track
which element to use.  An actual stack seems more simple. I can't
quite think of a good way to know in advance how many elements we'd
need to palloc.

> Finally about having two different functions for different sets of Params:
> can't we have just named find_subplans_for_params_recurse() and use the
> appropriate one based on the value of some parameter?  I can't help but
> notice the duplication of code.

I had decided not to make this one function previously as I didn't
really want to add unnecessary branching in the code. After
implementing it, it does not look as bad as I thought.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Amit Langote
Hi David.

On 2018/04/05 22:41, David Rowley wrote:
>> * make_partition_pruneinfo has a parameter resultRelations that's not used
>> anywhere
> 
> It gets used in 0005.
> 
> I guess I could only add it in 0005, but make_partition_pruneinfo is
> only used in 0003, so you could say the same about that entire
> function.
> 
> Do you think I should delay adding that parameter until the 0005 patch?

Yes, I think.

>> * In make_partition_pruneinfo()
>>
>> allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
>> allsubpartindex = palloc(sizeof(int) * root->simple_rel_array_size);
>>
>> I think these arrays need to have root->simple_rel_array_size + 1
>> elements, because they're subscripted using RT indexes which start at 1.
> 
> RT indexes are always 1-based. See setup_simple_rel_arrays. It already
> sets the array size to list_length(rtable) + 1.

Oh, I missed that simple_rel_array_size itself is set to consider 1-based
RT indexes.

relnode.c:73
root->simple_rel_array_size = list_length(root->parse->rtable) + 1;

>> * Also in make_partition_pruneinfo()
>>
>> /* Initialize to -1 to indicate the rel was not found */
>> for (i = 0; i < root->simple_rel_array_size; i++)
>> {
>> allsubnodeindex[i] = -1;
>> allsubpartindex[i] = -1;
>> }
>>
>> Maybe, allocate the arrays above mentioned using palloc0 and don't do this
>> initialization.  Instead make the indexes that are stored in these start
>> with 1 and consider 0 as invalid entries.
> 
> 0 is a valid subplan index. It is possible to make this happen, but
> I'd need to subtract 1 everywhere I used the map. That does not seem
> very nice. Seems more likely to result in bugs where we might forget
> to do the - 1.

You can subtract 1 right here in make_partition_pruneinfo before setting
the values in PartitionPruneInfo's subplan_indexes and parent_indexes.
I'm only proposing to make make_partition_pruneinfo() a bit faster by not
looping over both the local map arrays setting them to -1.

IOW, I'm not saying that we emit PartitionPruneInfo nodes that contain
1-based indexes.

> Did you want this because you'd rather have the palloc0() than the for
> loop setting the array elements to -1? Or is there another reason?

Yeah, that's it.

>> * Instead of nesting PartitionedRelPruning inside another, just store them
>> in a global flat array in the PartitionPruning, like PartitionTupleRouting
>> does for PartitionDispatch of individual partitioned tables in the tree.
>>
>> typedef struct PartitionedRelPruning
>> {
>> int nparts;
>> int*subnodeindex;
>> struct PartitionedRelPruning **subpartprune;
> 
> There is a flat array in PartitionPruning. subpartprune contains
> pointers into that array. I want to have this pointer array so I can
> directly reference the flat array in PartitionPruning.
> 
> Maybe I've misunderstood what you mean here.

I think we can save some space here by not having the pointers stored
here.  Instead of passing the pointer itself in the recursive calls to
find_subplans_for_extparams_recurse, et al, just pass the entire array and
an offset to use for the given recursive invocation.

If you look at ExecFindPartition used for tuple routing, you may see that
there no recursion at all.  Similarly find_subplans_for_extparams_recurse,
et al might be able to avoid recursion if similar tricks are used.

Finally about having two different functions for different sets of Params:
can't we have just named find_subplans_for_params_recurse() and use the
appropriate one based on the value of some parameter?  I can't help but
notice the duplication of code.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Jesper Pedersen

Hi David,

First of all: Solid patch set with good documentation.

On 04/05/2018 09:41 AM, David Rowley wrote:

Seems mostly fair. I'm not a fan of using the term "unpruned" though.
I'll have a think.  The "all" is meant in terms of what exists as
subnodes.



'included_parts' / 'excluded_parts' probably isn't better...


subplan_indexes and parent_indexes seem like better names, I agree.



More clear.


* Also in make_partition_pruneinfo()

 /* Initialize to -1 to indicate the rel was not found */
 for (i = 0; i < root->simple_rel_array_size; i++)
 {
 allsubnodeindex[i] = -1;
 allsubpartindex[i] = -1;
 }

Maybe, allocate the arrays above mentioned using palloc0 and don't do this
initialization.  Instead make the indexes that are stored in these start
with 1 and consider 0 as invalid entries.


0 is a valid subplan index. It is possible to make this happen, but
I'd need to subtract 1 everywhere I used the map. That does not seem
very nice. Seems more likely to result in bugs where we might forget
to do the - 1.

Did you want this because you'd rather have the palloc0() than the for
loop setting the array elements to -1? Or is there another reason?



I think that doing palloc0 would be confusing; -1 is more clear, 
especially since it is used with 'allpartindexes' which is a Bitmapset.


Doing the variable renames as Amit suggests would be good.

I ran some tests (v50_v20) (make check-world passes), w/ and w/o 
choose_custom_plan() being false, and seeing good performance results 
without running into issues.


Maybe 0005 should be expanded in partition_prune.sql with the supported 
cases to make those more clear.


Thanks !

Best regards,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread David Rowley
Hi Amit,

Thanks for having a look at this.

On 6 April 2018 at 00:54, Amit Langote  wrote:
> I looked at it and here are my thoughts on it after having for the first
> time looked very closely at it.
>
> * Regarding PartitionPruneInfo:
>
> I think the names of the fields could be improved -- like pruning_steps
> instead prunesteps, unpruned_parts instead of allpartindexs.  The latter
> is even a bit misleading because it doesn't in fact contain *all*
> partition indexes, only those that are unpruned, because each either has a
> subpath or it appears in (unpruned) partitioned_rels list.  Also, I didn't
> like the name subnodeindex and subpartindex very much.  subplan_indexes
> and parent_indexes would sound more informative to me.

Seems mostly fair. I'm not a fan of using the term "unpruned" though.
I'll have a think.  The "all" is meant in terms of what exists as
subnodes.

subplan_indexes and parent_indexes seem like better names, I agree.

> * make_partition_pruneinfo has a parameter resultRelations that's not used
> anywhere

It gets used in 0005.

I guess I could only add it in 0005, but make_partition_pruneinfo is
only used in 0003, so you could say the same about that entire
function.

Do you think I should delay adding that parameter until the 0005 patch?

> * In make_partition_pruneinfo()
>
> allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
> allsubpartindex = palloc(sizeof(int) * root->simple_rel_array_size);
>
> I think these arrays need to have root->simple_rel_array_size + 1
> elements, because they're subscripted using RT indexes which start at 1.

RT indexes are always 1-based. See setup_simple_rel_arrays. It already
sets the array size to list_length(rtable) + 1.

> * Also in make_partition_pruneinfo()
>
> /* Initialize to -1 to indicate the rel was not found */
> for (i = 0; i < root->simple_rel_array_size; i++)
> {
> allsubnodeindex[i] = -1;
> allsubpartindex[i] = -1;
> }
>
> Maybe, allocate the arrays above mentioned using palloc0 and don't do this
> initialization.  Instead make the indexes that are stored in these start
> with 1 and consider 0 as invalid entries.

0 is a valid subplan index. It is possible to make this happen, but
I'd need to subtract 1 everywhere I used the map. That does not seem
very nice. Seems more likely to result in bugs where we might forget
to do the - 1.

Did you want this because you'd rather have the palloc0() than the for
loop setting the array elements to -1? Or is there another reason?

> * Regarding the code added in execPartition.c and execPartition.h:
>
> I wondered why PartitionedRelPruning is named the way it is.  I saw many
> parallels with PartitionDispatchData both in terms of the main thing it
> consists of, that is, the map that translates partition indexes as in
> partition descriptor to that of subplans or of some other executor
> structure.  Also, I imagine you tried to mimic PartitionTupleRouting with
> PartitionPruning but not throughout.  For example, tuple routing struct
> pointer variables are throughout called proute, whereas PartitionPruning
> ones are called partprune instead of, say, pprune.  Consistency would
> help, imho.

Yes, I saw similarities and did end up moving all the code into
execPartition a while back.

I'll look into this renaming.

> * Instead of nesting PartitionedRelPruning inside another, just store them
> in a global flat array in the PartitionPruning, like PartitionTupleRouting
> does for PartitionDispatch of individual partitioned tables in the tree.
>
> typedef struct PartitionedRelPruning
> {
> int nparts;
> int*subnodeindex;
> struct PartitionedRelPruning **subpartprune;

There is a flat array in PartitionPruning. subpartprune contains
pointers into that array. I want to have this pointer array so I can
directly reference the flat array in PartitionPruning.

Maybe I've misunderstood what you mean here.

> * I don't see why there needs to be nparts in the above, because it
> already has a PartitionPruneContext member which has that information.

Good point. I'll remove that.

> In fact, I made most of changes myself while going through the code.
> Please see attached the delta patch.  It also tweaks quite a few other
> things including various comments.  I think parts of it apply to 0001,
> 0003, and 0004 patches.  See if this looks good to you.

Thanks. I'll look.

It's late over this side now, so will look tomorrow.

Thanks again for reviewing this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Amit Langote
On 2018/04/05 12:14, Amit Langote wrote:
> I will post comments on your v19 later today.

I looked at it and here are my thoughts on it after having for the first
time looked very closely at it.

* Regarding PartitionPruneInfo:

I think the names of the fields could be improved -- like pruning_steps
instead prunesteps, unpruned_parts instead of allpartindexs.  The latter
is even a bit misleading because it doesn't in fact contain *all*
partition indexes, only those that are unpruned, because each either has a
subpath or it appears in (unpruned) partitioned_rels list.  Also, I didn't
like the name subnodeindex and subpartindex very much.  subplan_indexes
and parent_indexes would sound more informative to me.

* make_partition_pruneinfo has a parameter resultRelations that's not used
anywhere

* In make_partition_pruneinfo()

allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
allsubpartindex = palloc(sizeof(int) * root->simple_rel_array_size);

I think these arrays need to have root->simple_rel_array_size + 1
elements, because they're subscripted using RT indexes which start at 1.

* Also in make_partition_pruneinfo()

/* Initialize to -1 to indicate the rel was not found */
for (i = 0; i < root->simple_rel_array_size; i++)
{
allsubnodeindex[i] = -1;
allsubpartindex[i] = -1;
}

Maybe, allocate the arrays above mentioned using palloc0 and don't do this
initialization.  Instead make the indexes that are stored in these start
with 1 and consider 0 as invalid entries.

* Regarding the code added in execPartition.c and execPartition.h:

I wondered why PartitionedRelPruning is named the way it is.  I saw many
parallels with PartitionDispatchData both in terms of the main thing it
consists of, that is, the map that translates partition indexes as in
partition descriptor to that of subplans or of some other executor
structure.  Also, I imagine you tried to mimic PartitionTupleRouting with
PartitionPruning but not throughout.  For example, tuple routing struct
pointer variables are throughout called proute, whereas PartitionPruning
ones are called partprune instead of, say, pprune.  Consistency would
help, imho.

* Instead of nesting PartitionedRelPruning inside another, just store them
in a global flat array in the PartitionPruning, like PartitionTupleRouting
does for PartitionDispatch of individual partitioned tables in the tree.

typedef struct PartitionedRelPruning
{
int nparts;
int*subnodeindex;
struct PartitionedRelPruning **subpartprune;

* I don't see why there needs to be nparts in the above, because it
already has a PartitionPruneContext member which has that information.

In fact, I made most of changes myself while going through the code.
Please see attached the delta patch.  It also tweaks quite a few other
things including various comments.  I think parts of it apply to 0001,
0003, and 0004 patches.  See if this looks good to you.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 17da8cdbd3..1041871e51 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -39,11 +39,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation 
rel,
 bool 
*isnull,
 int 
maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_subplans_for_extparams_recurse(
-   
PartitionedRelPruning *partrelprune,
+static void find_subplans_for_extparams_recurse(PartitionPruningDispatch 
*all_ppd,
+   int 
dispatch_offset,

Bitmapset **validsubplans);
-static void find_subplans_for_allparams_recurse(
-   
PartitionedRelPruning *partrelprune,
+static void find_subplans_for_allparams_recurse(PartitionPruningDispatch 
*all_ppd,
+   int 
dispatch_offset,

Bitmapset **validsubplans);
 
 
@@ -1343,27 +1343,27 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
 PartitionPruning *
 ExecSetupPartitionPruning(PlanState *planstate, List *partitionpruneinfo)
 {
-   PartitionedRelPruning *partrelprunes;
-   PartitionPruning *partprune;
+   PartitionPruning *pprune;
+   PartitionPruningDispatch *all_ppd;
ListCell   *lc;
int i;
 
Assert(partitionpruneinfo != NIL);
 
-   partprune = (PartitionPruning *) palloc(sizeof(PartitionPruning));
-   partrelprunes = (PartitionedRelPruning *)

Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread David Rowley
On 5 April 2018 at 15:14, Amit Langote  wrote:
> On 2018/03/31 22:52, David Rowley wrote:
>> Initialising
>> Append/MergeAppend/ModifyTable nodes with fewer subnodes than were
>> originally planned did require a small change in explain.c in some
>> code that was assuming the subnode arrays were the same length as the
>> subplan list. I also ended up adding a Int property to EXPLAIN to show
>> the number of subnodes that have been removed during execution.
>> Unfortunately, there is a small corner case with this in the case
>> where all subnodes are removed it leaves EXPLAIN with no subnodes to
>> search for outer side Vars in. I didn't really see any sane way to
>> handle this in EXPLAIN, so I think the best fix for this is what I've
>> done, and that's just to always initalise at least 1 subnode, even
>> when none match the external Params. Cost-wise, I don't think this is
>> such a big deal as the cost savings here are coming from saving on
>> initalising ten's or hundreds of subnodes, not 1.
>
> I have wondered about the possibility of setting a valid (non-dummy)
> targetlist in Append and MergeAppend if they're created for a partitioned
> table.  Currently they're overwritten by a dummy one using
> set_dummy_tlist_references() in set_plan_refs() citing the following reason:
>
>  * set_dummy_tlist_references
>  *Replace the targetlist of an upper-level plan node with a simple
>  *list of OUTER_VAR references to its child.
>  *
>  * This is used for plan types like Sort and Append that don't evaluate
>  * their targetlists.  Although the executor doesn't care at all what's in
>  * the tlist, EXPLAIN needs it to be realistic.
>
> In fact, when I had noticed that this EXPLAIN behavior I had wondered if
> that's something we should have discussed when d3cc37f1d801a [1] went in.

I had a quick hack at this to see if it would work and it does seem to
on my very simple test. However, it would mean removing
set_dummy_tlist_references from more than just Append/MergeAppend

create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
prepare q1 (int, int) as select * from listp where a in($1,$2) order
by b limit 1;
explain execute q1(1,2);
explain execute q1(1,2);
explain execute q1(1,2);
explain execute q1(1,2);
explain execute q1(1,2);
explain (verbose, costs off) execute q1(0,0);
   QUERY PLAN

 Limit
   Output: listp.a, listp.b
   ->  Sort
 Output: listp.a, listp.b
 Sort Key: listp.b
 ->  Append
   Subplans Pruned: 2
(7 rows)

The downside is that if we were to do this it would mean changing the
output in cases like:

explain (verbose, costs off) (select a z, b y from listp union all
select * from listp) order by y;
  QUERY PLAN
--
 Sort
   Output: z, y
   Sort Key: y
   ->  Append
 ->  Seq Scan on public.listp1
   Output: listp1.a, listp1.b
 ->  Seq Scan on public.listp2
   Output: listp2.a, listp2.b
 ->  Seq Scan on public.listp1 listp1_1
   Output: listp1_1.a, listp1_1.b
 ->  Seq Scan on public.listp2 listp2_1
   Output: listp2_1.a, listp2_1.b

Notice the sort key now refers to the alias rather than a column from
the first append child.

It sure is an interesting thought, and one I'd not considered, but I
don't think trying for something like this is going to be the path of
least resistance. It may also add quite a bit of complexity if we just
try to do it when the OUTER_VAR would lead to a Append/MergeAppend
which belongs to a partitioned table scan.

I think this idea has good merit and some form of it might be the
nicest way to allow run-time pruning of all subnodes in the future.
Perhaps we can put it on the shelf and think about it again for PG12.
However, it might not be the most interesting optimization to work on,
as I think probably run-time pruning away of all subnodes is probably
far less common than pruning some, or all but one, and the cost of
initializing the one unneeded subnode is not so big.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


set_dummy_tlist_hacks.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-04 Thread Amit Langote
Hi David.

On 2018/03/31 22:52, David Rowley wrote:
> The attached patchset is based on Amit's v45 faster partition pruning [1].
> 
> I've made a few changes since the v14 version. Since Amit's v45 patch
> now creates the partition pruning details in a data structure that can
> be copied from the planner over to the executor, it means this patch
> is now able to do much of the setup work for run-time pruning in the
> planner. Doing this now allows us to determine if run-time pruning is
> even possible at plan time, rather than the executor attempting it and
> sometimes wasting effort when it failed to find Params matching the
> partition key.
>
> Another change from the last version is that I've separated out the
> handling of exec Params and external Params.  The new patch now will
> perform a pruning step at executor startup if some external params
> match the partition key.  This is very beneficial to a PREPAREd OLTP
> type query against a partitioned table as it means we can skip
> sub-plan initialisation for all non-matching partitions.

This is quite nice.

> Initialising
> Append/MergeAppend/ModifyTable nodes with fewer subnodes than were
> originally planned did require a small change in explain.c in some
> code that was assuming the subnode arrays were the same length as the
> subplan list. I also ended up adding a Int property to EXPLAIN to show
> the number of subnodes that have been removed during execution.
> Unfortunately, there is a small corner case with this in the case
> where all subnodes are removed it leaves EXPLAIN with no subnodes to
> search for outer side Vars in. I didn't really see any sane way to
> handle this in EXPLAIN, so I think the best fix for this is what I've
> done, and that's just to always initalise at least 1 subnode, even
> when none match the external Params. Cost-wise, I don't think this is
> such a big deal as the cost savings here are coming from saving on
> initalising ten's or hundreds of subnodes, not 1.

I have wondered about the possibility of setting a valid (non-dummy)
targetlist in Append and MergeAppend if they're created for a partitioned
table.  Currently they're overwritten by a dummy one using
set_dummy_tlist_references() in set_plan_refs() citing the following reason:

 * set_dummy_tlist_references
 *Replace the targetlist of an upper-level plan node with a simple
 *list of OUTER_VAR references to its child.
 *
 * This is used for plan types like Sort and Append that don't evaluate
 * their targetlists.  Although the executor doesn't care at all what's in
 * the tlist, EXPLAIN needs it to be realistic.

In fact, when I had noticed that this EXPLAIN behavior I had wondered if
that's something we should have discussed when d3cc37f1d801a [1] went in.

> To put the new patch to the test, I tried pgbench -S -M prepared -s
> 100 with and without having modified pgbench_accounts to separate into
> 10 RANGE partitions of equal size.
> 
> A non-partitioned table was getting 12503 TPS.
> With partitioned tables, the old version of this patch was getting: 5470 TPS.
> With partitioned tables, the attached version gets 11247 TPS.
> For perspective, today's master with a partitioned table gets 4719 TPS.

Quite nice!

> So you can see it's a pretty good performance boost by skipping
> initialisation of the 9 non-matching subplans.  It's not hard to
> imagine the gains getting more significant with a larger number of
> partitions. Ideally, the performance of a partitioned table would be
> faster than the non-partitioned case, but please remember this query
> is a single row PK lookup SELECT, so is a very fast query in both
> cases. Partitioning cases should improve more as the table grows and
> indexes struggle to fit in RAM, or when many rows are being taken from
> the partition and being aggregated.
> 
> Unfortunately, the story is not quite as good for the non -M prepared
> version of the benchmark. This shows that planning time of partitioned
> table queries could still use some improvements. Amit's v45 patch
> certainly makes a dent in the planner slow performance here, but it's
> still nothing like as fast as the non-partitioned case. More work is
> required there in the future.

Certainly.  Things like the issue that we both replied to yesterday should
be addressed for starters [2].

> This patchset also contains a patch to improve the performance of
> inheritance planning of UPDATE/DELETE queries. This patch also
> implements run-time pruning for UPDATE/DELETE too. This also has a
> significant performance improvement for planning of UPDATE/DELETE
> operations on partitioned tables with a large number of partitions,
> performance is as follows:
> 
> Values in transactions per second.
> 
> Partitions = 1
> Unpatched: 7323.3
> Patched: 6573.2 (-10.24%)
> 
> Partitions = 2
> Unpatched: 6784.8
> Patched: 6377.1 (-6.01%)
> 
> Partitions = 4
> Unpatched: 5903.0
> Patched: 6106.8 (3.45%)
> 
> Partitions = 8
> Unpatched: 4582.0
> Patched: 5579.9 (21.78%)

Re: [HACKERS] Runtime Partition Pruning

2018-04-04 Thread Jesper Pedersen

Hi David,

On 04/03/2018 10:10 PM, David Rowley wrote:

The attached case doesn't trigger a generic plan, so basically all time is
spent in GetCachedPlan.


Yeah, there's still no resolution to the fact that a generic plan +
runtime pruning might be cheaper than a custom plan.  The problem is
the generic plan appears expensive to the custom vs generic plan
comparison due to it containing more Append subnodes and the run-time
pruning not being taking into account by that comparison.

There's been some discussion about this on this thread somewhere.



Forgot about that, sorry.


I think the best solution is probably the one suggested by Robert [1]
and that's to alter the Append plan's cost when run-time pruning is
enabled to try to account for the run-time pruning. This would be a
bit of a blind guess akin to what we do for clause selectivity
estimates for Params, but it's probably better than nothing, and
likely better than doing nothing.



Yeah, something based on the number of WHERE clauses, and if the 
partition type has DEFAULT / NULL partition could help. Forcing 
choose_custom_plan() to return false does help a lot (> 400%) for the 
HASH case.


But maybe this area is best left for PG12.


Yeah, it's a bug in v46 faster partition pruning. Discussing a fix for
that with Amit over on [2].



I was running with a version of faster_part_prune_v45_fixups.patch.

Patch v49 with v18 (0001-0004) works. 0005 needs a rebase.

Thanks again,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-04-04 Thread Amit Langote
On 2018/04/04 16:04, David Rowley wrote:
> On 4 April 2018 at 18:27, Amit Langote  wrote:
>> I'm not sure if we've yet discussed anything that'd be related to this on
>> the faster pruning thread.
> 
> hmm, yeah, I didn't really explain the context, but the report was in [1]>
> [1] 
> https://www.postgresql.org/message-id/cakjs1f_shpuqdhqwjq-_p1kppqn7bjt71ypbdp_8b3rhwfq...@mail.gmail.com

Oh, I see.  Hopefully it is no longer an issue.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-04 Thread David Rowley
On 4 April 2018 at 18:27, Amit Langote  wrote:
> On 2018/04/04 11:10, David Rowley wrote:
>> On 4 April 2018 at 05:44, Jesper Pedersen  wrote:
>>> Also, I'm seeing a regression for check-world in
>>> src/test/regress/results/inherit.out
>>>
>>> ***
>>> *** 642,648 
>>>   -+---+---+-
>>>mlparted_tab_part1  | 1 | a |
>>>mlparted_tab_part2a | 2 | a |
>>> !  mlparted_tab_part2b | 2 | b | xxx
>>>mlparted_tab_part3  | 3 | a | xxx
>>>   (4 rows)
>>>
>>> --- 642,648 
>>>   -+---+---+-
>>>mlparted_tab_part1  | 1 | a |
>>>mlparted_tab_part2a | 2 | a |
>>> !  mlparted_tab_part2b | 2 | b |
>>>mlparted_tab_part3  | 3 | a | xxx
>>>   (4 rows)
>>>
>>> I'll spend some more time tomorrow.
>>
>> Yeah, it's a bug in v46 faster partition pruning. Discussing a fix for
>> that with Amit over on [2].
>
> I'm not sure if we've yet discussed anything that'd be related to this on
> the faster pruning thread.

hmm, yeah, I didn't really explain the context, but the report was in [1]

Basically, the OR clause in the following SQL fragment was overwriting
the scan_all_non_null value:

where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;

Basically the:

result->scan_all_nonnull = step_result->scan_all_nonnull;

The minimum fix would have been to change that line to:

result->scan_all_nonnull |= step_result->scan_all_nonnull;

Anyway, it all irrelevant now as that code has all changed.

[1] 
https://www.postgresql.org/message-id/cakjs1f_shpuqdhqwjq-_p1kppqn7bjt71ypbdp_8b3rhwfq...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-04 Thread Amit Langote
Hi David.

On 2018/04/04 11:10, David Rowley wrote:
> On 4 April 2018 at 05:44, Jesper Pedersen  wrote:
>> Also, I'm seeing a regression for check-world in
>> src/test/regress/results/inherit.out
>>
>> ***
>> *** 642,648 
>>   -+---+---+-
>>mlparted_tab_part1  | 1 | a |
>>mlparted_tab_part2a | 2 | a |
>> !  mlparted_tab_part2b | 2 | b | xxx
>>mlparted_tab_part3  | 3 | a | xxx
>>   (4 rows)
>>
>> --- 642,648 
>>   -+---+---+-
>>mlparted_tab_part1  | 1 | a |
>>mlparted_tab_part2a | 2 | a |
>> !  mlparted_tab_part2b | 2 | b |
>>mlparted_tab_part3  | 3 | a | xxx
>>   (4 rows)
>>
>> I'll spend some more time tomorrow.
> 
> Yeah, it's a bug in v46 faster partition pruning. Discussing a fix for
> that with Amit over on [2].

I'm not sure if we've yet discussed anything that'd be related to this on
the faster pruning thread.  It seems that the difference arises from
mlparted_tab_part2b not being selected for an update query that's executed
just before this test.  When I execute an equivalent select query to check
if mlparted_tab_part2b is inadvertently pruned due to the new code, I
don't see the latest faster pruning patch doing it:

explain (costs off)
select *
from mlparted_tab mlp,
(select a from some_tab union all select a+1 from some_tab) ss (a)
where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
QUERY PLAN

--
 Nested Loop
   Join Filter: (((mlp.a = some_tab.a) AND (mlp.b = 'b'::bpchar)) OR
(mlp.a = 3))
   ->  Append
 ->  Seq Scan on some_tab
 ->  Seq Scan on some_tab some_tab_1
   ->  Materialize
 ->  Append
   ->  Seq Scan on mlparted_tab_part1 mlp
 Filter: ((b = 'b'::bpchar) OR (a = 3))
   ->  Seq Scan on mlparted_tab_part2b mlp_1
 Filter: ((b = 'b'::bpchar) OR (a = 3))
   ->  Seq Scan on mlparted_tab_part3 mlp_2
 Filter: ((b = 'b'::bpchar) OR (a = 3))
(13 rows)

For the original update query, constraint exclusion selects the same set
of partitions:

explain (costs off) update mlparted_tab mlp set c = 'xxx'
from (select a from some_tab union all select a+1 from some_tab) ss (a)
where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
  QUERY PLAN

--
 Update on mlparted_tab mlp
   Update on mlparted_tab_part1 mlp_1
   Update on mlparted_tab_part2b mlp_2
   Update on mlparted_tab_part3 mlp_3
   ->  Nested Loop
 Join Filter: (((mlp_1.a = some_tab.a) AND (mlp_1.b =
'b'::bpchar)) OR (mlp_1.a = 3))
 ->  Append
   ->  Seq Scan on some_tab
   ->  Seq Scan on some_tab some_tab_1
 ->  Materialize
   ->  Seq Scan on mlparted_tab_part1 mlp_1
 Filter: ((b = 'b'::bpchar) OR (a = 3))
   ->  Nested Loop
 Join Filter: (((mlp_2.a = some_tab.a) AND (mlp_2.b =
'b'::bpchar)) OR (mlp_2.a = 3))
 ->  Append
   ->  Seq Scan on some_tab
   ->  Seq Scan on some_tab some_tab_1
 ->  Materialize
   ->  Seq Scan on mlparted_tab_part2b mlp_2
 Filter: ((b = 'b'::bpchar) OR (a = 3))
   ->  Nested Loop
 Join Filter: (((mlp_3.a = some_tab.a) AND (mlp_3.b =
'b'::bpchar)) OR (mlp_3.a = 3))
 ->  Append
   ->  Seq Scan on some_tab
   ->  Seq Scan on some_tab some_tab_1
 ->  Materialize
   ->  Seq Scan on mlparted_tab_part3 mlp_3
 Filter: ((b = 'b'::bpchar) OR (a = 3))
(28 rows)

What am I missing?

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-03 Thread Beena Emerson
Hi David,

On Wed, Apr 4, 2018 at 7:57 AM, David Rowley
 wrote:
> On 4 April 2018 at 05:50, Beena Emerson  wrote:
>> With Amit's v46 patch,  the following query in partition_prune was
>> crashing during make check.
>> explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 
>> 3);
>
> Hi Beena,
>
> Thanks for looking.
>
> Does it work correctly if you apply [1] to Amit's v46 patch before
> patching with v18 run-time partition pruning?
>
> [1] 
> https://www.postgresql.org/message-id/CAKJS1f_6%2BgXB%3DQ%2BDryeB62yW7N19sY8hH_dBSjPFjm2ifdgoCw%40mail.gmail.com
>

Thanks for working on it. make check passes when the patch [1] is also applied.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-03 Thread David Rowley
On 4 April 2018 at 05:50, Beena Emerson  wrote:
> With Amit's v46 patch,  the following query in partition_prune was
> crashing during make check.
> explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 3);

Hi Beena,

Thanks for looking.

Does it work correctly if you apply [1] to Amit's v46 patch before
patching with v18 run-time partition pruning?

[1] 
https://www.postgresql.org/message-id/CAKJS1f_6%2BgXB%3DQ%2BDryeB62yW7N19sY8hH_dBSjPFjm2ifdgoCw%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-03 Thread Beena Emerson
Hello,

On Tue, Apr 3, 2018 at 11:14 PM, Jesper Pedersen
 wrote:
> Hi David,
>
> On 03/31/2018 09:52 AM, David Rowley wrote:
>>
>> I've attached a new version of the patch.  I'm now at v18 after having
>> some versions of the patch that I didn't release which were based on
>> various versions of Amit's faster partition pruning patch.
>>
>
> Thank you for the updated patch set !
>
> I have tested this together with Amit's v46 patch.
>
>
> Also, I'm seeing a regression for check-world in
> src/test/regress/results/inherit.out
>

With Amit's v46 patch,  the following query in partition_prune was
crashing during make check.
explain (analyze, costs off, summary off, timing off) execute ab_q1 (2, 2, 3);



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2018-04-03 Thread Jesper Pedersen

Hi David,

On 03/31/2018 09:52 AM, David Rowley wrote:

I've attached a new version of the patch.  I'm now at v18 after having
some versions of the patch that I didn't release which were based on
various versions of Amit's faster partition pruning patch.



Thank you for the updated patch set !

I have tested this together with Amit's v46 patch.

The attached case doesn't trigger a generic plan, so basically all time 
is spent in GetCachedPlan.


The standard table case (std.sql) gives:

 generic_cost = 8.4525
 avg_custom_cost = 13.4525
 total_custom_cost = 67.2625

whereas the 64 hash partition case (hash.sql) gives:

 generic_cost = 540.32
 avg_custom_cost = 175.9425
 total_custom_cost = 879.7125

I tested with pgbench -M prepared -f select.sql.

Also, I'm seeing a regression for check-world in 
src/test/regress/results/inherit.out


***
*** 642,648 
  -+---+---+-
   mlparted_tab_part1  | 1 | a |
   mlparted_tab_part2a | 2 | a |
!  mlparted_tab_part2b | 2 | b | xxx
   mlparted_tab_part3  | 3 | a | xxx
  (4 rows)

--- 642,648 
  -+---+---+-
   mlparted_tab_part1  | 1 | a |
   mlparted_tab_part2a | 2 | a |
!  mlparted_tab_part2b | 2 | b |
   mlparted_tab_part3  | 3 | a | xxx
  (4 rows)

I'll spend some more time tomorrow.

Thanks for working on this !

Best regards,
 Jesper


hash.sql
Description: application/sql


select.sql
Description: application/sql


std.sql
Description: application/sql


Re: [HACKERS] Runtime Partition Pruning

2018-03-31 Thread Alvaro Herrera
David Rowley wrote:

> To put the new patch to the test, I tried pgbench -S -M prepared -s
> 100 with and without having modified pgbench_accounts to separate into
> 10 RANGE partitions of equal size.
> 
> A non-partitioned table was getting 12503 TPS.
> With partitioned tables, the old version of this patch was getting: 5470 TPS.
> With partitioned tables, the attached version gets 11247 TPS.
> For perspective, today's master with a partitioned table gets 4719 TPS.
> 
> So you can see it's a pretty good performance boost by skipping
> initialisation of the 9 non-matching subplans.  It's not hard to
> imagine the gains getting more significant with a larger number of
> partitions.

These are excellent news!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-03-09 Thread Jesper Pedersen

Hi David,

On 03/01/2018 08:29 PM, David Rowley wrote:

0004 fails "make check-world" due to

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 670; 1259 49954 TABLE
boolp_f jpedersen
pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at
or near "false"
LINE 24: ..." ATTACH PARTITION "public"."boolp_f" FOR VALUES IN (false);


The tests seem to have stumbled on a pg_dump bug which causes it to
produce syntax that's not valid (currently)

I should be able to stop my patch failing the test by dropping that
table, which I should have been doing anyway.



I've added that thread to the Open Items list.


Thanks for the review and in advance for the future review.

I'll delay releasing a new patch as there's some discussion over on
the faster partition pruning thread which affects this too [1]

[1] 
https://www.postgresql.org/message-id/CA+Tgmoa4D1c4roj7L8cx8gkkeBWAZD=mtcxkxtwbnslrhd3...@mail.gmail.com



Sure, 0003-0005 depends on that thread. 0002 is refactoring so that one 
is ready.


In 0004 should we add a HASH based test case,

-- test.sql --
-- verify pruning in hash partitions
create table hashp (a int primary key, b int) partition by hash (a);
create table hashp_0 partition of hashp for values with (modulus 2, 
remainder 0);
create table hashp_1 partition of hashp for values with (modulus 2, 
remainder 1);

insert into hashp values (0,0), (1,1), (2,2), (3,3);
prepare q1 (int) as select * from hashp where a = $1;
execute q1 (1);
execute q1 (1);
execute q1 (1);
execute q1 (1);
execute q1 (1);
explain (analyze, costs off, summary off, timing off)  execute q1 (1);
explain (analyze, costs off, summary off, timing off)  execute q1 (3);
deallocate q1;
drop table hashp;
-- test.sql --

Also, should 0004 consider the "Parallel Append" case, aka

-- parallel.sql --
CREATE TABLE t1 (
a integer NOT NULL,
b integer NOT NULL
) PARTITION BY HASH (b);

CREATE TABLE t1_p00 PARTITION OF t1 FOR VALUES WITH (MODULUS 4, 
REMAINDER 0);
CREATE TABLE t1_p01 PARTITION OF t1 FOR VALUES WITH (MODULUS 4, 
REMAINDER 1);
CREATE TABLE t1_p02 PARTITION OF t1 FOR VALUES WITH (MODULUS 4, 
REMAINDER 2);
CREATE TABLE t1_p03 PARTITION OF t1 FOR VALUES WITH (MODULUS 4, 
REMAINDER 3);

INSERT INTO t1 (SELECT i, i FROM generate_series(1, 100) AS i);
PREPARE q1 (int) AS SELECT * FROM t1 WHERE a = $1;
EXECUTE q1 (5432);
EXECUTE q1 (5432);
EXECUTE q1 (5432);
EXECUTE q1 (5432);
EXECUTE q1 (5432);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)  EXECUTE q1 (5432);
DEALLOCATE q1;
DROP TABLE t1;
-- parallel.sql --

Best regards,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-03-01 Thread David Rowley
On 2 March 2018 at 07:17, Jesper Pedersen  wrote:
> A small typo in 0001:
>
> + * leftmost_ons_pos[x] gives the bit number (0-7) of the leftmost one bit
> in a
>
> ..."_one_"...

Oops. I'll fix that.

> 0004 fails "make check-world" due to
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 670; 1259 49954 TABLE
> boolp_f jpedersen
> pg_restore: [archiver (db)] could not execute query: ERROR:  syntax error at
> or near "false"
> LINE 24: ..." ATTACH PARTITION "public"."boolp_f" FOR VALUES IN (false);

The tests seem to have stumbled on a pg_dump bug which causes it to
produce syntax that's not valid (currently)

I should be able to stop my patch failing the test by dropping that
table, which I should have been doing anyway.

> Do you require https://commitfest.postgresql.org/17/1410/ as well ?

I'll look at that thread and see if there's any pg_dump being broken discussion.

> I'll look more at 0002-0005 over the coming days.

Thanks for the review and in advance for the future review.

I'll delay releasing a new patch as there's some discussion over on
the faster partition pruning thread which affects this too [1]

[1] 
https://www.postgresql.org/message-id/CA+Tgmoa4D1c4roj7L8cx8gkkeBWAZD=mtcxkxtwbnslrhd3...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-03-01 Thread Jesper Pedersen

Hi David,

On 03/01/2018 08:04 AM, David Rowley wrote:

I've also split the patch out a bit more into logical parts in the
hope it makes things easier to review.



A small typo in 0001:

+ * leftmost_ons_pos[x] gives the bit number (0-7) of the leftmost one 
bit in a


..."_one_"...

0004 fails "make check-world" due to

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 670; 1259 49954 TABLE 
boolp_f jpedersen
pg_restore: [archiver (db)] could not execute query: ERROR:  syntax 
error at or near "false"

LINE 24: ..." ATTACH PARTITION "public"."boolp_f" FOR VALUES IN (false);

Do you require https://commitfest.postgresql.org/17/1410/ as well ?

I'll look more at 0002-0005 over the coming days.

Thanks for working on this !

Best regards,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-02-25 Thread Amit Langote
On 2018/02/23 20:40, David Rowley wrote:
> On 23 February 2018 at 04:11, Jesper Pedersen
>  wrote:
>> Are UPDATE and DELETE suppose to be supported ?
> 
> To be honest, I had not even considered those. Without looking in
> detail I imagine it may be possible to allow this simply by setting
> the AppendPath->trypartitionprune in the correct cases in the
> inheritence_planner(). I would need to look into this in some detail
> to find out for sure.

I guess you meant in ModifyTablePath.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-02-25 Thread Amit Langote
(Sorry, I had mistakenly replied only to Simon on Friday)

On Fri, Feb 23, 2018 at 9:04 PM, Simon Riggs  wrote:
> On 23 February 2018 at 11:40, David Rowley  
> wrote:
>> On 23 February 2018 at 04:11, Jesper Pedersen
>>  wrote:
>>> Are UPDATE and DELETE suppose to be supported ?
>>
>> To be honest, I had not even considered those.
>
> I can say that I had considered those. Handling of UPDATE and DELETE
> with partitions is significantly different, so its not just an
> oversight its a different branch of the project.
>
> We need SELECT to work first and then we have a chance of making
> UPDATE/DELETE work.

+1

Thanks,
Amit



Re: [HACKERS] Runtime Partition Pruning

2018-02-23 Thread Simon Riggs
On 23 February 2018 at 11:40, David Rowley  wrote:
> On 23 February 2018 at 04:11, Jesper Pedersen
>  wrote:
>> Are UPDATE and DELETE suppose to be supported ?
>
> To be honest, I had not even considered those.

I can say that I had considered those. Handling of UPDATE and DELETE
with partitions is significantly different, so its not just an
oversight its a different branch of the project.

We need SELECT to work first and then we have a chance of making
UPDATE/DELETE work.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-23 Thread David Rowley
On 23 February 2018 at 04:11, Jesper Pedersen
 wrote:
> Are UPDATE and DELETE suppose to be supported ?

To be honest, I had not even considered those. Without looking in
detail I imagine it may be possible to allow this simply by setting
the AppendPath->trypartitionprune in the correct cases in the
inheritence_planner(). I would need to look into this in some detail
to find out for sure.

Another case which likely is simple to implement is the exact same
processing for MergeAppends. I currently see no reason why the same
pruning cannot be done for subnodes of that node type too. I've just
not done so yet. I'd rather get more sanity check reviews on the
current scope of the patch before I widen it out to other areas, but
at the same time also don't want to leave very simple things to PG12
which can easily be done in PG11. So I'll try to look at this and get
back to you, or perhaps release a new set of patches to support the
additional features.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-22 Thread Amit Langote
Hi David.

On 2018/02/23 0:11, David Rowley wrote:
> On 23 February 2018 at 01:15, David Rowley  
> wrote:
>> One problem that I'm facing now is down to the way I'm gathering the
>> ParamIds that match the partkeys. As you'll see from the patch I've
>> added a 'paramids' field to PartitionPruneContext and I'm populating
>> this when the clauses are being pre-processed in
>> extract_partition_clauses(). The problem is that the or_clauses are
>> not pre-processed at all, so the current patch will not properly
>> perform run-time pruning when the Params are "hidden" in OR branches.
>>
>> One way I thought to fix this was to change the clause processing to
>> create an array of PartitionClauseInfos, one for each OR branch. This
>> would also improve the performance of the run-time pruning, meaning
>> that all of the or clauses would be already matched to the partition
>> keys once, rather than having to redo that again each time a Param
>> changes its value.
>>
>> If I go and write a patch to do that, would you want it in your patch,
>> or would you rather I kept it over here?  Or perhaps you have a better
>> idea on how to solve...?
> 
> I've attached a patch which does this. For now, the patch is only
> intended to assist in the discussion of the above idea.
> 
> The patch is based on a WIP version of run-time pruning that I'm not
> quite ready to post yet, but with a small amount of work you could
> probably take it and base it on your faster partition pruning v31
> patch [1].
> 
> I ended up pulling the PartitionPruneInfo out of the
> PartitionPruneContext. This was required due how I've now made
> extract_partition_clauses() recursively call itself. We don't want to
> overwrite the context's clauseinfo with the one from the recursive
> call. A side effect of this is that the subcontext is no longer
> required when processing the OR clauses. You only did this so that the
> context's clauseinfo was not overwritten. I also think it's better to
> seperate out the inputs and outputs. Anything in context was more
> intended to be for input fields.
> 
> Let me know your thoughts about this. If you don't want it for faster
> partition pruning, then I'll probably go and tidy it up and include it
> for run-time pruning.

Thanks for the patch.

I don't have time today to look at the patch closely, but at first blush,
it seems like something I should incorporate into my own patch, because
it's restructuring the partprune.c code.  I will study the issue and your
proposed solution in the form of this restructuring more closely over the
weekend and reply (probably with a new version of my patch) on Monday.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-02-22 Thread David Rowley
On 23 February 2018 at 01:15, David Rowley  wrote:
> One problem that I'm facing now is down to the way I'm gathering the
> ParamIds that match the partkeys. As you'll see from the patch I've
> added a 'paramids' field to PartitionPruneContext and I'm populating
> this when the clauses are being pre-processed in
> extract_partition_clauses(). The problem is that the or_clauses are
> not pre-processed at all, so the current patch will not properly
> perform run-time pruning when the Params are "hidden" in OR branches.
>
> One way I thought to fix this was to change the clause processing to
> create an array of PartitionClauseInfos, one for each OR branch. This
> would also improve the performance of the run-time pruning, meaning
> that all of the or clauses would be already matched to the partition
> keys once, rather than having to redo that again each time a Param
> changes its value.
>
> If I go and write a patch to do that, would you want it in your patch,
> or would you rather I kept it over here?  Or perhaps you have a better
> idea on how to solve...?

Hi Amit,

I've attached a patch which does this. For now, the patch is only
intended to assist in the discussion of the above idea.

The patch is based on a WIP version of run-time pruning that I'm not
quite ready to post yet, but with a small amount of work you could
probably take it and base it on your faster partition pruning v31
patch [1].

I ended up pulling the PartitionPruneInfo out of the
PartitionPruneContext. This was required due how I've now made
extract_partition_clauses() recursively call itself. We don't want to
overwrite the context's clauseinfo with the one from the recursive
call. A side effect of this is that the subcontext is no longer
required when processing the OR clauses. You only did this so that the
context's clauseinfo was not overwritten. I also think it's better to
seperate out the inputs and outputs. Anything in context was more
intended to be for input fields.

Let me know your thoughts about this. If you don't want it for faster
partition pruning, then I'll probably go and tidy it up and include it
for run-time pruning.

[1] 
https://www.postgresql.org/message-id/00ae2273-bb6b-1287-9ebc-5459b37c9078%40lab.ntt.co.jp

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


generate_PartitionClauseInfos_for_or_clauses.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-02-22 Thread Jesper Pedersen

Hi David,

On 02/21/2018 04:06 AM, David Rowley wrote:

I've attached v11 of the patch.



Are UPDATE and DELETE suppose to be supported ?

With

-- test.sql --
CREATE TABLE test (a integer NOT NULL, b integer) PARTITION BY HASH(a);
CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);
CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 1);

CREATE INDEX idx_test_a ON test (a);
CREATE INDEX idx_test_b ON test (b);

INSERT INTO test (SELECT i,i FROM generate_series(1, 100) AS i);

ANALYZE;
-- test.sql --

and

UPDATE test SET b = 1 WHERE a = ?
DELETE FROM test WHERE a = ?

both shows that all partitions are scanned;

 Update on test
   Update on test_p00
   Update on test_p01
   ->  Index Scan using test_p00_a_idx on test_p00
 Index Cond: (a = 1)
   ->  Index Scan using test_p01_a_idx on test_p01
 Index Cond: (a = 1)

Using prune_v32 and runtime_v11 with conflicts resolved.

Best regards,
 Jesper



Re: [HACKERS] Runtime Partition Pruning

2018-02-22 Thread David Rowley
On 22 February 2018 at 22:31, Amit Langote
 wrote:
> Some comments:

Hi Amit,

Thanks for looking at this. I'll work through your comments and
produce a patch sometime in the near future.

One problem that I'm facing now is down to the way I'm gathering the
ParamIds that match the partkeys. As you'll see from the patch I've
added a 'paramids' field to PartitionPruneContext and I'm populating
this when the clauses are being pre-processed in
extract_partition_clauses(). The problem is that the or_clauses are
not pre-processed at all, so the current patch will not properly
perform run-time pruning when the Params are "hidden" in OR branches.

One way I thought to fix this was to change the clause processing to
create an array of PartitionClauseInfos, one for each OR branch. This
would also improve the performance of the run-time pruning, meaning
that all of the or clauses would be already matched to the partition
keys once, rather than having to redo that again each time a Param
changes its value.

If I go and write a patch to do that, would you want it in your patch,
or would you rather I kept it over here?  Or perhaps you have a better
idea on how to solve...?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-22 Thread Amit Langote
Hi David.

On 2018/02/21 18:06, David Rowley wrote:
> Other things I don't particularly like about the current patch are how
> I had to construct the partition key expressions in
> set_valid_runtime_subplans_recurse(). This pretty much uses code
> borrowed from set_baserel_partition_key_exprs(). One way around that
> would be to change the partprune.c code to deal with the
> partkey->partattrs and consume an expression from the list on attnum =
> 0. I didn't do that as I want to minimise how much I touch Amit's
> patch before it gets committed as doing so would likely just cause
> more headaches for me keeping this merged with his work. Another
> option to resolve this would be to put the partition key expressions
> into the PartitionPruneInfo.

Another way could be to refactor the code you've borrowed from
set_baserel_partition_key_exprs() into its own function that is exported
by some optimizer header.

I removed PartitionKey/Relation from signatures of various functions of my
patch to avoid having to re-heap_open() the relation per [1].

> I've attached v11 of the patch.

Some comments:

* I noticed that the patch adds a function to bitmapset.c which you might
want to extract into its own patch, like your patch to add bms_add_range()
that got committed as 84940644d [2].

* Maybe it's better to add `default: break;` in the two switch's
you've added in partkey_datum_from_expr()

partprune.c: In function ‘partkey_datum_from_expr’:
partprune.c:1555:2: warning: enumeration value ‘T_Invalid’ not handled in
switch [-Wswitch]
  switch (nodeTag(expr))

partprune.c:1562:4: warning: enumeration value ‘PARAM_SUBLINK’ not handled
in switch [-Wswitch]
switch (((Param *) expr)->paramkind)

* I see that you moved PartitionClauseInfo's definition from partprune.c
to partition.h.  Isn't it better to move it to partprune.h?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoabi-29Vs8H0xkjtYB%3DcU%2BGVCrNwPz7okpa3KsoLmdEUQ%40mail.gmail.com

[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=84940644d




Re: [HACKERS] Runtime Partition Pruning

2018-02-21 Thread David Rowley
On 21 February 2018 at 22:45, Rajkumar Raghuwanshi
 wrote:
> select count(*) from prt1 x where (x.a,x.b) in (select t1.a,t2.b from prt1
> t1,prt2 t2 where t1.a=t2.b)
> and (x.c) in (select t3.c from plt1 t3,plt2 t4 where t3.c=t4.c);

Thanks for the test case.

It seems like the x.c Param for some reason has a ParamId of 0. I need
to go learn a bit more about Params to understand why this is.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-21 Thread Rajkumar Raghuwanshi
On Wed, Feb 21, 2018 at 2:36 PM, David Rowley 
wrote:

> I've attached v11 of the patch.
>

Hi,

I have applied attached patch on head
"6f1d723b6359507ef55a81617167507bc25e3e2b" over Amit's v30 patches. while
testing further I got a server crash with below test case. Please take a
look.

CREATE TABLE prt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_p1 PARTITION OF prt1 FOR VALUES FROM (0) TO (250);
CREATE TABLE prt1_p3 PARTITION OF prt1 FOR VALUES FROM (500) TO (600);
CREATE TABLE prt1_p2 PARTITION OF prt1 FOR VALUES FROM (250) TO (500);
INSERT INTO prt1 SELECT i, i, to_char(i, 'FM') FROM generate_series(0,
599, 2) i;
CREATE INDEX iprt1_p1_a on prt1_p1(a);
CREATE INDEX iprt1_p2_a on prt1_p2(a);
CREATE INDEX iprt1_p3_a on prt1_p3(a);
ANALYZE prt1;

CREATE TABLE prt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
CREATE TABLE prt2_p1 PARTITION OF prt2 FOR VALUES FROM (0) TO (250);
CREATE TABLE prt2_p2 PARTITION OF prt2 FOR VALUES FROM (250) TO (500);
CREATE TABLE prt2_p3 PARTITION OF prt2 FOR VALUES FROM (500) TO (600);
INSERT INTO prt2 SELECT i, i, to_char(i, 'FM') FROM generate_series(0,
599, 3) i;
CREATE INDEX iprt2_p1_b on prt2_p1(b);
CREATE INDEX iprt2_p2_b on prt2_p2(b);
CREATE INDEX iprt2_p3_b on prt2_p3(b);
ANALYZE prt2;

CREATE TABLE plt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN ('', '0003',
'0004', '0010');
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN ('0001', '0005',
'0002', '0009');
CREATE TABLE plt1_p3 PARTITION OF plt1 FOR VALUES IN ('0006', '0007',
'0008', '0011');
INSERT INTO plt1 SELECT i, i, to_char(i/50, 'FM') FROM
generate_series(0, 599, 2) i;
CREATE INDEX iplt1_p1_c on plt1_p1(c);
CREATE INDEX iplt1_p2_c on plt1_p2(c);
CREATE INDEX iplt1_p3_c on plt1_p3(c);
ANALYZE plt1;

CREATE TABLE plt2 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN ('', '0003',
'0004', '0010');
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN ('0001', '0005',
'0002', '0009');
CREATE TABLE plt2_p3 PARTITION OF plt2 FOR VALUES IN ('0006', '0007',
'0008', '0011');
INSERT INTO plt2 SELECT i, i, to_char(i/50, 'FM') FROM
generate_series(0, 599, 3) i;
CREATE INDEX iplt2_p1_c on plt2_p1(c);
CREATE INDEX iplt2_p2_c on plt2_p2(c);
CREATE INDEX iplt2_p3_c on plt2_p3(c);
ANALYZE plt2;

select count(*) from prt1 x where (x.a,x.b) in (select t1.a,t2.b from prt1
t1,prt2 t2 where t1.a=t2.b)
and (x.c) in (select t3.c from plt1 t3,plt2 t4 where t3.c=t4.c);

/*
postgres=# select count(*) from prt1 x where (x.a,x.b) in (select t1.a,t2.b
from prt1 t1,prt2 t2 where t1.a=t2.b)
postgres-# and (x.c) in (select t3.c from plt1 t3,plt2 t4 where t3.c=t4.c);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
*/

stack-trace give below :

/*
(gdb) bt
#0  0x006ce6dc in ExecEvalParamExec (state=0x26e9ee0, op=0x26e9f78,
econtext=0x26ea390) at execExprInterp.c:
#1  0x006cc66a in ExecInterpExpr (state=0x26e9ee0,
econtext=0x26ea390, isnull=0x7ffe0f75d77f "") at execExprInterp.c:1024
#2  0x006cdd8c in ExecInterpExprStillValid (state=0x26e9ee0,
econtext=0x26ea390, isNull=0x7ffe0f75d77f "") at execExprInterp.c:1819
#3  0x007db078 in ExecEvalExprSwitchContext (state=0x26e9ee0,
econtext=0x26ea390, isNull=0x7ffe0f75d77f "") at
../../../../src/include/executor/executor.h:305
#4  0x007e2072 in evaluate_expr (expr=0x26a3cb0, result_type=25,
result_typmod=-1, result_collation=0) at clauses.c:4890
#5  0x007e588a in partkey_datum_from_expr (context=0x26d3180,
parttypid=25, expr=0x26a3cb0, value=0x7ffe0f75da00) at partprune.c:1504
#6  0x007e5243 in extract_bounding_datums (context=0x26d3180,
minimalclauses=0x7ffe0f75d900, keys=0x7ffe0f75da00) at partprune.c:1307
#7  0x007e377d in get_partitions_from_clauses (context=0x26d3180)
at partprune.c:273
#8  0x006ea2ec in set_valid_runtime_subplans_recurse
(node=0x269bf90, pinfo=0x7f6cf6765cf0, ctxcache=0x26d3158,
validsubplans=0x7ffe0f75de10) at nodeAppend.c:771
#9  0x006e9ebf in set_valid_runtime_subplans (node=0x269bf90) at
nodeAppend.c:640
#10 0x006e99b5 in choose_next_subplan_locally (node=0x269bf90) at
nodeAppend.c:426
#11 0x006e9598 in ExecAppend (pstate=0x269bf90) at nodeAppend.c:224
#12 0x006deb3a in ExecProcNodeFirst (node=0x269bf90) at
execProcnode.c:446
#13 0x006fb9ee in ExecProcNode (node=0x269bf90) at
../../../src/include/executor/executor.h:239
#14 0x006fbcc4 in ExecHashJoinImpl (pstate=0x2697808, parallel=0
'\000') at nodeHashjoin.c:262
#15 0x006fc3fd in ExecHashJoin (pstate=0x2697808) at
nodeHashjoin.c:565
#16 0x006deb3a in ExecProcNodeFirst (node=0x2697808) at
execProcnode.c:446
#17 0x0070c376 in ExecProcNode 

Re: [HACKERS] Runtime Partition Pruning

2018-02-20 Thread David Rowley
On 20 February 2018 at 23:46, Rajkumar Raghuwanshi
 wrote:
> I have applied v10 patch on Amit's v27 over head ad7dbee36. I got "ERROR:
> partition missing from Append subplans" with the patch. on head and only
> with Amit's patches query is working fine, Please find test case below.

Thanks for the test case. I can recreate locally. This is down to the
fact that make_partition_pruneinfo() only makes sub-PartitionPruneInfo
for actual subpaths found in the Append. Your test case happens to
match both the part_p1 and part_p2 partitions on the first level of
iteration, but since no child of part_p2 is found in
make_partition_pruneinfo, that element in the subpartindex never gets
set.

The fix might be to just remove the error and silently ignore those
cases, but I was hoping to keep that around as it might catch other
bugs. I'm just not sure yet how to do both.

I'll rebase this on Amit's latest patch and have a think about it
while doing that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-02-20 Thread Rajkumar Raghuwanshi
On Sat, Feb 17, 2018 at 2:27 PM, David Rowley 
wrote:

> Hi,
>
> I've attached an updated patch, now at v10. v9 was short lived due to
> the evolution of Amit's which which this based on.
>
> This version is based on Amit's v27 of faster partition pruning [1]
> which can be applied atop of ad7dbee36.
>

Hi,

I have applied v10 patch on Amit's v27 over head ad7dbee36. I got "ERROR:
partition missing from Append subplans" with the patch. on head and only
with Amit's patches query is working fine, Please find test case below.

CREATE TABLE part ( c1 INT2, c2 DATE) PARTITION BY RANGE (c1);
CREATE TABLE part_p1 PARTITION OF part FOR VALUES FROM (0) TO (141)
PARTITION BY RANGE(c2);
CREATE TABLE part_p11 PARTITION OF part_p1 FOR VALUES FROM ('1/1/1997') TO
('2/1/1999');
CREATE TABLE part_p12 PARTITION OF part_p1 FOR VALUES FROM ('2/1/1999') TO
('2/1/2000');
CREATE TABLE part_p2 PARTITION OF part FOR VALUES FROM (141) TO (211)
PARTITION BY RANGE(c2);
CREATE TABLE part_p21 PARTITION OF part_p2 FOR VALUES FROM ('1/1/2000') TO
('2/1/2001');
CREATE TABLE part_p22 PARTITION OF part_p2 FOR VALUES FROM ('2/1/2001') TO
('2/1/2006');

INSERT INTO part VALUES (100,'1/1/1999');
INSERT INTO part VALUES (110,'1/1/1998');
INSERT INTO part VALUES (130,'1/1/2000');
INSERT INTO part VALUES (170,'1/1/2000');
INSERT INTO part VALUES (180,'1/1/2001');
INSERT INTO part VALUES (190,'1/1/2006');
INSERT INTO part VALUES (200,'1/1/2000');
INSERT INTO part VALUES (210,'1/1/2002');

postgres=# PREPARE RTP AS SELECT * FROM PART WHERE c2 BETWEEN '1/1/1998'
AND '1/1/1999';
PREPARE
postgres=# EXPLAIN execute RTP;
 QUERY
PLAN
-
 Append  (cost=0.00..46.00 rows=12 width=6)
   ->  Seq Scan on part_p11  (cost=0.00..46.00 rows=12 width=6)
 Filter: ((c2 >= '1998-01-01'::date) AND (c2 <= '1999-01-01'::date))
(3 rows)

postgres=# execute RTP;
*ERROR:  partition missing from Append subplans*

deallocate RTP;
DROP TABLE part;

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Runtime Partition Pruning

2018-02-17 Thread David Rowley
Hi,

I've attached an updated patch, now at v10. v9 was short lived due to
the evolution of Amit's which which this based on.

This version is based on Amit's v27 of faster partition pruning [1]
which can be applied atop of ad7dbee36.

I've done a self review of this, but I've not yet done any final
polishing work as the faster partition pruning patch is still
evolving. I will, for example, likely need to do some more work in
nodeAppend.c to add a few more comments and probably think of better
names for a few new things that have made a first appearance in this
version of the patch

[1] 
https://www.postgresql.org/message-id/520f8a71-286d-e36d-34cf-363fd7436...@lab.ntt.co.jp

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v10.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-01-09 Thread David Rowley
On Tue, Jan 9, 2018 at 2:24 PM, Beena Emerson  wrote:
> prepare abc_q1 (int, int, char) as select * from ab_c where a BETWEEN
> $1 and $2 AND b <= $3;
>
> --after 5 runs: abc_a2_b3 is not pruned.

This seems to be down to an existing bug. I'm not yet sure if it's
caused by faster_partition_prune_v17, or if it exists in master.
Basically RelOptInfo->partition_rels can contain duplicates for
relations. In your example while debugging make_partition_pruneinfo I
see:

list_nth_int(best_path->partitioned_rels,0)
1
list_nth_int(best_path->partitioned_rels,1)
3
list_nth_int(best_path->partitioned_rels,2)
8
list_nth_int(best_path->partitioned_rels,3)
13
list_nth_int(best_path->partitioned_rels,4)
3
list_nth_int(best_path->partitioned_rels,5)
8
list_nth_int(best_path->partitioned_rels,6)
13

There should only be 4 items in this list, not 7.

make_partition_pruneinfo might have been a bit naive to assume this
couldn't happen, so I've coded it to be a bit more resilient to this
happening. It'll still end up creating another sub-PartitionPruneInfo
and slotting into the same place, but it'll no longer attempt to
translate the prunequals twice... which was what was causing the
problem. I'd been a bit sloppy and assigned the output of
adjust_appendrel_attrs() back to the prunequals which is a parameter
to the function instead of assigning to a local variable like I've
done now.


On 9 January 2018 at 22:22, Beena Emerson  wrote:
>> postgres=# explain (analyze, costs off, summary off, timing off)
>> execute abc_q1 (1, 2);
>> ERROR:  partition missing from Append subplans

This also seems to be fixed by the above fix.

> =# explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 
> 1);
> ERROR:  operator 1057 is not a member of opfamily 1976

This seems to be broken in faster_partition_prune_v17 where in
classify_partition_bounding_keys() the code properly checks if the
clause matches the partition key for OpExpr, but fails to do the same
for ScalarArrayOpExpr. I'll report to Amit on the thread for that
patch.

I'll also investigate the duplication in RelOptInfo->partition_rels
and report that in another thread.

Can you confirm that case 1 and 2 are working with the attached?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v8.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-01-09 Thread David Rowley
On 9 January 2018 at 22:22, Beena Emerson  wrote:
> ERROR:  operator 1057 is not a member of opfamily 1976

Thanks for finding these. I'm looking into the above, and the other
ones you've mentioned now.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-01-09 Thread Beena Emerson
Hi,

The mail was accidently sent before I could complete.

On Tue, Jan 9, 2018 at 2:24 PM, Beena Emerson  wrote:
> Hello,
>
> The pruning does not work well with char type:
>
> Case 2: Case with optimizer pruning
> drop table ab_c;
>  create table ab_c (a int not null, b int) partition by list(a);
>  create table abc_a2 (b int, a int not null) partition by list(b);
>   create table abc_a2_b1 partition of abc_a2 for values in (1);
>   create table abc_a2_b2 partition of abc_a2 for values in (2);
>   create table abc_a2_b3 partition of abc_a2 for values in (3);
>   alter table ab_c attach partition abc_a2 for values in (2);
>   create table abc_a1 partition of ab_c for values in(1) partition by list 
> (b);
>   create table abc_a1_b1 partition of abc_a1 for values in (1);
>   create table abc_a1_b2 partition of abc_a1 for values in (2);
>   create table abc_a1_b3 partition of abc_a1 for values in (3);
>   create table abc_a3 partition of ab_c for values in(3) partition by list 
> (b);
>   create table abc_a3_b1 partition of abc_a3 for values in (1);
>   create table abc_a3_b2 partition of abc_a3 for values in (2);
>   create table abc_a3_b3 partition of abc_a3 for values in (3);
>   deallocate abc_q1;

Prepared statement is missing:
 prepare abc_q1 (int, int) as select a,b from ab_c where a BETWEEN $1
and $2 AND b IN (3, 2);
>
>
> =# explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 
> 1);
>   QUERY PLAN
> --
>  Append (actual rows=2 loops=1)
>->  Seq Scan on abc_a1_b2 (actual rows=1 loops=1)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
>->  Seq Scan on abc_a1_b3 (actual rows=1 loops=1)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
>->  Seq Scan on abc_a2_b2 (never executed)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
>->  Seq Scan on abc_a2_b3 (never executed)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
>->  Seq Scan on abc_a3_b2 (never executed)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
>->  Seq Scan on abc_a3_b3 (never executed)
>  Filter: ((a >= $1) AND (a <= $2) AND (b = ANY ('{3,2}'::integer[])))
> (13 rows)
>
> postgres=# explain (analyze, costs off, summary off, timing off)
> execute abc_q1 (1, 2);
> ERROR:  partition missing from Append subplans

These work fine when the column order of subpartitons are not changed.

Case 3: Optimizer pruning with char types:
Same as case1 with all subpartitions having same col order as parent.

drop table ab_c;
 create table ab_c (a int not null, b char) partition by list(a);
 create table abc_a2 ( a int not null, b char) partition by list(b);
  create table abc_a2_b1 partition of abc_a2 for values in ('1');
  create table abc_a2_b2 partition of abc_a2 for values in ('2');
  create table abc_a2_b3 partition of abc_a2 for values in ('3');
  alter table ab_c attach partition abc_a2 for values in (2);
  create table abc_a1 partition of ab_c for values in(1) partition by list (b);
  create table abc_a1_b1 partition of abc_a1 for values in ('1');
  create table abc_a1_b2 partition of abc_a1 for values in ('2');
  create table abc_a1_b3 partition of abc_a1 for values in ('3');
  create table abc_a3 partition of ab_c for values in(3) partition by list (b);
  create table abc_a3_b1 partition of abc_a3 for values in ('1');
  create table abc_a3_b2 partition of abc_a3 for values in ('2');
  create table abc_a3_b3 partition of abc_a3 for values in ('3');

deallocate abc_q1;
  prepare abc_q1 (int, int) as select a,b from ab_c where a BETWEEN $1
and $2 AND b IN ('3', '2');

-- b4 runtime pruning
=# explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 8);
QUERY PLAN
---
 Append (actual rows=0 loops=1)
   ->  Seq Scan on abc_a1_b2 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
   ->  Seq Scan on abc_a1_b3 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
   ->  Seq Scan on abc_a2_b2 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
   ->  Seq Scan on abc_a2_b3 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
   ->  Seq Scan on abc_a3_b2 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
   ->  Seq Scan on abc_a3_b3 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 8) AND (b = ANY ('{3,2}'::bpchar[])))
(13 rows)

-- after 5 runs

=# explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 1);
ERROR:  operator 1057 is not a 

Re: [HACKERS] Runtime Partition Pruning

2018-01-09 Thread Beena Emerson
Hello,

On Sun, Jan 7, 2018 at 5:31 PM, David Rowley
 wrote:
> On 7 January 2018 at 00:03, David Rowley  wrote:
>> I've fixed this in the attached, but I did so by calling
>> adjust_appendrel_attrs() from the nodeAppend.c, which did, of course,
>> mean that the AppendRelInfo needed to be given to the executor. I was
>> also a bit unsure what exactly I should be doing in primnodes.h, since
>> I've put PartitionPruneInfo in there, but AppendRelInfo is not. I
>> stuck a quick declaration of AppendRelInfo in primnode.h with an XXX
>> comment so we don't forget to think about that again.
>
> Actually, this was not a very smart fix for the problem. It seems much
> better to make the prune qual part of PartitionPruneInfo and just have
> the planner translate the qual to what's required for the partition
> that the PartitionPruneInfo belongs to. This means we no longer need
> to use the Append's qual to store the prune qual and that all the
> pruning information for one partition is now neatly in a single
> struct.
>
> I've attached a patch which does things like this.


The pruning does not work well with char type:

Case: A subpartition has a different col order and the subpartitioned
col is type char.

drop table ab_c;
 create table ab_c (a int not null, b char) partition by list(a);
  create table abc_a2 (b char, a int not null) partition by list(b);
  create table abc_a2_b1 partition of abc_a2 for values in ('1');
  create table abc_a2_b2 partition of abc_a2 for values in ('2');
  create table abc_a2_b3 partition of abc_a2 for values in ('3');
  alter table ab_c attach partition abc_a2 for values in (2);
  create table abc_a1 partition of ab_c for values in(1) partition by list (b);
  create table abc_a1_b1 partition of abc_a1 for values in ('1');
  create table abc_a1_b2 partition of abc_a1 for values in ('2');
  create table abc_a1_b3 partition of abc_a1 for values in ('3');
  create table abc_a3 partition of ab_c for values in(3) partition by list (b);
  create table abc_a3_b1 partition of abc_a3 for values in ('1');
  create table abc_a3_b2 partition of abc_a3 for values in ('2');
  create table abc_a3_b3 partition of abc_a3 for values in ('3');
  deallocate abc_q1;

INSERT INTO ab_c VALUES (1,'1'), (1,'2'), (1,'3');
INSERT INTO ab_c VALUES (2,'1'), (2,'2'), (2,'3');
INSERT INTO ab_c VALUES (3,'1'), (3,'2'), (3,'3');

prepare abc_q1 (int, int, char) as select * from ab_c where a BETWEEN
$1 and $2 AND b <= $3;

--after 5 runs: abc_a2_b3 is not pruned.

# explain (analyze, costs off, summary off, timing off) execute abc_q1
(1, 2, '2');
   QUERY PLAN
-
 Append (actual rows=4 loops=1)
   ->  Seq Scan on abc_a1_b1 (actual rows=1 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a1_b2 (actual rows=1 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a1_b3 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b1 (actual rows=1 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b2 (actual rows=1 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b3 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
 Rows Removed by Filter: 1
   ->  Seq Scan on abc_a3_b1 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a3_b2 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a3_b3 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
(20 rows)


Case 2: Case with optimizer pruning
drop table ab_c;
 create table ab_c (a int not null, b int) partition by list(a);
 create table abc_a2 (b int, a int not null) partition by list(b);
  create table abc_a2_b1 partition of abc_a2 for values in (1);
  create table abc_a2_b2 partition of abc_a2 for values in (2);
  create table abc_a2_b3 partition of abc_a2 for values in (3);
  alter table ab_c attach partition abc_a2 for values in (2);
  create table abc_a1 partition of ab_c for values in(1) partition by list (b);
  create table abc_a1_b1 partition of abc_a1 for values in (1);
  create table abc_a1_b2 partition of abc_a1 for values in (2);
  create table abc_a1_b3 partition of abc_a1 for values in (3);
  create table abc_a3 partition of ab_c for values in(3) partition by list (b);
  create table abc_a3_b1 partition of abc_a3 for values in (1);
  create table abc_a3_b2 partition of abc_a3 for values in (2);
  create table abc_a3_b3 partition of abc_a3 for values in (3);
  deallocate abc_q1;


=# explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 1);
  QUERY PLAN
--
 Append 

Re: [HACKERS] Runtime Partition Pruning

2018-01-07 Thread David Rowley
On 7 January 2018 at 00:03, David Rowley  wrote:
> I've fixed this in the attached, but I did so by calling
> adjust_appendrel_attrs() from the nodeAppend.c, which did, of course,
> mean that the AppendRelInfo needed to be given to the executor. I was
> also a bit unsure what exactly I should be doing in primnodes.h, since
> I've put PartitionPruneInfo in there, but AppendRelInfo is not. I
> stuck a quick declaration of AppendRelInfo in primnode.h with an XXX
> comment so we don't forget to think about that again.

Actually, this was not a very smart fix for the problem. It seems much
better to make the prune qual part of PartitionPruneInfo and just have
the planner translate the qual to what's required for the partition
that the PartitionPruneInfo belongs to. This means we no longer need
to use the Append's qual to store the prune qual and that all the
pruning information for one partition is now neatly in a single
struct.

I've attached a patch which does things like this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v7.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-01-04 Thread Alvaro Herrera
David Rowley wrote:

> Looks like it's down to ExplainPropertyFloat() having
> machine-dependent behaviour.
> 
> On the machine that I was working with when testing this the following
> code outputs "1"
> [ sample code ]
> 
> but on your machine it must be outputting "0"?

Yeah, it does.  Thanks for updating --- I'll look at your patch
tomorrow.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-01-04 Thread David Rowley
On 5 January 2018 at 05:37, Alvaro Herrera  wrote:
> I tried this patch (applying it on Amit's last current version on top of
> 4e2970f8807f which is the latest it applies to) and regression tests
> fail with the attached diff; in all cases it appears to be an off-by-one
> in row count.  Would you please give it a look?

Thanks for testing. I've attached an updated patch which hopefully fixes this.

I've only thing I did to fix it was to alter the tests a bit so that
the row counts in explain are evenly divisible by the nloops or
parallel workers. Looks like it was failing due to platform dependent
behaviour in printf.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v5.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-01-01 Thread David Rowley
On 1 January 2018 at 19:22, Beena Emerson  wrote:
> I think you are testing without asserts

Yeah, I was indeed. Oops.

> The following assert fails: src/backend/optimizer/plan/setrefs.c :
> set_plan_refs: ln 921
> Assert(splan->plan.qual == NIL);
> Append node now has runtime partition quals.
>
> Also since the valid subplans are set in ExecInitAppend, the queries
> with Init Plans do not work. I had moved it to ExecAppend in my patch
> to handle the InitPlans as well.

Thanks for noticing. I've now changed things around so this case works
as it should and I've added a test too.

I've attached an updated patch which also fixes a number of other
problems with my previous patch.

1. The Bitmapset I was using in nodeAppend.c to mark the valid
subplans was pretty bogus for Parallel Append since the memory for the
set was not in shared memory. I changed things around to reuse the
pa_finished[] array and the patch just now sets pa_finished to true
for any invalid subplans.
2. I've added a new memory context to use in nodeAppend.c which is
used to call the planner code to determine which partitions are valid.
I'd been trying to have Amit be careful to pfree() everything in his
v17 patch, but I realised it was just not possible to get everything
pfree'd. I found it pretty easy to construct a test case which caused
an OOM.
3. I've added support for IN lists to be pruned when the IN() list
contains a parameter. The changes I made to support this case probably
mostly belong in Amit's faster partition pruning patch, but I've put
them here for now to get this case working. There's a bunch of new
tests to test this.
4. Various other cosmetic improvements.

The attached patch should be applied after patching master with Amit's
v17 faster partition pruning patch [1].

[1] 
https://www.postgresql.org/message-id/58c3e20a-a964-4fdb-4e7d-bd833e9be...@lab.ntt.co.jp

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v5.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2017-12-31 Thread Beena Emerson
Hello David,

On Wed, Dec 27, 2017 at 8:36 AM, David Rowley
 wrote:
> Hi,
>
> Please find attached my 4th version this patch.

Thanks for the patch
>
> This is now based on v17 of Amit's faster partition pruning patch [1].
> It also now includes Beena's tests which I've done some mostly
> cosmetic changes to.
>
> I've also fixed a few bugs, one in a case where I was not properly
> handling zero matching partitions in nodeAppend.c.
>
> Another change I've made is to now perform the partition pruning at
> run-time using a new memory context that's reset each time we
> redetermine the matching partitions. This was required since we're
> calling a planner function which might not be too careful about
> pfreeing memory it allocates. A test case I was running before making
> this change ended out failing to palloc memory due to OOM.
>
> I've not done anything about reducing the cost of the Append path when
> runtime pruning is enabled. I'm still thinking over the best way to
> handle that.
>

I think you are testing without asserts

The following assert fails: src/backend/optimizer/plan/setrefs.c :
set_plan_refs: ln 921
Assert(splan->plan.qual == NIL);
Append node now has runtime partition quals.

Also since the valid subplans are set in ExecInitAppend, the queries
with Init Plans do not work. I had moved it to ExecAppend in my patch
to handle the InitPlans as well.

DROP TABLE IF EXISTS prun_test_part;
CREATE TABLE prun_test_part (sal int) PARTITION BY RANGE(sal);
CREATE TABLE prun_test_part_p1 PARTITION OF prun_test_part FOR VALUES
FROM (0) TO (100);
CREATE TABLE prun_test_part_p2 PARTITION OF prun_test_part FOR VALUES
FROM (100) TO (200);
CREATE TABLE prun_test_part_p3 PARTITION OF prun_test_part FOR VALUES
FROM (200) TO (300);
CREATE TABLE prun_test_part_p4 PARTITION OF prun_test_part FOR VALUES
FROM (300) TO (400);

INSERT INTO prun_test_part VALUES (90), (100), (110), (200), (210),
(300), (310);
=# explain (analyze, costs off, summary off, timing off) SELECT * FROM
prun_test_part WHERE sal < (SELECT sal FROM prun_test_part WHERE sal =
200);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-26 Thread David Rowley
Hi,

Please find attached my 4th version this patch.

This is now based on v17 of Amit's faster partition pruning patch [1].
It also now includes Beena's tests which I've done some mostly
cosmetic changes to.

I've also fixed a few bugs, one in a case where I was not properly
handling zero matching partitions in nodeAppend.c.

Another change I've made is to now perform the partition pruning at
run-time using a new memory context that's reset each time we
redetermine the matching partitions. This was required since we're
calling a planner function which might not be too careful about
pfreeing memory it allocates. A test case I was running before making
this change ended out failing to palloc memory due to OOM.

I've not done anything about reducing the cost of the Append path when
runtime pruning is enabled. I'm still thinking over the best way to
handle that.

[1] 
https://www.postgresql.org/message-id/58c3e20a-a964-4fdb-4e7d-bd833e9be...@lab.ntt.co.jp

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v4.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2017-12-22 Thread Robert Haas
On Thu, Dec 21, 2017 at 8:37 PM, David Rowley
 wrote:
>> No, I don't think comparing to previous custom plans is a workable
>> approach.  I was thinking, rather, that if we know for example that
>> we've doing pruning on partition_column = $1, then we know that only
>> one partition will match.  That's probably a common case.  If we've
>> got partition_column > $1, we could assume that, say, 75% of the
>> partitions would match.  partition_column BETWEEN $1 and $2 is
>> probably a bit more selective, so maybe we assume 50% of the
>> partitions would match.
>
> Okay. Do you think this is something we need to solve for this patch?
> When I complained originally I didn't quite see any way to even test
> the majority of this patch with the regression tests, but Beena has
> since proven me wrong about that.

Although I have done one round of view of this patch, I haven't really
got my head around it completely yet and I haven't spent of time on it
yet, so my opinions are not as well-formed as maybe they should be.
I'm glad, by the way, that you are putting some effort into it, as I
think that will help move this forward more quickly.   At a high
level, I want to avoid trying to solve too many problems in one patch
(which was the motivation behind my comment near the top of the
thread), but I also want to end up with something useful (which I
think is your concern).

Leaving aside the difficulty of implementation, I have some questions
about what the right thing to do actually is.  In a simple case, I'm
guessing that the cost of creating a custom plan will exceed the
amount that the plan saves, but in more complex cases, I'm not sure
that will be true.  For instance, if we know the particular parameter
value at plan time, we can make a more accurate estimate of how many
times that value appears, which can then feed into our choice of what
plan shape to use.  That is, for a query like SELECT * FROM a JOIN b
ON a.x = b.x WHERE a.y = $1, the generic plan might choose, say, a
nested loop with b on the inner side, but if we know that a particular
value for $1 will match a lot of rows in a, we might prefer a hash or
merge join for that specific case.  Run-time pruning doesn't give us
that flexibility.  My intuition is that the more complex we make the
query, the more point there will be to making custom plans, and the
simpler the query, the more likely it is that a generic plan will be
good enough that it's not worth replanning every time.

Now, in my experience, the current system for custom plans vs. generic
plans doesn't approach the problem in this way at all, and in my
experience that results in some pretty terrible behavior.  It will do
things like form a custom plan every time because the estimated cost
of the custom plan is lower than the estimated cost of the generic
plan even though the two plans are structurally identical; only the
estimates differ.  It will waste gobs of CPU cycles by replanning a
primary key lookup 5 times just on the off chance that a lookup on the
primary key index isn't the best option.  But this patch isn't going
to fix any of that.  The best we can probably do is try to adjust the
costing for Append paths in some way that reflects the costs and
benefits of pruning.  I'm tentatively in favor of trying to do
something modest in that area, but I don't have a detailed proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread David Rowley
On 22 December 2017 at 14:29, Robert Haas  wrote:
> On Thu, Dec 21, 2017 at 6:53 PM, David Rowley
>  wrote:
>> Are you considering some sort of clauselist_selectivity() estimate on
>> the given parameters and comparing that to the same selectivities that
>> were determined for the previous custom plans?
>
> No, I don't think comparing to previous custom plans is a workable
> approach.  I was thinking, rather, that if we know for example that
> we've doing pruning on partition_column = $1, then we know that only
> one partition will match.  That's probably a common case.  If we've
> got partition_column > $1, we could assume that, say, 75% of the
> partitions would match.  partition_column BETWEEN $1 and $2 is
> probably a bit more selective, so maybe we assume 50% of the
> partitions would match.

Okay. Do you think this is something we need to solve for this patch?
When I complained originally I didn't quite see any way to even test
the majority of this patch with the regression tests, but Beena has
since proven me wrong about that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread Robert Haas
On Thu, Dec 21, 2017 at 6:53 PM, David Rowley
 wrote:
> On 22 December 2017 at 12:45, Robert Haas  wrote:
>> PREPARE q3 (int, int) AS SELECT * FROM unpartitioned_table WHERE
>> partkey BETWEEN $1 AND $2;
>>
>> ...that has to decide whether to use an index.  And to do that it has
>> to estimate what fraction of the table will match the BETWEEN clause.
>> That may be an uninformed guess, but it guesses something.  We
>> probably want to do something here that makes the guess for a
>> partitioned_table similar to the guess for an unpartitioned_table.
>
> Are you considering some sort of clauselist_selectivity() estimate on
> the given parameters and comparing that to the same selectivities that
> were determined for the previous custom plans?

No, I don't think comparing to previous custom plans is a workable
approach.  I was thinking, rather, that if we know for example that
we've doing pruning on partition_column = $1, then we know that only
one partition will match.  That's probably a common case.  If we've
got partition_column > $1, we could assume that, say, 75% of the
partitions would match.  partition_column BETWEEN $1 and $2 is
probably a bit more selective, so maybe we assume 50% of the
partitions would match.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread David Rowley
On 22 December 2017 at 12:45, Robert Haas  wrote:
> PREPARE q3 (int, int) AS SELECT * FROM unpartitioned_table WHERE
> partkey BETWEEN $1 AND $2;
>
> ...that has to decide whether to use an index.  And to do that it has
> to estimate what fraction of the table will match the BETWEEN clause.
> That may be an uninformed guess, but it guesses something.  We
> probably want to do something here that makes the guess for a
> partitioned_table similar to the guess for an unpartitioned_table.

Are you considering some sort of clauselist_selectivity() estimate on
the given parameters and comparing that to the same selectivities that
were determined for the previous custom plans?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread David Rowley
On 22 December 2017 at 03:02, Beena Emerson  wrote:
> This seems like having two different patches for the same feature. I
> will post my version of the patch which uses the struct
> PartitionPruneInfo from your patch and I will add the other additional
> features you added like optimizing the pruning rescan. I will try and
> post the patch tomorrow.

I apologise for persisting in making these parallel efforts. I do have
time right now to dedicate to review this patch, but that time is
running out. At this end of this time, I was really hoping that there
would be a patch that's worthy of being committed (or at least one
worthy of a committers time). During my review of v5, because I found
the patch to still need quite a bit of work, I thought the best use of
that time was to make it work myself, which to my knowledge I have
done. Although, I'm sure my patch will still have bugs, it appears to
me to be quite a bit further ahead than your v7 WIP patch.

> If there is more suggestions, you can give it over that; otherwise it
> seems like duplicating efforts.

Much of the things I did differently from you could be taken as suggestions.

There were a number of things in the v7 patch were still not in a
workable state:

1. Using the PlannerInfo to record details about Append. How will this
work with a plan containing multiple Appends scanning partitioned
tables?
2. The use of AppendState->subplan_indexes List. Please use a
Bitmapset to mark the valid subplans. Lists are not particularly
efficient to get the nth item.
3. Use of PlannerInfo to store details specific to a single
partitioned table in set_base_rel_sizes.
4. Use of a new PlannerInfo->join_clauses in set_rel_size(). How will
this work when there are multiple partitioned tables being scanned in
a single plan?
5. In match_clauses_to_partkey() you're using a new
PlannerInfo->baserestrictinfo_param_indexes List to store ParamIds.
How will this work when there are multiple partitioned tables being
scanned in a single plan? A Bitmapset would be a better choice to
store paramids in.
6. In set_append_rel_pathlist you're using more PlannerInfo members to
handle a specific Append rel. Again, how will it work for plans
scanning multiple different partitioned tables?
7. Your changes to get_appendrel_parampathinfo() ignore equivalence
join clauses, don't you need to look at these too? If so, maybe it
might be worth just using get_baserel_parampathinfo()?
8. Lack of ability to detect if set_append_subplan_indexes() needs to
be called in ExecReScanAppend(). Some parameters that change might not
have an effect on which partitions to scan.

If you go and find a new way to solve all those problems, then please
consider which one of us it is that's making the duplicate effort.

Again, I'm sorry that I have been standing on your toes with my work
here. I'm certainly not out to try to take any glory here. I just want
the patch to be in a working state and the time I have to do that is
fast running out.

Please consider my efforts as an offer of assistance rather than a
threat to your work.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread Robert Haas
On Thu, Dec 21, 2017 at 4:01 AM, David Rowley
 wrote:
> The problem is down to the logic in choose_custom_plan() only choosing
> a generic plan if the average cost of the generic plan is less than
> the average custom plan cost. The problem is that the generic plan can
> have many extra Append subnodes in comparison to the custom plan, all
> of which are taken into account in the total plan cost, but these may
> be pruned during execution. The logic in choose_custom_plan() has no
> idea about this.  I don't have any bright ideas on how to fix this
> yet, as, suppose a PREPAREd statement like the following comes along:
>
> PREPARE q3 (int, int) AS SELECT * FROM partitioned_table WHERE partkey
> BETWEEN $1 AND $2;
>
> the run-time pruning may prune it down no subplans, all subplans, or
> any number in between. So we can't do anything like take the total
> Append cost to be the highest costing of its subplans, and likely
> using the average cost might not be a good idea either.

Well, I do think we need to make some kind of estimate.  It may be a
bad estimate, but if we do nothing, we're estimating that no pruning
at all will happen, which is probably not right either.  I mean, if we
have...

PREPARE q3 (int, int) AS SELECT * FROM unpartitioned_table WHERE
partkey BETWEEN $1 AND $2;

...that has to decide whether to use an index.  And to do that it has
to estimate what fraction of the table will match the BETWEEN clause.
That may be an uninformed guess, but it guesses something.  We
probably want to do something here that makes the guess for a
partitioned_table similar to the guess for an unpartitioned_table.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread David Rowley
On 22 December 2017 at 03:02, Beena Emerson  wrote:
> postgres=# create table ta (a int not null) partition by list (a);
> CREATE TABLE
> postgres=# create table ta1 partition of ta for values in(1,2);
> CREATE TABLE
> postgres=# create table ta2 partition of ta for values in(3,4);
> CREATE TABLE
> postgres=# explain select * from ta where a <> 1 and a <> 2;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

I just get "ERROR:  negative bitmapset member not allowed" here. I
reported that one to Amit over on the other thread. Not sure why
you're getting a crash. Can you get a backtrace?


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread Beena Emerson
Hello,

On Thu, Dec 21, 2017 at 6:26 PM, David Rowley
 wrote:
> On 21 December 2017 at 22:01, David Rowley  
> wrote:
>> I've attached the latest version of the patch. This is based
>> on Amit's v15 of faster-partition-pruning [1] which I found to cleanly
>> apply to f94eec490
>
> Well, that went out of date pretty quickly. Amit has now posted v16 of
> the faster partition pruning patch [1] which conflicts with my changes
> in my v2 patch.
>
> I've attached a new version of the patch to resolve these conflicts.

Thank you for the patch, I applied it over Amit's v16 patches on
commit 180428404.

I found that make check is crashing along with the following sql you mentioned.

postgres=# create table ta (a int not null) partition by list (a);
CREATE TABLE
postgres=# create table ta1 partition of ta for values in(1,2);
CREATE TABLE
postgres=# create table ta2 partition of ta for values in(3,4);
CREATE TABLE
postgres=# explain select * from ta where a <> 1 and a <> 2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

This seems like having two different patches for the same feature. I
will post my version of the patch which uses the struct
PartitionPruneInfo from your patch and I will add the other additional
features you added like optimizing the pruning rescan. I will try and
post the patch tomorrow.
If there is more suggestions, you can give it over that; otherwise it
seems like duplicating efforts.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-21 Thread Beena Emerson
Hello,

On Thu, Dec 21, 2017 at 6:30 PM, David Rowley
 wrote:
> On 22 December 2017 at 01:43, Beena Emerson  wrote:
>>> I wonder if Robert understands the extent of the problem. The patch
>>> will be useful to prune away partitions when the partitioned table is
>>> on the inside of a parameterised nested loop join, but I think we'll
>>> likely get some complaints about PREPARE statements always using a
>>> custom plan.  It's going to make this pretty hard to write regression
>>> tests for unless we went and invent some GUC "prefer_generic_plan", or
>>> "generic_plan_cost_multiplier" to provide a way to coax
>>> choose_custom_plan() into not choosing a custom plan.  I admit to not
>>> having the answer to this but I don't think we should rule out
>>> discussing possible fixes.
>>
>> I have attached the regression tests I am currently using to check my
>> patch.  It has few prepared statements using runtime pruning.
>
> Thanks for sending.  Will the tests still pass if you remove the hack
> from choose_custom_plan()?
>

The hack has been removed in the v7 patch sent and yes the tests work
without any additional adjustments.

Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



  1   2   >