Re: [HACKERS] path toward faster partition pruning
On 09-05-2018 17:30, Alvaro Herrera wrote:
> Marina Polyakova wrote:
>> Hello everyone in this thread! I got a similar server crash as in [1] on
>> the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81
>> when the assertion fails because the second argument of ScalarArrayOpExpr
>> is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]):
>
> Hello Marina, thanks for reporting this. I have pushed all fixes derived
> from this report -- thanks to Amit and Michaël for those. I verified your
> test case no longer crashes. If you have more elaborate test cases, please
> do try these too.

Hello, thank you all very much! :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 22:43, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 2018/05/09 11:31, David Rowley wrote:
>>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>>> While looking at this code, is there any reason to not make
>>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>>> so the intention is to make it available for future patches?
>>
>> Here is a patch that does that.
>
> Pushed, thanks.

Thank you.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Wed, May 09, 2018 at 10:39:07AM -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> This removes a useless default clause in partprune.c and it got
>> forgotten in the crowd. Just attaching it again here, and it can just
>> be applied on top of the rest.
>
> Done, thanks for insisting.

Thanks!
--
Michael
Re: [HACKERS] path toward faster partition pruning
Marina Polyakova wrote:
> Hello everyone in this thread!
> I got a similar server crash as in [1] on the master branch since the commit
> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr, but
> is an ArrayCoerceExpr (see [2]):

Hello Marina, thanks for reporting this. I have pushed all fixes derived from this report -- thanks to Amit and Michaël for those. I verified your test case no longer crashes. If you have more elaborate test cases, please do try these too.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> On 2018/05/09 11:31, David Rowley wrote:
>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>> While looking at this code, is there any reason to not make
>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>> so the intention is to make it available for future patches?
>
> Here is a patch that does that.

Pushed, thanks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Michael Paquier wrote:
> Alvaro, could it be possible to consider as well the patch I posted
> here?
> https://www.postgresql.org/message-id/20180424012042.gd1...@paquier.xyz
>
> This removes a useless default clause in partprune.c and it got
> forgotten in the crowd. Just attaching it again here, and it can just
> be applied on top of the rest.

Done, thanks for insisting.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Wed, May 09, 2018 at 02:01:26PM +0900, Amit Langote wrote:
> On 2018/05/09 11:31, David Rowley wrote:
>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>> While looking at this code, is there any reason to not make
>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>> so the intention is to make it available for future patches?
>>>
>>> Yeah, making it static might be a good idea. I had made it externally
>>> visible, because I was under the impression that the runtime pruning
>>> related code would want to call it from elsewhere within the planner.
>>> But, instead it introduced a make_partition_pruneinfo() which in turn
>>> calls get_partprune_steps.
>>
>> Yeah. Likely left over from when run-time pruning was generating the
>> steps during execution rather than during planning.
>
> Here is a patch that does that.

Thanks, Amit. Alvaro, could it be possible to consider as well the patch I posted here?
https://www.postgresql.org/message-id/20180424012042.gd1...@paquier.xyz

This removes a useless default clause in partprune.c and it got forgotten in the crowd. Just attaching it again here, and it can just be applied on top of the rest.
--
Michael

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f8844ef2eb..cbbb4c1827 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2950,10 +2950,6 @@ perform_pruning_combine_step(PartitionPruneContext *context,
 			}
 		}
 		break;
-
-	default:
-		elog(ERROR, "invalid pruning combine op: %d",
-			 (int) cstep->combineOp);
 	}

 	return result;
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 11:31, David Rowley wrote:
> On 9 May 2018 at 14:29, Amit Langote wrote:
>> On 2018/05/09 11:20, Michael Paquier wrote:
>>> While looking at this code, is there any reason to not make
>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>> so the intention is to make it available for future patches?
>>
>> Yeah, making it static might be a good idea. I had made it externally
>> visible, because I was under the impression that the runtime pruning
>> related code would want to call it from elsewhere within the planner.
>> But, instead it introduced a make_partition_pruneinfo() which in turn
>> calls get_partprune_steps.
>
> Yeah. Likely left over from when run-time pruning was generating the
> steps during execution rather than during planning.

Here is a patch that does that.

Thanks,
Amit

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f1f7b2dea9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -116,6 +116,8 @@ typedef struct PruneStepResult
 } PruneStepResult;

+static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
+				 bool *contradictory);
 static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
 				 RelOptInfo *rel, List *clauses,
 				 bool *contradictory);
@@ -355,7 +357,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
  * If the clauses in the input list are contradictory or there is a
  * pseudo-constant "false", *contradictory is set to true upon return.
  */
-List *
+static List *
 gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory)
 {
 	GeneratePruningStepsContext context;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c9fe95dc30..3d114b4c71 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -67,7 +67,5 @@ extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 					  List *pruning_steps);
-extern List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
-				 bool *contradictory);

 #endif							/* PARTPRUNE_H */
Re: [HACKERS] path toward faster partition pruning
On 9 May 2018 at 14:29, Amit Langote wrote:
> On 2018/05/09 11:20, Michael Paquier wrote:
>> While looking at this code, is there any reason to not make
>> gen_partprune_steps static? This is only used in partprune.c for now,
>> so the intention is to make it available for future patches?
>
> Yeah, making it static might be a good idea. I had made it externally
> visible, because I was under the impression that the runtime pruning
> related code would want to call it from elsewhere within the planner.
> But, instead it introduced a make_partition_pruneinfo() which in turn
> calls get_partprune_steps.

Yeah. Likely left over from when run-time pruning was generating the steps during execution rather than during planning.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 11:20, Michael Paquier wrote:
> While looking at this code, is there any reason to not make
> gen_partprune_steps static? This is only used in partprune.c for now,
> so the intention is to make it available for future patches?

Yeah, making it static might be a good idea. I had made it externally visible, because I was under the impression that the runtime pruning related code would want to call it from elsewhere within the planner. But, instead it introduced a make_partition_pruneinfo() which in turn calls get_partprune_steps.

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
On Tue, May 08, 2018 at 07:05:46PM -0300, Alvaro Herrera wrote:
> The reason for this crash is that gen_partprune_steps_internal() is
> unable to generate any steps for the clause -- which is natural, since
> the operator is not in a btree opclass. There are various callers
> of gen_partprune_steps_internal that are aware that it could return an
> empty set of steps, but this one in match_clause_to_partition_key for
> the ScalarArrayOpExpr case was being a bit too optimistic.

Indeed. While looking at this code, is there any reason to not make gen_partprune_steps static? This is only used in partprune.c for now, so the intention is to make it available for future patches?
--
Michael
Re: [HACKERS] path toward faster partition pruning
Hi.

On 2018/05/09 7:05, Alvaro Herrera wrote:
> So I found that this query also crashed (using your rig),
>
> create table coercepart (a varchar) partition by list (a);
> create table coercepart_ab partition of coercepart for values in ('ab');
> create table coercepart_bc partition of coercepart for values in ('bc');
> create table coercepart_cd partition of coercepart for values in ('cd');
> explain (costs off) select * from coercepart where a ~ any ('{ab}');
>
> The reason for this crash is that gen_partprune_steps_internal() is
> unable to generate any steps for the clause -- which is natural, since
> the operator is not in a btree opclass. There are various callers
> of gen_partprune_steps_internal that are aware that it could return an
> empty set of steps, but this one in match_clause_to_partition_key for
> the ScalarArrayOpExpr case was being a bit too optimistic.

Yeah, good catch!

That fixes the crash, but looking around that code a bit, it seems that we shouldn't even have gotten up to the point you're proposing to fix. If saop_op is not in the partitioning opfamily, it should have bailed out much sooner, that is, here:

		/*
		 * In case of NOT IN (..), we get a '<>', which we handle if list
		 * partitioning is in use and we're able to confirm that its negator
		 * is a btree equality operator belonging to the partitioning operator
		 * family.
		 */
		if (!op_in_opfamily(saop_op, partopfamily))
		{
			negator = get_negator(saop_op);
			if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
			{
			}
+			else
+				/* otherwise, unsupported! */
+				return PARTCLAUSE_UNSUPPORTED;

Let me propose that we also have this along with the rest of the changes you made in that part of the function. So, attached is an updated patch.
Thanks,
Amit

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..be9ea8a6db 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
  * For BoolExpr clauses, we recursively generate steps for each argument, and
  * return a PartitionPruneStepCombine of their results.
  *
- * The generated steps are added to the context's steps list.  Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list.  Each step is assigned a step identifier, unique
+ * even across recursive calls.
  *
  * If we find clauses that are mutually contradictory, or a pseudoconstant
  * clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		List	   *elem_exprs,
 				   *elem_clauses;
 		ListCell   *lc1;
+		bool		contradictory;

 		if (IsA(leftop, RelabelType))
 			leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		 * Only allow strict operators.  This will guarantee nulls are
 		 * filtered.
 		 */
-		if (!op_strict(saop->opno))
+		if (!op_strict(saop_op))
 			return PARTCLAUSE_UNSUPPORTED;

 		/* Useless if the array has any volatile functions. */
@@ -1650,6 +1652,9 @@ match_clause_to_partition_key(RelOptInfo *rel,
 			if (strategy != BTEqualStrategyNumber)
 				return PARTCLAUSE_UNSUPPORTED;
 		}
+		else
+			/* otherwise, unsupported! */
+			return PARTCLAUSE_UNSUPPORTED;
 	}

 	/*
@@ -1690,7 +1695,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 				elem_exprs = lappend(elem_exprs, elem_expr);
 			}
 		}
-		else
+		else if (IsA(rightop, ArrayExpr))
 		{
 			ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);

@@ -1704,6 +1709,11 @@ match_clause_to_partition_key(RelOptInfo *rel,

 			elem_exprs = arrexpr->elements;
 		}
+		else
+		{
+			/* Give up on any other clause types. */
+			return PARTCLAUSE_UNSUPPORTED;
+		}

 		/*
 		 * Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1732,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
 	}

 	/*
-	 * Build a combine step as if for an OR clause or add the clauses to
-
Re: [HACKERS] path toward faster partition pruning
So I found that this query also crashed (using your rig),

create table coercepart (a varchar) partition by list (a);
create table coercepart_ab partition of coercepart for values in ('ab');
create table coercepart_bc partition of coercepart for values in ('bc');
create table coercepart_cd partition of coercepart for values in ('cd');
explain (costs off) select * from coercepart where a ~ any ('{ab}');

The reason for this crash is that gen_partprune_steps_internal() is unable to generate any steps for the clause -- which is natural, since the operator is not in a btree opclass. There are various callers of gen_partprune_steps_internal that are aware that it could return an empty set of steps, but this one in match_clause_to_partition_key for the ScalarArrayOpExpr case was being a bit too optimistic.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f8aaccfa18 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
  * For BoolExpr clauses, we recursively generate steps for each argument, and
  * return a PartitionPruneStepCombine of their results.
  *
- * The generated steps are added to the context's steps list.  Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list.  Each step is assigned a step identifier, unique
+ * even across recursive calls.
  *
  * If we find clauses that are mutually contradictory, or a pseudoconstant
  * clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		List	   *elem_exprs,
 				   *elem_clauses;
 		ListCell   *lc1;
+		bool		contradictory;

 		if (IsA(leftop, RelabelType))
 			leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		 * Only allow strict operators.  This will guarantee nulls are
 		 * filtered.
 		 */
-		if (!op_strict(saop->opno))
+		if (!op_strict(saop_op))
 			return PARTCLAUSE_UNSUPPORTED;

 		/* Useless if the array has any volatile functions. */
@@ -1690,7 +1692,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 				elem_exprs = lappend(elem_exprs, elem_expr);
 			}
 		}
-		else
+		else if (IsA(rightop, ArrayExpr))
 		{
 			ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);

@@ -1704,6 +1706,11 @@ match_clause_to_partition_key(RelOptInfo *rel,

 			elem_exprs = arrexpr->elements;
 		}
+		else
+		{
+			/* Give up on any other clause types. */
+			return PARTCLAUSE_UNSUPPORTED;
+		}

 		/*
 		 * Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1729,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		}

 		/*
-		 * Build a combine step as if for an OR clause or add the clauses to
-		 * the end of the list that's being processed currently.
+		 * If we have an ANY clause and multiple elements, first turn the list
+		 * of clauses into an OR expression.
 		 */
 		if (saop->useOr && list_length(elem_clauses) > 1)
-		{
-			Expr	   *orexpr;
-			bool		contradictory;
+			elem_clauses = list_make1(makeBoolExpr(OR_EXPR, elem_clauses, -1));

-			orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
-			*clause_steps =
-				gen_partprune_steps_internal(context, rel, list_make1(orexpr),
-											 &contradictory);
-			if (contradictory)
-				return PARTCLAUSE_MATCH_CONTRADICT;
-
-			Assert(list_length(*clause_steps) == 1);
-			return PARTCLAUSE_MATCH_STEPS;
-		}
-		else
-		{
-			bool		contradictory;
-
-			*clause_steps =
-				gen_partprune_steps_internal(context, rel, elem_clauses,
-
Re: [HACKERS] path toward faster partition pruning
Michael Paquier wrote:
> So the problem appears when an expression needs to use
> COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
> which requires an execution state to be able to build the list of
> elements. The clause matching happens at planning state, so while there
> are surely cases where this could be improved depending on the
> expression type, I propose to just discard all clauses which do not
> match OpExpr and ArrayExpr for now, as per the attached. It would be
> definitely a good practice to have a default code path returning
> PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
>
> Thoughts?

I found a related crash and I'm investigating it further.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Suspect SPAM] Re: [HACKERS] path toward faster partition pruning
On Tue, May 08, 2018 at 04:07:41PM +0900, Amit Langote wrote:
> I have to agree to go with this conservative approach for now. Although
> we might be able to evaluate the array elements by applying the coercion
> specified by ArrayCoerceExpr, let's save that as an improvement to be
> pursued later.

Thanks for confirming. Yes, non-volatile functions would be actually safe, and we'd need to be careful about NULL handling as well, but that's definitely out of scope for v11.

> FWIW, constraint exclusion wouldn't prune in this case either (that is, if
> you try this example with PG 10 or using HEAD as of the parent of
> 9fdb675fc5), but it doesn't crash like the new pruning code does.

Yeah, I have noticed that.
--
Michael
Re: [Suspect SPAM] Re: [HACKERS] path toward faster partition pruning
Thank you Marina for the report and Michael for following up.

On 2018/05/07 16:56, Michael Paquier wrote:
> On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote:
>> On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
>>> I got a similar server crash as in [1] on the master branch since the commit
>>> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
>>> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
>>> but is an ArrayCoerceExpr (see [2]):
>>
>> Indeed, I can see the crash. I have been playing with this stuff and I
>> am in the middle of writing the patch, but let's track this properly for
>> now.
>
> So the problem appears when an expression needs to use
> COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
> which requires an execution state to be able to build the list of
> elements. The clause matching happens at planning state, so while there
> are surely cases where this could be improved depending on the
> expression type, I propose to just discard all clauses which do not
> match OpExpr and ArrayExpr for now, as per the attached. It would be
> definitely a good practice to have a default code path returning
> PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
>
> Thoughts?

I have to agree to go with this conservative approach for now. Although we might be able to evaluate the array elements by applying the coercion specified by ArrayCoerceExpr, let's save that as an improvement to be pursued later.

FWIW, constraint exclusion wouldn't prune in this case either (that is, if you try this example with PG 10 or using HEAD as of the parent of 9fdb675fc5), but it doesn't crash like the new pruning code does.

Thanks again.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote: > On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote: > > I got a similar server crash as in [1] on the master branch since the commit > > 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because > > the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is > > an ArrayCoerceExpr (see [2]): > > Indeed, I can see the crash. I have been playing with this stuff and I > am in the middle of writing the patch, but let's track this properly for > now. So the problem appears when an expression needs to use COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another, which requires an execution state to be able to build the list of elements. The clause matching happens at planning state, so while there are surely cases where this could be improved depending on the expression type, I propose to just discard all clauses which do not match OpExpr and ArrayExpr for now, as per the attached. It would be definitely a good practice to have a default code path returning PARTCLAUSE_UNSUPPORTED where the element list cannot be built. Thoughts? -- Michael diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index f954b92a6b..2d2f88e880 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -1690,7 +1690,7 @@ match_clause_to_partition_key(RelOptInfo *rel, elem_exprs = lappend(elem_exprs, elem_expr); } } - else + else if (IsA(rightop, ArrayExpr)) { ArrayExpr *arrexpr = castNode(ArrayExpr, rightop); @@ -1704,6 +1704,16 @@ match_clause_to_partition_key(RelOptInfo *rel, elem_exprs = arrexpr->elements; } + else + { + /* + * Ignore all other clause types. It could be possible here + * to reach this code path with a type coercion from an + * array type to another with ArrayCoerceExpr which depends on + * per-element execution for the conversion. 
+ */ + return PARTCLAUSE_UNSUPPORTED; + } /* * Now generate a list of clauses, one for each array element, of the diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index e0cc5f3393..86dcd62d55 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1073,6 +1073,48 @@ explain (costs off) select * from boolpart where a is not unknown; Filter: (a IS NOT UNKNOWN) (7 rows) +-- type coercion from one array type to another, no pruning +create table coercepart (a varchar) partition by list (a); +create table coercepart_ab partition of coercepart for values in ('ab'); +create table coercepart_cd partition of coercepart for values in ('cd'); +create table coercepart_ef_gh partition of coercepart for values in ('ef', 'gh'); +explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999')); + QUERY PLAN +-- + Append + -> Seq Scan on coercepart_ab + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_cd + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_ef_gh + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) +(7 rows) + +explain (costs off) select * from coercepart where a in ('ab', NULL, to_char(125, '999')); + QUERY PLAN +--- + Append + -> Seq Scan on coercepart_ab + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_cd + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_ef_gh + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character 
varying, (to_char(125, '999'::text))::character varying])::text[])) +(7 rows) + +explain (costs off) select * from coercepart where a in ('ef', 'gh', to_char(125, '999')); + QUERY PLAN
Re: [HACKERS] path toward faster partition pruning
On 07-05-2018 4:37, Michael Paquier wrote:
> On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
>> I got a similar server crash as in [1] on the master branch since the commit
>> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
>> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
>> but is an ArrayCoerceExpr (see [2]):
>
> Indeed, I can see the crash. I have been playing with this stuff and I
> am in the middle of writing the patch, but let's track this properly for
> now.

Thank you very much!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
> I got a similar server crash as in [1] on the master branch since the commit
> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
> but is an ArrayCoerceExpr (see [2]):

Indeed, I can see the crash. I have been playing with this stuff and I am in the middle of writing the patch, but let's track this properly for now.
--
Michael
Re: [HACKERS] path toward faster partition pruning
Hello everyone in this thread!

I got a similar server crash as in [1] on the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81, when the assertion fails because the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]):

=# create table list_parted ( a varchar ) partition by list (a);
=# create table part_ab_cd partition of list_parted for values in ('ab', 'cd');
=# CREATE OR REPLACE FUNCTION public.x_stl_text_integer ( )
RETURNS text STABLE AS
$body$
BEGIN
	RAISE NOTICE 's text integer';
	RETURN 1::text;
END;
$body$
LANGUAGE 'plpgsql';
=# explain (costs off) select * from list_parted where a in ('ab', 'cd', x_stl_text_integer());
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

[1] https://www.postgresql.org/message-id/CAKcux6nCsCmu9oUnnuKZkeBenYvUFbU2Lt4q2MFNEb7QErzn8w%40mail.gmail.com
[2] partprune.c, function match_clause_to_partition_key:

	if (IsA(rightop, Const))
	{
		...
	}
	else
	{
		ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);   # fails here
		...
	}

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 2018/04/21 0:58, Alvaro Herrera wrote:
> Amit Langote wrote:
>> PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
>> that there are hash partitioning related tests in create_table,
>> foreign_key, and partition_join files as well. Do we want to use the
>> custom opclass in those files as well?
>
> By the way, let me suggest 'git grep -i' instead.

Ah, thanks.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
> that there are hash partitioning related tests in create_table,
> foreign_key, and partition_join files as well. Do we want to use the
> custom opclass in those files as well?

By the way, let me suggest 'git grep -i' instead.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 13, 2018 at 10:50 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea. Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times? This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> I don't think that's necessary to fix the problem that
> partition_prune_hash.sql file has two expected output files. If you
> want to propose such a reorganization, feel free, but let's not hijack
> the patch at hand. For the record, I'm not a fan of the idea.

Fair enough. I don't think I'm hijacking the thread much more than it was already hijacked; and it was just a thought. I haven't really studied the tests well enough to have a really clear idea what a better organization would look like. It was just that, for example, the commit that added hash partitioning added tests to 5 different files, and some things had to be duplicated as a result. It sounds like what you've already done improves that, but I was wondering if there's a way to do better. I don't feel super-strongly about it though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
Robert Haas wrote:
> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>> Here's an idea. Why don't we move the function/opclass creation lines
>> to insert.sql, without the DROPs, and use the same functions/opclasses
>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>> partition_prune.sql, i.e. not recreate what are essentially the same
>> objects three times? This also leaves them around for the pg_upgrade
>> test, which is not a bad thing.
>
> That sounds good, but maybe we should go further and move the
> partitioning tests out of generically-named things like insert.sql
> altogether and have test names that actually mention partitioning.

I don't think that's necessary to fix the problem that partition_prune_hash.sql file has two expected output files. If you want to propose such a reorganization, feel free, but let's not hijack the patch at hand. For the record, I'm not a fan of the idea.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 13, 2018 at 7:45 AM, Amit Langote wrote:
> On 2018/04/13 1:47, Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea. Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times? This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> Do you mean to do that for *all* files that have tests exercising some
> partitioning code, even if it's just one test? I can see that tests in at
> least some of them could be put into their own partition_ file as follows:
>
> partition_insert (including tests in insert_conflict)
> partition_update
> partition_triggers
> partition_indexing (indexing.sql added when partitioned indexes went in)
> partition_ddl (for the tests in create_table and alter_table)

We haven't generally created test files specific to a table type, for example temporary tables or unlogged tables; instead we have created files by SQL command. But then that's not true for indexes; we have separate files for indexes, and we also have a separate file for materialized views and also for various data types. So, our test file organization seems to have cut across SQL commands and object types. But partitioning seems an area large enough to have files of its own; we already have partition_join and partition_aggregate.

Do we want to move to a directory-based organization for tests also, where sql/ and expected/ will have directories within them for various types of objects like partitioned tables, indexes, regular tables, datatypes etc.
and each of those will have files organized by sql commands? An immediate question arises as to where to add the files which exercises a mixture of objects; may be in a directory corresponding to the primary object like materialized views over partitioned tables, would fit materialized view (or just views?) directory. Whatever organization we want to use, it should be easy to find testcases for relevant functionality e.g. all tests for partitioned tables or all alter table command tests. > What about the tests in inherit.sql that start with: > > -- > -- Check that constraint exclusion works correctly with partitions using > -- implicit constraints generated from the partition bound information. > -- > > Maybe, just move all of them to partition_prune.sql, because we no longer > use constraint exclusion for pruning? I think we need to have some testcases somwhere to test constraint exclusion on partitions and partitioned tables, those do not necessarily fit partition pruning. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 13 April 2018 at 14:15, Amit Langote wrote:
> On 2018/04/13 1:47, Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea.  Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times?  This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> Do you mean to do that for *all* files that have tests exercising some
> partitioning code, even if it's just one test?  I can see that tests in at
> least some of them could be put into their own partition_ file as follows:

Wouldn't it be best to just move hash partition tests into hash_part?
Leave all the other stuff where it is?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018/04/13 1:47, Robert Haas wrote:
> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>> Here's an idea.  Why don't we move the function/opclass creation lines
>> to insert.sql, without the DROPs, and use the same functions/opclasses
>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>> partition_prune.sql, i.e. not recreate what are essentially the same
>> objects three times?  This also leaves them around for the pg_upgrade
>> test, which is not a bad thing.
>
> That sounds good, but maybe we should go further and move the
> partitioning tests out of generically-named things like insert.sql
> altogether and have test names that actually mention partitioning.

Do you mean to do that for *all* files that have tests exercising some
partitioning code, even if it's just one test?  I can see that tests in at
least some of them could be put into their own partition_ file as follows:

partition_insert (including tests in insert_conflict)
partition_update
partition_triggers
partition_indexing (indexing.sql added when partitioned indexes went in)
partition_ddl (for the tests in create_table and alter_table)

That leaves:

cluster
create_index (one test here could be moved to partition_indexing?)
foreign_data (could be moved to partition_ddl?)
foreign_key (could be moved to partition_ddl?)
hash_part (leave alone, because already contains 'part' in the name?)
identity
join
plancache
plpgsql
publication
rowsecurity
rules
stats_ext
tablesample
truncate
updatable_views
vacuum

What about the tests in inherit.sql that start with:

--
-- Check that constraint exclusion works correctly with partitions using
-- implicit constraints generated from the partition bound information.
--

Maybe, just move all of them to partition_prune.sql, because we no longer
use constraint exclusion for pruning?

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
> Here's an idea.  Why don't we move the function/opclass creation lines
> to insert.sql, without the DROPs, and use the same functions/opclasses
> in the three tests insert.sql, alter_table.sql, hash_part.sql and
> partition_prune.sql, i.e. not recreate what are essentially the same
> objects three times?  This also leaves them around for the pg_upgrade
> test, which is not a bad thing.

That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
On 2018/04/11 21:35, Alvaro Herrera wrote: > Here's an idea. Why don't we move the function/opclass creation lines > to insert.sql, without the DROPs, and use the same functions/opclasses > in the three tests insert.sql, alter_table.sql, hash_part.sql and > partition_prune.sql, i.e. not recreate what are essentially the same > objects three times? This also leaves them around for the pg_upgrade > test, which is not a bad thing. > > (This would require a few updates to insert.sql because the definitions > there are different, but it shouldn't be a problem coverage-wise.) OK, I've tried doing that. Needed adjustments to hash_part.sql as well. The hash function for int4 was defined differently in insert.sql, alter_table.sql, and hash_part.sql. I went with the definition in insert.sql, which although slightly different from the one alter_table.sql, didn't affect the latter's output in any way. Since the definition in hash_part.sql was totally different, a couple of tests needed adjusting after starting to use hash opclasses defined in insert.sql. Attached updated patch. PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates that there are hash partitioning related tests in create_table, foreign_key, and partition_join files as well. Do we want to use the custom opclass in those files as well? Thanks, Amit From 5a01d81aa7e90ef130b245c5e38b02fe9be5e8d7 Mon Sep 17 00:00:00 2001 From: amitDate: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v4] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. 
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/alter_table.out | 15 +- src/test/regress/expected/hash_part.out| 23 +-- src/test/regress/expected/insert.out | 32 +++- src/test/regress/expected/partition_prune.out | 191 + src/test/regress/expected/partition_prune_hash.out | 189 .../regress/expected/partition_prune_hash_1.out| 187 src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/alter_table.sql | 15 +- src/test/regress/sql/hash_part.sql | 24 +-- src/test/regress/sql/insert.sql| 36 +++- src/test/regress/sql/partition_prune.sql | 44 - src/test/regress/sql/partition_prune_hash.sql | 41 - 13 files changed, 305 insertions(+), 495 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 63845910a6..50b9443e2d 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3662,20 +3662,13 @@ CREATE TABLE quuux2 PARTITION OF quuux FOR VALUES IN (2); INFO: updated partition constraint for default partition "quuux_default1" is implied by existing constraints DROP TABLE quuux; -- check validation when attaching hash partitions --- The default hash functions as they exist today aren't portable; they can --- return different results on different machines. Depending upon how the --- values are hashed, the row may map to different partitions, which result in --- regression failure. To avoid this, let's create a non-default hash function --- that just returns the input value unchanged. 
-CREATE OR REPLACE FUNCTION dummy_hashint4(a int4, seed int8) RETURNS int8 AS -$$ BEGIN RETURN (a + 1 + seed); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; -CREATE OPERATOR CLASS custom_opclass FOR TYPE int4 USING HASH AS -OPERATOR 1 = , FUNCTION 2 dummy_hashint4(int4, int8); +-- Use hand-rolled hash functions and operator class to get predictable result +-- on different matchines. part_test_int4_ops is defined in insert.sql. -- check that the new partition won't overlap with an existing partition CREATE TABLE hash_parted ( a int, b int -) PARTITION BY HASH (a custom_opclass); +) PARTITION BY HASH (a part_test_int4_ops); CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0); CREATE TABLE fail_part (LIKE hpart_1); ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4); @@ -3840,8 +3833,6 @@ SELECT * FROM list_parted; DROP TABLE list_parted, list_parted2, range_parted; DROP TABLE fail_def_part; DROP TABLE hash_parted; -DROP OPERATOR CLASS custom_opclass USING HASH; -DROP FUNCTION
Re: [HACKERS] path toward faster partition pruning
Here's an idea.  Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times?  This also leaves them around for the pg_upgrade
test, which is not a bad thing.

(This would require a few updates to insert.sql because the definitions
there are different, but it shouldn't be a problem coverage-wise.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Wed, Apr 11, 2018 at 2:52 PM, Amit Langote wrote:
>> I've attached a delta patch that applies to your v2 which does this.
>> Do you think it's worth doing?
>
> We can check by inspection that individual values are in appropriate
> partitions, which is the point of having the inserts and the select just
> above the actual pruning related tests.  So, I'm not sure if adding
> satisfies_hash_partition against each pruning test adds much.

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
Hi David.

Thanks for the review.

On 2018/04/11 17:59, David Rowley wrote:
> On 11 April 2018 at 18:04, Amit Langote wrote:
>> Updated patch attached.
>
> Thanks for the updated patch.
>
> The only thing I'm not sure about is the changes you've made to the
> COALESCE function.
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
>
> Why does one default to the seed and the other to an empty string?
> Shouldn't they both do the same thing?  If you were to copy
> hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
> special reason not to do that?

Oops, so I hadn't actually restored it to the way it is in hash_part.sql.
Also, Ashutosh was talking about the custom hashing function used in
insert.sql, not hash_part.sql, which I based my revision upon.  Fixed it
now.

> Also just wondering if it's worth adding some verification that we've
> actually eliminated the correct partitions by backing the tests up
> with a call to satisfies_hash_partition.
>
> I've attached a delta patch that applies to your v2 which does this.
> Do you think it's worth doing?

We can check by inspection that individual values are in appropriate
partitions, which is the point of having the inserts and the select just
above the actual pruning related tests.  So, I'm not sure if adding
satisfies_hash_partition against each pruning test adds much.

Attached revised patch.
Thanks, Amit From 4685448a7eb2eaf5feceea2206d136c135b2dea7 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v3] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 237 + src/test/regress/expected/partition_prune_hash.out | 189 .../regress/expected/partition_prune_hash_1.out| 187 src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 70 +- src/test/regress/sql/partition_prune_hash.sql | 41 7 files changed, 307 insertions(+), 420 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index df3fca025e..d13389b9c2 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1332,6 +1332,243 @@ explain (costs off) select * from rparted_by_int2 where a > 100; drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2; -- +-- Test Partition pruning for HASH partitioning +-- We roll our own operator classes to use for tests, because depending on the +-- platform-provided hashing functions makes tests unreliable +-- +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS +$$SELECT coalesce($1, 0)::int8$$ LANGUAGE sql IMMUTABLE; +CREATE OPERATOR CLASS 
pp_test_int4_ops FOR TYPE int4 USING HASH AS +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8); +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS +$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE; +CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS +OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8); +create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops); +create table hp0 partition of hp for values with (modulus 4, remainder 0); +create table hp3 partition of hp for values with (modulus 4, remainder 3); +create table hp1 partition of hp for values with (modulus 4, remainder 1); +create table hp2 partition of hp for values with (modulus 4, remainder 2); +insert into hp values (null, null); +insert into hp values (1, null); +insert into hp values (1, 'xxx'); +insert into hp values (null, 'xxx'); +insert into hp values (2, 'xxx'); +insert into hp values (1, 'abcde'); +select tableoid::regclass, * from hp order by 1; +
Re: [HACKERS] path toward faster partition pruning
On 11 April 2018 at 18:04, Amit Langote wrote:
> Updated patch attached.

Thanks for the updated patch.

The only thing I'm not sure about is the changes you've made to the
COALESCE function.

+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;

Why does one default to the seed and the other to an empty string?
Shouldn't they both do the same thing?  If you were to copy
hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
special reason not to do that?

Also just wondering if it's worth adding some verification that we've
actually eliminated the correct partitions by backing the tests up
with a call to satisfies_hash_partition.

I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

add_satisfies_hash_partition_verification.patch
Description: Binary data
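The question above has a concrete behavioral edge: the two definitions agree on every non-NULL input and differ only in which bucket a NULL key hashes into. A minimal Python model of the two candidate SQL support functions (the Python names and the seed value are illustrative stand-ins, not anything from the patch):

```python
# Models of the two candidate opclass support functions discussed above.
# noop_hash_seed_default models:  SELECT coalesce($1, $2)::int8
# noop_hash_zero_default models:  SELECT coalesce($1, 0)::int8
# PostgreSQL passes a 64-bit seed as the second argument; the value below
# is only a stand-in for illustration.

SEED = 0x7A5B22367996DCFD

def noop_hash_seed_default(value, seed):
    # NULL (None) keys hash to the seed
    return value if value is not None else seed

def noop_hash_zero_default(value, seed):
    # NULL (None) keys hash to zero
    return value if value is not None else 0

# Identical for non-NULL keys...
assert noop_hash_seed_default(42, SEED) == noop_hash_zero_default(42, SEED) == 42
# ...but NULL keys get different hash inputs, so a row with a NULL key can
# land in different partitions under the two definitions.
assert noop_hash_seed_default(None, SEED) != noop_hash_zero_default(None, SEED)
```

Either choice yields deterministic, platform-independent output, which is all the regression tests need; the disagreement here is only about consistency with hash_part.sql.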
Re: [HACKERS] path toward faster partition pruning
Thanks for the review.

On 2018/04/10 21:02, David Rowley wrote:
> On 10 April 2018 at 20:56, Amit Langote wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea.  I think it makes sense and also agree that the
>> alternate outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> way.  It creates two hash operator classes, one for int4 and another for
>> text type, and uses them to create the hash partitioned table to be used
>> in the tests, like done in the existing tests in hash_part.sql.  Since
>> that makes tests (hopefully) reliably return the same result always, I no
>> longer see the need to keep them in a separate partition_prune_hash.sql.
>> The reasoning behind having the separate file was to keep the alternative
>> output file small, as David explained in [1].
>>
>> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.

It might as well be the combination of both that's sensitive to
endianness.  I too am not sure exactly which part.  They're both used in
succession in compute_hash_value:

    /*
     * Compute hash for each datum value by calling respective
     * datatype-specific hash functions of each partition key
     * attribute.
     */
    hash = FunctionCall2(&partsupfunc[i], values[i], seed);

    /* Form a single 64-bit hash value */
    rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));

> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
> Why coalesce here?  Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument.  Plus the function is
> strict, so it's not really doing anything even if you added a default.

After reading Ashutosh's comment, I realized I didn't really mean to add
the STRICT to those function definitions.  As these are not operators, but
support (hash) procedures, it's insignificant to the pruning code whether
they are STRICT or not, unlike clause operators where it is.  Also, I've
adopted the coalesce-based hashing function from hash_part.sql, albeit
with some unnecessary tweaks.  I've not read anywhere about why the
coalesce was used in the first place, but it's insignificant for our
purpose here anyway.

> I know this one was there before, but I only just noticed it:
>
> +-- pruning should work if non-null values are provided for all the keys
> +explain (costs off) select * from hp where a is null and b is null;
>
> The comment is a bit misleading given the first test below it is
> testing for nulls.  Maybe it can be changed to
>
> +-- pruning should work if values or is null clauses are provided for
> all partition keys.

I have adjusted the comments.  Updated patch attached.
Thanks, Amit From 5a6d00d4d9d6aa8bb84dc9699646ee5c4fa77719 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v2] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 201 + src/test/regress/expected/partition_prune_hash.out | 189 --- .../regress/expected/partition_prune_hash_1.out| 187 --- src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 58 +- src/test/regress/sql/partition_prune_hash.sql |
Re: [HACKERS] path toward faster partition pruning
Thanks for the comment.

On 2018/04/10 21:11, Ashutosh Bapat wrote:
> On Tue, Apr 10, 2018 at 5:32 PM, David Rowley wrote:
>> Apart from that confusion, looking at the patch:
>>
>> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
>> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
>> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
>> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
>> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>>
>> Why coalesce here?  Maybe I've not thought of something, but coalesce
>> only seems useful to me if there's > 1 argument.  Plus the function is
>> strict, so it's not really doing anything even if you added a default.
>
> I think Amit Langote wanted to write coalesce($1, $2), $2 being the
> seed for the hash function.  See how hash operator class functions are
> defined in sql/insert.sql.

Actually, I referenced functions and operator classes defined in
hash_part.sql, not insert.sql.  Although as you point out, I didn't think
very hard about the significance of $2 passed to coalesce in those
functions.  I will fix that and add it back, along with some other changes
that make them almost identical with the definitions in hash_part.sql.

> May be we should just use the same
> functions or even the same tables.

Because the hash_part.sql and partition_prune.sql tests run in parallel,
I've decided to rename the functions, operator classes, and the tables in
partition_prune.sql.  It seems like a good idea in any case.  Also, since
the existing pruning tests were written with that table, I decided not to
change that.

Will post an updated patch after addressing David's comment.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Tue, Apr 10, 2018 at 5:32 PM, David Rowley wrote:
> On 10 April 2018 at 20:56, Amit Langote wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea.  I think it makes sense and also agree that the
>> alternate outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> way.  It creates two hash operator classes, one for int4 and another for
>> text type, and uses them to create the hash partitioned table to be used
>> in the tests, like done in the existing tests in hash_part.sql.  Since
>> that makes tests (hopefully) reliably return the same result always, I no
>> longer see the need to keep them in a separate partition_prune_hash.sql.
>> The reasoning behind having the separate file was to keep the alternative
>> output file small, as David explained in [1].
>>
>> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.
>
> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
> Why coalesce here?  Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument.  Plus the function is
> strict, so it's not really doing anything even if you added a default.

I think Amit Langote wanted to write coalesce($1, $2), $2 being the
seed for the hash function.  See how hash operator class functions are
defined in sql/insert.sql.  May be we should just use the same
functions or even the same tables.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 10 April 2018 at 20:56, Amit Langote wrote:
> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>
> Thanks for the idea.  I think it makes sense and also agree that the
> alternate outputs approach is not perfectly reliable and maintainable.
>
>> +1.
>
> Attached find a patch that rewrites hash partition pruning tests that
> way.  It creates two hash operator classes, one for int4 and another for
> text type, and uses them to create the hash partitioned table to be used
> in the tests, like done in the existing tests in hash_part.sql.  Since
> that makes tests (hopefully) reliably return the same result always, I no
> longer see the need to keep them in a separate partition_prune_hash.sql.
> The reasoning behind having the separate file was to keep the alternative
> output file small, as David explained in [1].
>
> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com

I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.

Apart from that confusion, looking at the patch:

+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;

Why coalesce here?  Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument.  Plus the function is
strict, so it's not really doing anything even if you added a default.

I know this one was there before, but I only just noticed it:

+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;

The comment is a bit misleading given the first test below it is
testing for nulls.  Maybe it can be changed to

+-- pruning should work if values or is null clauses are provided for
all partition keys.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
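Whichever step turns out to be endianness-sensitive, the combining step itself is plain modular integer arithmetic. Here is a rough Python model of the fold that compute_hash_value performs over the per-key hashes; the magic constant is what hash_combine64 is believed to use (boost-style combine, 64-bit variant), and it should be treated as an assumption to verify against src/include/utils/hashutils.h rather than a quotation of it:

```python
# Rough model of 64-bit hash combining for hash partitioning.
# The constant below is an assumption (believed to match hash_combine64);
# everything here is a sketch, not PostgreSQL source code.

MASK64 = (1 << 64) - 1

def hash_combine64(a: int, b: int) -> int:
    # a ^= b + magic + (a << 54) + (a >> 7), all modulo 2^64
    a ^= (b + 0x49A0F4DD15E5A8E3 + ((a << 54) & MASK64) + (a >> 7)) & MASK64
    return a & MASK64

def partition_index(per_key_hashes, modulus: int) -> int:
    # Mirrors the loop shape of compute_hash_value: fold each key's hash
    # into a running 64-bit row hash, then take the remainder by modulus.
    row_hash = 0
    for h in per_key_hashes:
        row_hash = hash_combine64(row_hash, h & MASK64)
    return row_hash % modulus

# With a noop support function, the per-key "hash" is just the key value
# (or a length, for the text opclass), so the partition a row lands in is
# pure integer arithmetic -- reproducible by hand on any platform.
idx = partition_index([1, 3], 4)
assert 0 <= idx < 4
assert idx == partition_index([1, 3], 4)  # deterministic
```

This is the intuition behind Robert's suggestion below: once the per-key hashes are trivially predictable, the whole partition assignment is.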
Re: [HACKERS] path toward faster partition pruning
On 2018/04/10 13:27, Ashutosh Bapat wrote:
> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>> On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
>>> David Rowley writes:
>>>> Sounds like you're saying that if we have too many alternative files
>>>> then there's a chance that one could pass by luck.
>>>
>>> Yeah, exactly: it passed, but did it pass for the right reason?
>>>
>>> If there's just two expected-files, it's likely not a big problem,
>>> but if you have a bunch it's something to worry about.
>>>
>>> I'm also wondering how come we had hash partitioning before and
>>> did not have this sort of problem.  Is it just that we added a
>>> new test that's more sensitive to the details of the hashing
>>> (if so, could it be made less so)?  Or is there actually more
>>> platform dependence now than before (and if so, why is that)?
>>
>> The existing hash partitioning tests did have some dependencies on the
>> hash function, but they took care not to use the built-in hash
>> functions.  Instead they did stuff like this:
>>
>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>> CREATE TABLE mchash (a int, b text, c jsonb)
>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> I think that this approach should also be used for the new tests.
>> Variant expected output files are a pain to maintain, and you
>> basically just have to take whatever output you get as the right
>> answer, because nobody knows what output a certain built-in hash
>> function should produce for a given input except by running the code.
>> If you do the kind of thing shown above, though, then you can easily
>> see by inspection that you're getting the right answer.

Thanks for the idea.  I think it makes sense and I also agree that the
alternate outputs approach is not perfectly reliable and maintainable.
> +1. Attached find a patch that rewrites hash partition pruning tests that away. It creates two hash operator classes, one for int4 and another for text type and uses them to create hash partitioned table to be used in the tests, like done in the existing tests in hash_part.sql. Since that makes tests (hopefully) reliably return the same result always, I no longer see the need to keep them in a separate partition_prune_hash.sql. The reasoning behind having the separate file was to keep the alternative output file small as David explained in [1]. However, I noticed that there is a bug in RelationBuildPartitionKey that causes a crash when using a SQL function as partition support function as the revised tests do, so please refer to and apply the patches I posted here before running the revised tests: https://www.postgresql.org/message-id/3041e853-b1dd-a0c6-ff21-7cc5633bffd0%40lab.ntt.co.jp Thanks, Amit [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com From c1508fc715a7783108f626c67c76fcc1f2303719 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v1] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. 
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 202 - src/test/regress/expected/partition_prune_hash.out | 189 --- .../regress/expected/partition_prune_hash_1.out| 187 --- src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 59 +- src/test/regress/sql/partition_prune_hash.sql | 41 - 7 files changed, 259 insertions(+), 422 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index df3fca025e..935e7dc79b 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1330,7 +1330,207 @@ explain (costs off) select * from rparted_by_int2 where a > 100; Filter: (a > '100'::bigint) (3 rows) -drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
Re: [HACKERS] path toward faster partition pruning
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
> On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
>> David Rowley writes:
>>> Sounds like you're saying that if we have too many alternative files
>>> then there's a chance that one could pass by luck.
>>
>> Yeah, exactly: it passed, but did it pass for the right reason?
>>
>> If there's just two expected-files, it's likely not a big problem,
>> but if you have a bunch it's something to worry about.
>>
>> I'm also wondering how come we had hash partitioning before and
>> did not have this sort of problem.  Is it just that we added a
>> new test that's more sensitive to the details of the hashing
>> (if so, could it be made less so)?  Or is there actually more
>> platform dependence now than before (and if so, why is that)?
>
> The existing hash partitioning tests did have some dependencies on the
> hash function, but they took care not to use the built-in hash
> functions.  Instead they did stuff like this:
>
> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
> CREATE TABLE mchash (a int, b text, c jsonb)
> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>
> I think that this approach should also be used for the new tests.
> Variant expected output files are a pain to maintain, and you
> basically just have to take whatever output you get as the right
> answer, because nobody knows what output a certain built-in hash
> function should produce for a given input except by running the code.
> If you do the kind of thing shown above, though, then you can easily
> see by inspection that you're getting the right answer.

+1.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.
>
> I'm also wondering how come we had hash partitioning before and
> did not have this sort of problem.  Is it just that we added a
> new test that's more sensitive to the details of the hashing
> (if so, could it be made less so)?  Or is there actually more
> platform dependence now than before (and if so, why is that)?

The existing hash partitioning tests did have some dependencies on the hash function, but they took care not to use the built-in hash functions. Instead they did stuff like this:

CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);

I think that this approach should also be used for the new tests. Variant expected output files are a pain to maintain, and you basically just have to take whatever output you get as the right answer, because nobody knows what output a certain built-in hash function should produce for a given input except by running the code. If you do the kind of thing shown above, though, then you can easily see by inspection that you're getting the right answer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
Hi David.

On 2018/04/09 12:48, David Rowley wrote:
> While looking at the docs in [1], I saw that we still mention:
>
> 4. Ensure that the constraint_exclusion configuration parameter is not
> disabled in postgresql.conf. If it is, queries will not be optimized
> as desired.
>
> This is no longer true. The attached patch removed it.
>
> [1] https://www.postgresql.org/docs/10/static/ddl-partitioning.htm

Thanks. I was aware of the changes that would need to be made, but you beat me to writing the patch itself.

About the patch: while users no longer need to enable constraint_exclusion for select queries, one would still need it for update/delete queries, because the new pruning logic only gets invoked for the former. Alas...

Also, further down on that page, there is a 5.10.4 Partitioning and Constraint Exclusion sub-section. I think it would also need some tweaks due to new developments. I updated your patch to fix that. Please take a look.

Thanks,
Amit

From a4fe924936fe623ff95e6aa050b8fd7d22dbbb84 Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com"
Date: Mon, 9 Apr 2018 15:43:32 +1200
Subject: [PATCH v2] Remove mention of constraint_exclusion in partitioning docs

As of 9fdb675fc5d2, this GUC now no longer has an effect on partition pruning.
---
 doc/src/sgml/ddl.sgml | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index feb2ab7792..eed8753e24 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3194,13 +3194,14 @@ CREATE INDEX ON measurement (logdate);
 - - -Ensure that the -configuration parameter is not disabled in postgresql.conf. -If it is, queries will not be optimized as desired. - - + + + Ensure that the + configuration parameter is not disabled in postgresql.conf. + While enabling it is not required for select queries, not doing so will result + in update and delete queries to not be optimized as desired. + +
@@ -3767,10 +3768,12 @@ ANALYZE measurement;
 -Constraint exclusion is a query optimization technique -that improves performance for partitioned tables defined in the -fashion described above (both declaratively partitioned tables and those -implemented using inheritance). As an example: +Constraint exclusion is a query optimization +technique that improves performance for partitioned tables defined in the +fashion described above. While it is used only for update and delete +queries in the case of declaratively partitioned tables, it is used for all +queries in the case of table partitioning implemented using inheritance. +As an example: SET constraint_exclusion = on;
--
2.11.0
Re: [HACKERS] path toward faster partition pruning
While looking at the docs in [1], I saw that we still mention:

4. Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized
as desired.

This is no longer true. The attached patch removed it.

[1] https://www.postgresql.org/docs/10/static/ddl-partitioning.html

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0001-Remove-mention-of-constraint_exclusion-in-partitioni.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
Andres Freund wrote:
> On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > > hours ago in
> > > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
> >
> > And this, too.  I was unsure if this was because we were missing calling
> > some object access hook from the new code, or the additional pruning.
>
> That's possible. I did attempt to skim the code, that's where my
> complaint about the docs originated. There certainly isn't an
> InvokeFunctionExecuteHook() present. It's not clear to me whether
> that's an issue - we don't invoke the hooks in a significant number of
> places either, and as far as I can discern there's not much rule or
> reason about where we invoke it.

I managed to convince myself that it's not higher-level code's responsibility to invoke the execute hooks; the likelihood of bugs of omission seems just too large. But maybe I'm wrong.

There's a small number of InvokeFunctionExecuteHook calls in the executor, but I really doubt that it exhaustively covers everyplace where catalogued functions are called in the executor.

CC'ing KaiGai and Stephen Frost; they may want to chip in here.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > hours ago in
> > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
>
> And this, too.  I was unsure if this was because we were missing calling
> some object access hook from the new code, or the additional pruning.

That's possible. I did attempt to skim the code; that's where my complaint about the docs originated. There certainly isn't an InvokeFunctionExecuteHook() present. It's not clear to me whether that's an issue - we don't invoke the hooks in a significant number of places either, and as far as I can discern there's not much rule or reason about where we invoke it.

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
Andres Freund wrote:
> Hi,
>
> On 2018-04-07 15:49:54 +1200, David Rowley wrote:
> > Right, I suggest we wait and see if all members go green again as a
> > result of 40e42e1024c, and if they're happy then we could maybe leave
> > it as is with the 2 alternatives output files.
>
> At least the first previously borked animal came back green (termite).

Thanks everyone for addressing this.

> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

And this, too. I was unsure if this was because we were missing calling some object access hook from the new code, or the additional pruning.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:39 PM, Tom Lane wrote:
> Amit Langote writes:
>> Given that the difference only appeared on animals that David pointed
>> out have big-endian architecture, it seems we'd only need two output
>> files.
>
> Dunno, I'm wondering whether 32 vs 64 bit will make a difference.

There was one 32-bit animal in the failing set, which apparently produces the same hashes as others (allegedly due to endianness difference).

powerpc 32-bit userspace on ppc64 host:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07

...and it has turned green since the alternative outputs fix went in:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-07%2004%3A06%3A09

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote writes:
> Given that the difference only appeared on animals that David pointed
> out have big-endian architecture, it seems we'd only need two output
> files.

Dunno, I'm wondering whether 32 vs 64 bit will make a difference.

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:09 PM, Andres Freund wrote:
> Hi,
>
> On 2018-04-07 15:49:54 +1200, David Rowley wrote:
>> Right, I suggest we wait and see if all members go green again as a
>> result of 40e42e1024c, and if they're happy then we could maybe leave
>> it as is with the 2 alternatives output files.
>
> At least the first previously borked animal came back green (termite).
>
>> I don't particularly think it matters which hash partition a tuple
>> goes into, as long as the hash function spreads the values out enough
>> and most importantly, the pruning code looks for the tuple in the
>> partition that it was actually inserted into in the first place.
>> Obviously, we also want to ensure we never do anything which would
>> change the matching partition in either minor or major version
>> upgrades too.
>
> +1

+1

Given that the difference only appeared on animals that David pointed out have big-endian architecture, it seems we'd only need two output files. It does seem true that the extended hashing functions that were added to support partitioning would somehow be affected by endianness.

Thank you David for creating the patches and Andres for committing it. The buildfarm seems to be turning green where it had gone red due to the hashing differences.

> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Thanks Andres.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:18, Andres Freund wrote:
> I've pushed the two patches (collapsed). Trying to get the BF green-ish
> again...

termite has now gone green.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 16:09, Andres Freund wrote:
> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Oh, thanks! I had just been looking at that too...

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 09:03, Andres Freund wrote:
> There's also
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-04-06%2020%3A45%3A01
>
> *** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800
> --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700
> ***
> *** 32,40
>  (6 rows)
>
>  SELECT * FROM t1p WHERE o > 50 AND p like '%64%';
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p"
> --- 32,37
>
> seems you just need to remove those rows from the expected file.

Agreed. Patch attached.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

part_prune_sepgsql_fix.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
Hi,

On 2018-04-07 15:49:54 +1200, David Rowley wrote:
> Right, I suggest we wait and see if all members go green again as a
> result of 40e42e1024c, and if they're happy then we could maybe leave
> it as is with the 2 alternatives output files.

At least the first previously borked animal came back green (termite).

> I don't particularly think it matters which hash partition a tuple
> goes into, as long as the hash function spreads the values out enough
> and most importantly, the pruning code looks for the tuple in the
> partition that it was actually inserted into in the first place.
> Obviously, we also want to ensure we never do anything which would
> change the matching partition in either minor or major version
> upgrades too.

+1

I've also attempted to fix rhinoceros's failure I remarked upon a couple hours ago in https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:41, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.

Right, I suggest we wait and see if all members go green again as a result of 40e42e1024c, and if they're happy then we could maybe leave it as is with the 2 alternatives output files. If there are some other variations that crop up, then we can think harder about what we can do to improve the coverage.

I don't particularly think it matters which hash partition a tuple goes into, as long as the hash function spreads the values out enough and most importantly, the pruning code looks for the tuple in the partition that it was actually inserted into in the first place. Obviously, we also want to ensure we never do anything which would change the matching partition in either minor or major version upgrades too.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi,

On 2018-04-06 23:41:22 -0400, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.

There should be only two alternatives, given our current hashing implementation, right?

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:41, Tom Lane wrote:
> I'm also wondering how come we had hash partitioning before and
> did not have this sort of problem.  Is it just that we added a
> new test that's more sensitive to the details of the hashing
> (if so, could it be made less so)?  Or is there actually more
> platform dependence now than before (and if so, why is that)?

We didn't prune HASH partitions before today. They were just all returned.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley writes:
> Sounds like you're saying that if we have too many alternative files
> then there's a chance that one could pass by luck.

Yeah, exactly: it passed, but did it pass for the right reason?

If there's just two expected-files, it's likely not a big problem, but if you have a bunch it's something to worry about.

I'm also wondering how come we had hash partitioning before and did not have this sort of problem. Is it just that we added a new test that's more sensitive to the details of the hashing (if so, could it be made less so)? Or is there actually more platform dependence now than before (and if so, why is that)?

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 8:55 AM, David Rowley wrote:
> On 7 April 2018 at 15:14, Ashutosh Bapat wrote:
>> On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
>>> Why is writing tests that produce the same output required?
>>>
>>> We have many tests with alternative outputs. Look in
>>> src/tests/regress/expected for files matching _1.out
>>
>> That's true, but we usually add such alternative output when we know
>> all the variants possible as long as "all the variants" do not cover
>> everything possible. AFAIU, that's not true here. Also, on a given
>> machine a particular row is guaranteed to fall in a given partition.
>> On a different machine it will fall in some other partition, but
>> always that partition on that machine. We don't have a way to select
>> alternate output based on the architecture. May be a better idea is to
>> use .source file, creating .out on the fly based on the architecture
>> of machine like testing the hash output for a given value to decide
>> which partition it will fall into and then crafting .out with that
>> partition's name.
>
> Sounds like you're saying that if we have too many alternative files
> then there's a chance that one could pass by luck.

Yes.

> Maybe we can just back up what's just been committed by actually
> executing the queries and ensuring that all rows that made it into the
> table make it back out again.

That's one way. But how would we make sure that they landed in the proper partition? Actually we do not know what's the proper partition for a given architecture. And how would we make sure that all rows with the same partition key land in the same partition? That's why I am suggesting to calculate the hash value, take the modulo and craft the name of the partition where the corresponding row will land on a given architecture. That way, we are sure that the tuple routing logic is correct and also the partition pruning logic.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:14, Ashutosh Bapat wrote:
> On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
>> Why is writing tests that produce the same output required?
>>
>> We have many tests with alternative outputs. Look in
>> src/tests/regress/expected for files matching _1.out
>
> That's true, but we usually add such alternative output when we know
> all the variants possible as long as "all the variants" do not cover
> everything possible. AFAIU, that's not true here. Also, on a given
> machine a particular row is guaranteed to fall in a given partition.
> On a different machine it will fall in some other partition, but
> always that partition on that machine. We don't have a way to select
> alternate output based on the architecture. May be a better idea is to
> use .source file, creating .out on the fly based on the architecture
> of machine like testing the hash output for a given value to decide
> which partition it will fall into and then crafting .out with that
> partition's name.

Sounds like you're saying that if we have too many alternative files then there's a chance that one could pass by luck.

Maybe we can just back up what's just been committed by actually executing the queries and ensuring that all rows that made it into the table make it back out again.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:18, Andres Freund wrote:
> I've pushed the two patches (collapsed). Trying to get the BF green-ish
> again...

Thanks!

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 15:04:37 +1200, David Rowley wrote:
> On 7 April 2018 at 15:00, Andres Freund wrote:
> > On 2018-04-07 14:42:53 +1200, David Rowley wrote:
> >> On 7 April 2018 at 13:31, David Rowley wrote:
> >> > Maybe the best solution is to pull those tests out of
> >> > partition_prune.sql then create partition_prune_hash and just have an
> >> > alternative .out file with the partitions which match on bigendian
> >> > machines.
> >>
> >> Here's 1 of 2. I thought it was best to get the buildfarm green again
> >> as soon as possible.
> >
> > Do you have an estimate how long it'll take you to produce patch 2? It'd
> > be cool to get this covered again soon. If you don't have access to a
> > big endian machine, we can construct the output from the buildfarm... So
> > pulling the tests out would be the only "urgent" thing, I can go on from
> > there.
>
> Attached.
>
> I've not tested on a bigendian machine, but the diff -c between the
> two output files match the diff on the failing buildfarm members.

I've pushed the two patches (collapsed). Trying to get the BF green-ish again...

- Andres
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 8:37 AM, David Rowley wrote:
> On 7 April 2018 at 15:03, Ashutosh Bapat wrote:
>> On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
>>> The only alternative would be to change all the hash functions so that
>>> they normalise their endianness. It does not sound like something that
>>> will perform very well. Plus it would break everyone's hash indexes on
>>> a pg_upgrade.
>>>
>>> pg_basebackups can't be transferred over to other architectures
>>> anyway, so I'm not so worried about tuples being routed to other
>>> partitions.
>>>
>>> Maybe someone else can see a reason why this is bad?
>>
>> I don't think the concept is bad by itself. That's expected, in fact,
>> we have added an option to pg_dump (dump through parent or some such)
>> to handle exactly this case. What Amit seems to be complaining though
>> is the regression test. We need to write regression tests so that they
>> produce the same plans, pruning same partitions by name, on all
>> architectures.
>
> Why is writing tests that produce the same output required?
>
> We have many tests with alternative outputs. Look in
> src/tests/regress/expected for files matching _1.out

That's true, but we usually add such alternative output when we know all the variants possible as long as "all the variants" do not cover everything possible. AFAIU, that's not true here. Also, on a given machine a particular row is guaranteed to fall in a given partition. On a different machine it will fall in some other partition, but always that partition on that machine. We don't have a way to select alternate output based on the architecture. May be a better idea is to use a .source file, creating .out on the fly based on the architecture of the machine, like testing the hash output for a given value to decide which partition it will fall into and then crafting .out with that partition's name.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:03, Ashutosh Bapat wrote:
> On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
>> The only alternative would be to change all the hash functions so that
>> they normalise their endianness. It does not sound like something that
>> will perform very well. Plus it would break everyone's hash indexes on
>> a pg_upgrade.
>>
>> pg_basebackups can't be transferred over to other architectures
>> anyway, so I'm not so worried about tuples being routed to other
>> partitions.
>>
>> Maybe someone else can see a reason why this is bad?
>
> I don't think the concept is bad by itself. That's expected, in fact,
> we have added an option to pg_dump (dump through parent or some such)
> to handle exactly this case. What Amit seems to be complaining though
> is the regression test. We need to write regression tests so that they
> produce the same plans, pruning same partitions by name, on all
> architectures.

Why is writing tests that produce the same output required?

We have many tests with alternative outputs. Look in src/tests/regress/expected for files matching _1.out

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:00, Andres Freund wrote:
> On 2018-04-07 14:42:53 +1200, David Rowley wrote:
>> On 7 April 2018 at 13:31, David Rowley wrote:
>> > Maybe the best solution is to pull those tests out of
>> > partition_prune.sql then create partition_prune_hash and just have an
>> > alternative .out file with the partitions which match on bigendian
>> > machines.
>>
>> Here's 1 of 2. I thought it was best to get the buildfarm green again
>> as soon as possible.
>
> Do you have an estimate how long it'll take you to produce patch 2? It'd
> be cool to get this covered again soon. If you don't have access to a
> big endian machine, we can construct the output from the buildfarm... So
> pulling the tests out would be the only "urgent" thing, I can go on from
> there.

Attached.

I've not tested on a bigendian machine, but the diff -c between the two output files match the diff on the failing buildfarm members.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0002-Add-HASH-partition-pruning-tests.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 7:25 AM, David Rowley wrote:
> On 7 April 2018 at 13:50, Amit Langote wrote:
>> On Sat, Apr 7, 2018 at 10:31 AM, David Rowley
>>> I looked at all the regression test diffs for each of the servers you
>>> mentioned and I verified that the diffs match on each of the 7
>>> servers.
>>>
>>> Maybe the best solution is to pull those tests out of
>>> partition_prune.sql then create partition_prune_hash and just have an
>>> alternative .out file with the partitions which match on bigendian
>>> machines.
>>>
>>> We could also keep them in the same file, but that's a much bigger
>>> alternative file to maintain and more likely to get broken if someone
>>> forgets to update it.
>>>
>>> What do you think?
>>
>> Yeah, that's an idea.
>>
>> Is it alright though that same data may end up in different hash
>> partitions depending on the architecture? IIRC, that's the way we
>> decided to go when using hash partitioning, but it would've been
>> clearer if there was already some evidence in regression tests that
>> that's what we've chosen, such as, some existing tests for tuple
>> routing.
>
> The only alternative would be to change all the hash functions so that
> they normalise their endianness. It does not sound like something that
> will perform very well. Plus it would break everyone's hash indexes on
> a pg_upgrade.
>
> pg_basebackups can't be transferred over to other architectures
> anyway, so I'm not so worried about tuples being routed to other
> partitions.
>
> Maybe someone else can see a reason why this is bad?

I don't think the concept is bad by itself. That's expected, in fact, we have added an option to pg_dump (dump through parent or some such) to handle exactly this case. What Amit seems to be complaining about though is the regression test. We need to write regression tests so that they produce the same plans, pruning the same partitions by name, on all architectures.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 14:42:53 +1200, David Rowley wrote: > On 7 April 2018 at 13:31, David Rowley wrote: > > Maybe the best solution is to pull those tests out of > > partition_prune.sql then create partition_prune_hash and just have an > > alternative .out file with the partitions which match on bigendian > > machines. > > Here's 1 of 2. I thought it was best to get the buildfarm green again > as soon as possible. Do you have an estimate how long it'll take you to produce patch 2? It'd be cool to get this covered again soon. If you don't have access to a big endian machine, we can construct the output from the buildfarm... So pulling the tests out would be the only "urgent" thing, I can go on from there. - Andres
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 13:31, David Rowley wrote: > Maybe the best solution is to pull those tests out of > partition_prune.sql then create partition_prune_hash and just have an > alternative .out file with the partitions which match on bigendian > machines. Here's 1 of 2. I thought it was best to get the buildfarm green again as soon as possible. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Remove-HASH-partition-pruning-tests.patch Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 13:50, Amit Langotewrote: > On Sat, Apr 7, 2018 at 10:31 AM, David Rowley >> I looked at all the regression test diffs for each of the servers you >> mentioned and I verified that the diffs match on each of the 7 >> servers. >> >> Maybe the best solution is to pull those tests out of >> partition_prune.sql then create partition_prune_hash and just have an >> alternative .out file with the partitions which match on bigendian >> machines. >> >> We could also keep them in the same file, but that's a much bigger >> alternative file to maintain and more likely to get broken if someone >> forgets to update it. >> >> What do you think? > > Yeah, that's an idea. > > Is it alright though that same data may end up in different hash > partitions depending on the architecture? IIRC, that's the way we > decided to go when using hash partitioning, but it would've been > clearer if there was already some evidence in regression tests that > that's what we've chosen, such as, some existing tests for tuple > routing. The only alternative would be to change all the hash functions so that they normalise their endianness. It does not sound like something that will perform very well. Plus it would break everyone's hash indexes on a pg_upgrade. pg_basebackups can't be transferred over to other architectures anyway, so I'm not so worried about tuples being routed to other partitions. Maybe someone else can see a reason why this is bad? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 10:31 AM, David Rowley wrote: > On 7 April 2018 at 12:43, David Rowley wrote: >> On 7 April 2018 at 12:35, Amit Langote wrote: >>> So this same failure occurs on (noting the architecture): >>> >>> Seems to be due to that the hashing function used in partitioning >>> gives different answer for a given set of partition key values than >>> others. >> >> They all look like bigendian CPUs. > > I looked at all the regression test diffs for each of the servers you > mentioned and I verified that the diffs match on each of the 7 > servers. > > Maybe the best solution is to pull those tests out of > partition_prune.sql then create partition_prune_hash and just have an > alternative .out file with the partitions which match on bigendian > machines. > > We could also keep them in the same file, but that's a much bigger > alternative file to maintain and more likely to get broken if someone > forgets to update it. > > What do you think? Yeah, that's an idea. Is it alright though that same data may end up in different hash partitions depending on the architecture? IIRC, that's the way we decided to go when using hash partitioning, but it would've been clearer if there was already some evidence in regression tests that that's what we've chosen, such as, some existing tests for tuple routing. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 12:43, David Rowley wrote: > On 7 April 2018 at 12:35, Amit Langote wrote: >> So this same failure occurs on (noting the architecture): >> >> Seems to be due to that the hashing function used in partitioning >> gives different answer for a given set of partition key values than >> others. > > They all look like bigendian CPUs. I looked at all the regression test diffs for each of the servers you mentioned and I verified that the diffs match on each of the 7 servers. Maybe the best solution is to pull those tests out of partition_prune.sql then create partition_prune_hash and just have an alternative .out file with the partitions which match on bigendian machines. We could also keep them in the same file, but that's a much bigger alternative file to maintain and more likely to get broken if someone forgets to update it. What do you think? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 12:35, Amit Langote wrote: > Thank you Alvaro for rest of the cleanup and committing. +10! > So this same failure occurs on (noting the architecture): > > ppc64: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52 > > ia64: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2018-04-06%2022%3A32%3A24 > > ppc64 (POWER7): > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2018-04-06%2022%3A58%3A13 > > ppc64 (POWER7): > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-04-06%2023%3A02%3A13 > > powerpc: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-04-06%2023%3A05%3A08 > > powerpc: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-04-06%2023%3A13%3A23 > > powerpc 32-bit userspace on ppc64 host: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07 > > Seems to be due to that the hashing function used in partitioning > gives different answer for a given set of partition key values than > others. They all look like bigendian CPUs. https://en.wikipedia.org/wiki/Comparison_of_instruction_set_architectures#Endianness -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
Thank you Alvaro for rest of the cleanup and committing. On Sat, Apr 7, 2018 at 5:28 AM, Alvaro Herrerawrote: > So I pushed this 25 minutes ago, and already there's a couple of > buildfarm members complaining: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 > > Both show exactly the same diff in test partition_prune: > > *** > /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out > Fri Apr 6 15:55:08 2018 > --- > /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out >Fri Apr 6 16:01:40 2018 > *** > *** 1348,1357 > --++- >hp0 || >hp0 | 1 | > ! hp0 | 1 | xxx >hp3 | 10 | yyy > ! hp1 || xxx > ! hp2 | 10 | xxx > (6 rows) > > -- partial keys won't prune, nor would non-equality conditions > --- 1348,1357 > --++- >hp0 || >hp0 | 1 | > ! hp0 | 10 | xxx > ! hp3 || xxx >hp3 | 10 | yyy > ! 
hp2 | 1 | xxx > (6 rows) > > -- partial keys won't prune, nor would non-equality conditions > *** > *** 1460,1466 > QUERY PLAN > - >Append > !-> Seq Scan on hp0 >Filter: ((a = 1) AND (b = 'xxx'::text)) > (3 rows) > > --- 1460,1466 > QUERY PLAN > - >Append > !-> Seq Scan on hp2 >Filter: ((a = 1) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1468,1474 >QUERY PLAN > - >Append > !-> Seq Scan on hp1 >Filter: ((a IS NULL) AND (b = 'xxx'::text)) > (3 rows) > > --- 1468,1474 >QUERY PLAN > - >Append > !-> Seq Scan on hp3 >Filter: ((a IS NULL) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1476,1482 > QUERY PLAN > -- >Append > !-> Seq Scan on hp2 >Filter: ((a = 10) AND (b = 'xxx'::text)) > (3 rows) > > --- 1476,1482 > QUERY PLAN > -- >Append > !-> Seq Scan on hp0 >Filter: ((a = 10) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1494,1504 >Append > -> Seq Scan on hp0 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > --> Seq Scan on hp2 > - Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > -> Seq Scan on hp3 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > ! (7 rows) > > -- hash partitiong pruning doesn't occur with <> operator clauses > explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; > --- 1494,1502 >Append > -> Seq Scan on hp0 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > -> Seq Scan on hp3 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > ! 
(5 rows) So this same failure occurs on (noting the architecture): ppc64: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 ia64: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole=2018-04-06%2022%3A32%3A24 ppc64 (POWER7): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern=2018-04-06%2022%3A58%3A13 ppc64 (POWER7): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2018-04-06%2023%3A02%3A13 powerpc: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2018-04-06%2023%3A05%3A08 powerpc: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2018-04-06%2023%3A13%3A23 powerpc 32-bit userspace on ppc64 host: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2023%3A40%3A07 Seems to be due to that the hashing function used in partitioning gives different answer for a given set of partition key values than others. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
Robert Haas wrote: > On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera wrote: > > I don't actually like very much the idea of putting all this code in > > optimizer/util. This morning it occurred to me that we should create a new > > src/backend/partitioning/ (and a src/include/partitioning/ to go with > > it) and drop a bunch of files there. Even your proposed new partcache.c > > will seem misplaced *anywhere*, since it contains support code to be > > used by both planner and executor; in src/{backend,include}/partitioning > > it will be able to serve both without it being a modularity wart. > > Uh, what? > > Surely partcache.c is correctly placed next to relcache.c and > syscache.c and everything else in src/backend/utils/cache. Frankly, I'm not real sure about partcache.c yet. Are you? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera wrote: > I don't actually like very much the idea of putting all this code in > optimizer/util. This morning it occurred to me that we should create a new > src/backend/partitioning/ (and a src/include/partitioning/ to go with > it) and drop a bunch of files there. Even your proposed new partcache.c > will seem misplaced *anywhere*, since it contains support code to be > used by both planner and executor; in src/{backend,include}/partitioning > it will be able to serve both without it being a modularity wart. Uh, what? Surely partcache.c is correctly placed next to relcache.c and syscache.c and everything else in src/backend/utils/cache. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
On 2018-04-06 17:28:00 -0300, Alvaro Herrera wrote: > So I pushed this 25 minutes ago, and already there's a couple of > buildfarm members complaining: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 > > Both show exactly the same diff in test partition_prune: There's also https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2018-04-06%2020%3A45%3A01 *** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800 --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700 *** *** 32,40 (6 rows) SELECT * FROM t1p WHERE o > 50 AND p like '%64%'; - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p" LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o" LOG: SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p" --- 32,37 seems you just need to remove those rows from the expected file. - Andres
Re: [HACKERS] path toward faster partition pruning
So I pushed this 25 minutes ago, and already there's a couple of buildfarm members complaining: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 Both show exactly the same diff in test partition_prune: *** /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out Fri Apr 6 15:55:08 2018 --- /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out Fri Apr 6 16:01:40 2018 *** *** 1348,1357 --++- hp0 || hp0 | 1 | ! hp0 | 1 | xxx hp3 | 10 | yyy ! hp1 || xxx ! hp2 | 10 | xxx (6 rows) -- partial keys won't prune, nor would non-equality conditions --- 1348,1357 --++- hp0 || hp0 | 1 | ! hp0 | 10 | xxx ! hp3 || xxx hp3 | 10 | yyy ! hp2 | 1 | xxx (6 rows) -- partial keys won't prune, nor would non-equality conditions *** *** 1460,1466 QUERY PLAN - Append !-> Seq Scan on hp0 Filter: ((a = 1) AND (b = 'xxx'::text)) (3 rows) --- 1460,1466 QUERY PLAN - Append !-> Seq Scan on hp2 Filter: ((a = 1) AND (b = 'xxx'::text)) (3 rows) *** *** 1468,1474 QUERY PLAN - Append !-> Seq Scan on hp1 Filter: ((a IS NULL) AND (b = 'xxx'::text)) (3 rows) --- 1468,1474 QUERY PLAN - Append !-> Seq Scan on hp3 Filter: ((a IS NULL) AND (b = 'xxx'::text)) (3 rows) *** *** 1476,1482 QUERY PLAN -- Append !-> Seq Scan on hp2 Filter: ((a = 10) AND (b = 'xxx'::text)) (3 rows) --- 1476,1482 QUERY PLAN -- Append !-> Seq Scan on hp0 Filter: ((a = 10) AND (b = 'xxx'::text)) (3 rows) *** *** 1494,1504 Append -> Seq Scan on hp0 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) --> Seq Scan on hp2 - Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) -> Seq Scan on hp3 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS 
NULL))) ! (7 rows) -- hash partitiong pruning doesn't occur with <> operator clauses explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; --- 1494,1502 Append -> Seq Scan on hp0 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) -> Seq Scan on hp3 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) ! (5 rows) -- hash partitiong pruning doesn't occur with <> operator clauses explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi Alvaro, On 04/06/2018 12:41 PM, Alvaro Herrera wrote: Here's my proposed patch. Idle thought: how about renaming the "constfalse" argument and variables to "contradictory" or maybe just "contradict"? Passes check-world. New directories, and variable rename seems like a good idea; either is ok. Best regards, Jesper
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:41 AM, Alvaro Herrera wrote: > Here's my proposed patch. > > Idle thought: how about renaming the "constfalse" argument and variables > to "contradictory" or maybe just "contradict"? Sounds fine to me. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote: > Some comments on the code reorganizing part of the patch: > > * Did you intentionally not put PartitionBoundInfoData and its accessor > macros in partition_internal.h. partprune.c would not need to include > partition.h if we do that. Not really. After pondering this some more, I decided to call the new file src/include/partition/partbounds.h; and the other new file will become src/include/partition/partprune.h. This leads naturally to the idea that PartitionBoundInfoData will be in partbounds.h. However, the typedef struct PartitionBoundInfoData *PartitionBoundInfo will have to remain in catalog/partition.h, at least for the time being. > * Also, I wonder why you left PartitionPruneContext in partition.h. Isn't > it better taken out to partprune.h? Yes. > * Why isn't gen_partprune_steps() in partprune.h? I see only > prune_append_rel_partitions() exported out of partprune.c, but the runtime > patch needs gen_partprune_steps() to be called from createplan.c. > * I don't see get_matching_partitions() exported either. Runtime pruning > patch needs that too. True -- both exported. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:38 PM, Alvaro Herrera wrote: > Yeah, there is one sentence there I didn't quite understand and would > like to add it to the rewritten version of the comment before I remove > the whole ifdeffed-out comment. > > * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition > pruning > * step(s)" generated for the clause due to it being a BoolExpr or a > * ScalarArrayOpExpr that's turned into one > > Exactly what does "ScalarArrayOpExpr that's turned into one" means? > Does it mean we turn SAOP into BoolExpr? Yes, we turn a ScalarArrayOpExpr into a BoolExpr and generate prune step for the latter. Maybe we'll have a base pruning step that can process a ScalarArrayOpExpr directly someday. We create base steps only for OpExpr's for now. > If you look at the rest of the rewritten comment, you'll notice some > things probably need more explaining. Wording suggestions welcome. When I looked at it earlier today, I thought your rewrite looked much better. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:54 PM, Alvaro Herrerawrote: > Alvaro Herrera wrote: > >> Yeah. Looking at this function, I noticed it tests for BooleanTest, and >> falls back to checking "not_clause" and a few equals. Does it make >> sense if the clause is a SAOP? I added this assert: >> Assert(IsA(clause, BooleanTest) || >> IsA(clause, BoolExpr) || >> IsA(clause, RelabelType)); >> >> and it failed: >> #3 0x556cf04505db in match_boolean_partition_clause (partopfamily=424, >> clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068) >> at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159 >> 2159 Assert(IsA(clause, BooleanTest) || >> (gdb) print *clause >> $1 = {type = T_ScalarArrayOpExpr} >> >> I'm not sure whether or not this function can trust that what's incoming >> must absolutely be only those node types. > > So this is what I need for current regression tests not to crash > anymore: > > Assert(IsA(clause, BooleanTest) || >IsA(clause, BoolExpr) || >IsA(clause, RelabelType) || >IsA(clause, ScalarArrayOpExpr) || >IsA(clause, OpExpr) || >IsA(clause, Var)); > > I'm not confident in my ability to write code to handle all possible > cases right now (obviously there must be more cases that are not covered > by current regression tests), so I'll leave it without the assert since > it handles a couple of the useful cases, but I suspect it could stand > some more improvement. > > I guess the question is, how interesting is boolean partitioning? I bet > it has its uses. 
match_boolean_partition_clauses() exists to capture some cases where an OpExpr (any expression that returns a Boolean for that matter) itself is the partition key: create table boolpart (a int) partition by list ((a = 1)); create table boolpart_t partition of boolpart for values in ('true'); create table boolpart_f partition of boolpart for values in ('false'); explain select * from boolpart where a = 1; QUERY PLAN -- Append (cost=0.00..41.88 rows=13 width=4) -> Seq Scan on boolpart_t (cost=0.00..41.88 rows=13 width=4) Filter: (a = 1) (3 rows) explain select * from boolpart where a = 2; QUERY PLAN -- Append (cost=0.00..41.88 rows=13 width=4) -> Seq Scan on boolpart_f (cost=0.00..41.88 rows=13 width=4) Filter: (a = 2) (3 rows) So, it's not that we're only in position to accept certain node types in match_boolean_partition_clauses(). Before it existed, the pruning didn't work because it wasn't matched to the partition key in the special way that match_boolean_partition_clauses() does and end up in the block in match_clause_to_partition_key() where the OpExpr's are analyzed for normal (non-Boolean) situations, where we extract either the leftop or rightop and try to match it with the partition key. It might as well be: create table boolpart (a int) partition by list ((a in (1, 2))); Requiring us to be position to match an ScalarArrayOpExpr with the partition key. This resembles match_boolean_index_clause(), by the way. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
BTW, having both key_is_null and key_is_not_null output args to convey a single bit of info is a bit lame. I'm removing it. We could do the same with a single boolean, since the return value already indicates it's a matching IS [NOT] NULL clause; we only need to indicate whether the NOT is present. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley wrote: > 2. I guess this will be removed before commit. > > +#if 0 > > +#endif Yeah, there is one sentence there I didn't quite understand and would like to add it to the rewritten version of the comment before I remove the whole ifdeffed-out comment. * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning * step(s)" generated for the clause due to it being a BoolExpr or a * ScalarArrayOpExpr that's turned into one Exactly what does "ScalarArrayOpExpr that's turned into one" means? Does it mean we turn SAOP into BoolExpr? (Yes, I know "#if 0" inside a comment doesn't do anything. It's only documentation for myself.) If you look at the rest of the rewritten comment, you'll notice some things probably need more explaining. Wording suggestions welcome. > 3. This comment seems like a strange thing to write just before > testing if the clause matches the partition key. > > + /* Clause does not match this partition key. */ > + if (equal(leftop, partkey)) > + *rightop = not_clause((Node *) clause) > + ? (Expr *) makeBoolConst(false, false) > + : (Expr *) makeBoolConst(true, false); Yeah. Looking at this function, I noticed it tests for BooleanTest, and falls back to checking "not_clause" and a few equals. Does it make sense if the clause is a SAOP? I added this assert: Assert(IsA(clause, BooleanTest) || IsA(clause, BoolExpr) || IsA(clause, RelabelType)); and it failed: #3 0x556cf04505db in match_boolean_partition_clause (partopfamily=424, clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068) at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159 2159Assert(IsA(clause, BooleanTest) || (gdb) print *clause $1 = {type = T_ScalarArrayOpExpr} I'm not sure whether or not this function can trust that what's incoming must absolutely be only those node types. > 4. Comment needs removed. > > + * has_default_part - Whether the table has a default partition Done. 
> The only other thing I noted on this pass is that we could get rid of: > > + /* go check the next clause. */ > + if (unsupported_clause) > + break; > > and just "continue" instead of "break" in all cases apart from case > PARTCLAUSE_UNSUPPORTED: > > it would save a few lines and a single condition. What's there works, > but thought this might be better... Makes sense -- looking. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi. On 2018/04/06 7:35, Alvaro Herrera wrote: > I seems pretty clear that putting get_matching_partitions() in > catalog/partition.c is totally the wrong thing; it belongs wholly in > partprune. I think the reason you put it there is that it requires > access to a lot of internals that are static in partition.c. In the > attached not yet cleaned version of the patch, I have moved a whole lot > of what you added to partition.c to partprune.c; and for the functions > and struct declarations that were required to make it work, I created > catalog/partition_internal.h. Yes, I really wanted for most of the new code that this patch adds to land in the planner, especially after Robert's comments here: https://www.postgresql.org/message-id/CA%2BTgmoabi-29Vs8H0xkjtYB%3DcU%2BGVCrNwPz7okpa3KsoLmdEUQ%40mail.gmail.com It would've been nice if we'd gotten the "reorganizing partitioning code" thread resolved sooner. > I changed a lot of code also, but cosmetic changes only. > > I'll clean this up a bit more now, and try to commit shortly (or early > tomorrow); wanted to share current status now in case I have to rush > out. Some comments on the code reorganizing part of the patch: * Did you intentionally not put PartitionBoundInfoData and its accessor macros in partition_internal.h. partprune.c would not need to include partition.h if we do that. * Also, I wonder why you left PartitionPruneContext in partition.h. Isn't it better taken out to partprune.h? I have done that in the attached. * Why isn't gen_partprune_steps() in partprune.h? I see only prune_append_rel_partitions() exported out of partprune.c, but the runtime patch needs gen_partprune_steps() to be called from createplan.c. * I don't see get_matching_partitions() exported either. Runtime pruning patch needs that too. Maybe you've thought something about these two items though. 
Thanks, Amit diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index d5e91b111d..f89c99f544 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -24,7 +24,6 @@ #include "catalog/indexing.h" #include "catalog/objectaddress.h" #include "catalog/partition.h" -#include "catalog/partition_internal.h" #include "catalog/pg_collation.h" #include "catalog/pg_inherits.h" #include "catalog/pg_inherits_fn.h" diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c index b0638d5aa6..93553b5d13 100644 --- a/src/backend/optimizer/util/partprune.c +++ b/src/backend/optimizer/util/partprune.c @@ -19,8 +19,6 @@ #include "access/hash.h" #include "access/nbtree.h" -#include "catalog/partition.h" -#include "catalog/partition_internal.h" #include "catalog/pg_operator.h" #include "catalog/pg_opfamily.h" #include "catalog/pg_type.h" @@ -35,6 +33,7 @@ #include "parser/parse_coerce.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "utils/array.h" #include "utils/lsyscache.h" /* diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 0bcaa36165..1c17553917 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -13,7 +13,7 @@ #ifndef PARTITION_H #define PARTITION_H -#include "fmgr.h" +#include "catalog/partition_internal.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" #include "parser/parse_node.h" @@ -23,65 +23,6 @@ #define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD) /* - * PartitionBoundInfoData encapsulates a set of partition bounds. It is - * usually associated with partitioned tables as part of its partition - * descriptor, but may also be used to represent a virtual partitioned - * table such as a partitioned joinrel within the planner. - * - * A list partition datum that is known to be NULL is never put into the - * datums array. Instead, it is tracked using the null_index field. 
- * - * In the case of range partitioning, ndatums will typically be far less than - * 2 * nparts, because a partition's upper bound and the next partition's lower - * bound are the same in most common cases, and we only store one of them (the - * upper bound). In case of hash partitioning, ndatums will be same as the - * number of partitions. - * - * For range and list partitioned tables, datums is an array of datum-tuples - * with key->partnatts datums each. For hash partitioned tables, it is an array - * of datum-tuples with 2 datums, modulus and remainder, corresponding to a - * given partition. - * - * The datums in datums array are arranged in increasing order as defined by - * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and - * qsort_partition_hbound_cmp() for range, list and hash partitioned tables - * respectively. For range and list partitions this simply means that the - * datums in the datums array are arranged in increasing order as defined by - * the partition key's operator classes and collations. - * - * In the case of list
Re: [HACKERS] path toward faster partition pruning
On 6 April 2018 at 12:02, David Rowley wrote: > On 6 April 2018 at 10:35, Alvaro Herrera wrote: > The only other thing I noted on this pass is that we could get rid of: > > + /* go check the next clause. */ > + if (unsupported_clause) > + break; > > and just "continue" instead of "break" in all cases apart from case > PARTCLAUSE_UNSUPPORTED: I should have said remove: + if (unsupported_clause) The "break" would still be required. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 6 April 2018 at 10:35, Alvaro Herrerawrote: > I changed a lot of code also, but cosmetic changes only. > > I'll clean this up a bit more now, and try to commit shortly (or early > tomorrow); wanted to share current status now in case I have to rush > out. I made a complete pass over the patch you sent. I only noted down the following few things: 1. + * off < 0, meaning the look-up value is smaller that all bounds, that -> than 2. I guess this will be removed before commit. +#if 0 +#endif 3. This comment seems like a strange thing to write just before testing if the clause matches the partition key. + /* Clause does not match this partition key. */ + if (equal(leftop, partkey)) + *rightop = not_clause((Node *) clause) + ? (Expr *) makeBoolConst(false, false) + : (Expr *) makeBoolConst(true, false); 4. Comment needs removed. + * has_default_part - Whether the table has a default partition The only other thing I noted on this pass is that we could get rid of: + /* go check the next clause. */ + if (unsupported_clause) + break; and just "continue" instead of "break" in all cases apart from case PARTCLAUSE_UNSUPPORTED: it would save a few lines and a single condition. What's there works, but thought this might be better... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
> @@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root,
> RangeTblEntry *parentrte,
>  * parentrte already has the root partrel's updatedCols translated to match
>  * the attribute ordering of parentrel.
>  */
> - if (!*part_cols_updated)
> -     *part_cols_updated =
> + if (!root->partColsUpdated)
> +     root->partColsUpdated =
>       has_partition_attrs(parentrel, parentrte->updatedCols, NULL);

Hmm, surely this should be |= to avoid resetting a value set in a
previous call to this function? In the previous coding it wasn't
necessary because it was a local variable ...

(though, isn't it a bit odd to have this in PlannerInfo? seems like it
should be in resultRelInfo, but then you already have it there so I
suppose this one does *more*)

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> >> 1. Still not sure about RelOptInfo->has_default_part. This flag is
> >> only looked at in generate_partition_pruning_steps. The RelOptInfo and
> >> the boundinfo is available to look at, it's just that the
> >> partition_bound_has_default macro is defined in partition.c rather
> >> than partition.h.
>
> Hmm, it might not be such a bad idea to bring out the
> PartitionBoundInfoData into partition.h. If we do that, we won't need the
> has_default_part that the patch adds to RelOptInfo.
>
> In the attached v50 set, 0002 does that.

After looking at this for a moment, I again come to the conclusion that
the overall layout of partitioning code and definitions is terrible.
But we already know that, and there's a patch in commitfest to improve
things. So my intention right now is to hold my nose and get this
pushed; we'll fix it afterwards.

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi.

On 2018/04/05 0:45, Jesper Pedersen wrote:
> Hi,
>
> On 04/04/2018 09:29 AM, David Rowley wrote:
>> Thanks for updating. I've made a pass over v49 and I didn't find very
>> much wrong with it.
>>
>> The only real bug I found was a missing IsA(rinfo->clause, Const) in
>> the pseudoconstant check inside
>> generate_partition_pruning_steps_internal.

Fixed.

>> Most of the changes are comment fixes with a few stylistic changes
>> thrown in which are pretty much all there just to try to shrink the
>> code a line or two or reduce indentation.
>>
>> I feel pretty familiar with this code now and assuming the attached is
>> included I'm happy for someone else, hopefully a committer, to take a
>> look at it.

Thank you, your changes look good to me.

>> I'll leave the following notes:
>>
>> 1. Still not sure about RelOptInfo->has_default_part. This flag is
>> only looked at in generate_partition_pruning_steps. The RelOptInfo and
>> the boundinfo is available to look at, it's just that the
>> partition_bound_has_default macro is defined in partition.c rather
>> than partition.h.

Hmm, it might not be such a bad idea to bring out the
PartitionBoundInfoData into partition.h. If we do that, we won't need
the has_default_part that the patch adds to RelOptInfo.

In the attached v50 set, 0002 does that.

>> 2. Don't really like the new isopne variable name. It's not very
>> simple to decode, perhaps something like is_not_eq is better?

isopne does sound a bit unintelligible. I propose op_is_ne so that it
sounds consistent with the preceding member of the struct that's called
opno. I want to keep "ne" and not start calling it not_eq, as a few
other places use the string "ne" to refer to a similar thing, like:

/* inequality */
Datum range_ne(PG_FUNCTION_ARGS)
Datum timestamptz_ne_date(PG_FUNCTION_ARGS)

Since the field is local to partprune.c, I guess that it's fine as the
comment where it's defined tells what it is.

>> 3.
The part of the code I'm least familiar with is
>> get_steps_using_prefix_recurse(). I admit to not having had time to
>> fully understand that and consider ways to break it.

The purpose of that code is to generate *all* needed steps to be
combined using COMBINE_INTERSECT such that the pruning will occur using
the most restrictive set of clauses in cases where the same key is
referenced in multiple restriction clauses containing non-equality
operators. So, for a range partitioned table on (a, b), for a query
like:

explain select * from foo where a <= 1 and a <= 3 and b < 5 and b <= 10

pruning steps generated to be combined with an enclosing INTERSECT step
will be as follows:

<= (1, 10)
<  (1, 5)
<= (3, 10)
<  (3, 5)

>> Marking as ready for committer.

Thank you!

> Passes check-world, and CommitFest app has been updated to reflect the
> current patch set. Trivial changes attached.

Merged these changes. Thanks again Jesper.

Attached v50.

Thanks,
Amit
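Amit's four-step list can be pictured as a cross product over the per-key clause lists. The following standalone C sketch is a hypothetical model (none of these names are PostgreSQL's actual partprune.c API) of why two non-equality clauses on each of two keys yield four steps for the enclosing INTERSECT:

```c
#include <stdio.h>

/* Hypothetical, simplified model of what get_steps_using_prefix_recurse()
 * produces for the example above: one pruning step per combination of the
 * non-equality clauses on each partition key; the caller INTERSECTs them. */
typedef struct { const char *op; int value; } KeyClause;

static int
emit_prefix_steps(const KeyClause *a, int na, const KeyClause *b, int nb)
{
    int nsteps = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
        {
            /* one step per (a-clause, b-clause) prefix combination; the
             * step's comparison operator is that of the last key */
            printf("%s (%d, %d)\n", b[j].op, a[i].value, b[j].value);
            nsteps++;
        }
    return nsteps;
}

int
example_step_count(void)
{
    /* a <= 1 and a <= 3 and b < 5 and b <= 10 */
    KeyClause a[] = {{"<=", 1}, {"<=", 3}};
    KeyClause b[] = {{"<", 5}, {"<=", 10}};
    return emit_prefix_steps(a, 2, b, 2);
}
```

Intersecting all four steps is what makes the most restrictive pair of clauses win, without having to decide up front which of the redundant clauses is tighter.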
Re: [HACKERS] path toward faster partition pruning
Hi,

On 04/04/2018 09:29 AM, David Rowley wrote:
> Thanks for updating. I've made a pass over v49 and I didn't find very
> much wrong with it.
>
> The only real bug I found was a missing IsA(rinfo->clause, Const) in
> the pseudoconstant check inside
> generate_partition_pruning_steps_internal.
>
> Most of the changes are comment fixes with a few stylistic changes
> thrown in which are pretty much all there just to try to shrink the
> code a line or two or reduce indentation.
>
> I feel pretty familiar with this code now and assuming the attached is
> included I'm happy for someone else, hopefully a committer, to take a
> look at it.
>
> I'll leave the following notes:
>
> 1. Still not sure about RelOptInfo->has_default_part. This flag is
> only looked at in generate_partition_pruning_steps. The RelOptInfo and
> the boundinfo is available to look at, it's just that the
> partition_bound_has_default macro is defined in partition.c rather
> than partition.h.
>
> 2. Don't really like the new isopne variable name. It's not very
> simple to decode, perhaps something like is_not_eq is better?
>
> 3. The part of the code I'm least familiar with is
> get_steps_using_prefix_recurse(). I admit to not having had time to
> fully understand that and consider ways to break it.
>
> Marking as ready for committer.

Passes check-world, and CommitFest app has been updated to reflect the
current patch set.

Trivial changes attached.
Best regards,
Jesper

>From 82f718579dc8e06ab77d76df4ed72f0f03ed4a4e Mon Sep 17 00:00:00 2001
From: jesperpedersen
Date: Wed, 4 Apr 2018 11:27:59 -0400
Subject: [PATCH] Trivial changes

---
 src/backend/catalog/partition.c        | 10 +-
 src/backend/optimizer/util/partprune.c |  8
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d6bce9f348..7a268e05dc 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -146,7 +146,7 @@ typedef struct PruneStepResult
 {
 	/*
 	 * This contains the offsets of the bounds in a table's boundinfo, each of
-	 * which is a bound whose corresponding partition is selected by a a given
+	 * which is a bound whose corresponding partition is selected by a given
 	 * pruning step.
 	 */
 	Bitmapset  *bound_offsets;
@@ -2026,7 +2026,7 @@ get_matching_hash_bounds(PartitionPruneContext *context, int opstrategy,
 						 Datum *values, int nvalues,
 						 FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	int		   *partindices = boundinfo->indexes;
 	int			partnatts = context->partnatts;
@@ -2093,7 +2093,7 @@ get_matching_list_bounds(PartitionPruneContext *context, int opstrategy,
 						 Datum value, int nvalues,
 						 FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	int			off,
 				minoff,
@@ -2147,7 +2147,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
 		return result;
 	}

-	/* Speical case handling of values coming from a <> operator clause. */
+	/* Special case handling of values coming from a <> operator clause. */
 	if (opstrategy == InvalidStrategy)
 	{
 		/*
@@ -2295,7 +2295,7 @@ get_matching_range_bounds(PartitionPruneContext *context, int opstrategy,
 						  Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	Oid		   *partcollation = context->partcollation;
 	int			partnatts = context->partnatts;
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 75b7232f5d..2d06c1a519 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -433,8 +433,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
 		}

 		/*
-		 * Fall-through for a NOT clause, which if it's a Boolean clause
-		 * clause, will be handled in match_clause_to_partition_key(). We
+		 * Fall-through for a NOT clause, which if it's a Boolean clause,
+		 * will be handled in match_clause_to_partition_key(). We
 		 * currently don't perform any pruning for more complex NOT
 		 * clauses.
 		 */
@@ -665,7 +665,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 	 */
 	if (match_boolean_partition_clause(partopfamily, clause, partkey, ))
 	{
-		*pc = palloc(sizeof(PartClauseInfo));
+		*pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
 		(*pc)->keyno = partkeyidx;
 		/* Do pruning with the Boolean equality operator. */
 		(*pc)->opno = BooleanEqualOperator;
@@
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 19:04, Amit Langote wrote:
> On 2018/04/04 14:42, Amit Langote wrote:
>> Attached v48.
>
> I had forgotten to remove the static_pruning parameter I had added in the
> v47, because it is no longer used. Static pruning now occurs even if a
> step contains all Params, in which case each of
> get_matching_hash/list/range_bounds() functions returns offsets of all
> non-null datums, because the Params cannot be resolved to actual values
> during static pruning.

Thanks for updating. I've made a pass over v49 and I didn't find very
much wrong with it.

The only real bug I found was a missing IsA(rinfo->clause, Const) in
the pseudoconstant check inside
generate_partition_pruning_steps_internal.

Most of the changes are comment fixes with a few stylistic changes
thrown in which are pretty much all there just to try to shrink the
code a line or two or reduce indentation.

I feel pretty familiar with this code now and assuming the attached is
included I'm happy for someone else, hopefully a committer, to take a
look at it.

I'll leave the following notes:

1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo is available to look at, it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.

2. Don't really like the new isopne variable name. It's not very
simple to decode, perhaps something like is_not_eq is better?

3. The part of the code I'm least familiar with is
get_steps_using_prefix_recurse(). I admit to not having had time to
fully understand that and consider ways to break it.

Marking as ready for committer.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

v49_fixes_drowley.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 19:04, Amit Langote wrote:
> Attached v49.

Thanks for including the changes. I'll look now.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 17:42, Amit Langote wrote:
> I'm not sure about the following change in your patch:
>
> - if (!result->scan_null)
> -     result->scan_null = step_result->scan_null;
> - if (!result->scan_default)
> -     result->scan_default = step_result->scan_default;
> + result->scan_null |= step_result->scan_null;
> + result->scan_default |= step_result->scan_default;
>
> Afaik, |= does bitwise OR, which even if it might give the result we want,
> is not a logical operation. I had written the original code using the
> following definition of logical OR.
>
> a OR b = if a then true else b

Ok, no problem. I only changed that to make it more compact. For the
record, we do the same in plenty of other places over the code base.
E.g.

parse->hasSubLinks |= subquery->hasSubLinks;

/* If subquery had any RLS conditions, now main query does too */
parse->hasRowSecurity |= subquery->hasRowSecurity;

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
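Amit's concern is that |= is a bitwise operator; for C99 bool operands, which are always 0 or 1, it computes the same truth table as the explicit logical-OR accumulation it replaced. A minimal standalone check (illustrative code, not PostgreSQL's) of the two forms being swapped here:

```c
#include <stdbool.h>

/* The explicit "a OR b = if a then true else b" accumulation form... */
bool
or_accum_explicit(bool acc, bool step)
{
    if (!acc)
        acc = step;
    return acc;
}

/* ...and the compact bitwise form; because bool values are restricted to
 * 0 and 1, bitwise OR yields the same result for every input pair. */
bool
or_accum_bitwise(bool acc, bool step)
{
    acc |= step;
    return acc;
}
```

The equivalence holds only because both operands are normalized booleans; with raw ints holding arbitrary bit patterns the two forms could differ, which is presumably why Amit flagged it.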
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 16:00, Tom Lane wrote:
> David Rowley writes:
>> It's true that the const simplification code will generally rewrite
>> most NOT(clause) to use the negator operator, but if the operator does
>> not have a negator it can't do this.
>> ...
>> At the moment pruning does not work for this case at all. Perhaps it should?
>
> It's hard to see why we'd expend extra effort to optimize such situations.
> The right answer would invariably be to fix the inadequate operator
> definition, because missing the negator link would hobble many other
> cases besides this.
>
> Now if you can show a case where the extra smarts would be useful
> without presuming a badly-written opclass, it's a different matter.

Okay, well that certainly sounds like less work. In that case, the
comment which claims we handle the NOT clauses needs to be updated to
mention that we only handle boolean NOT clauses and don't optimize the
remainder.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley writes:
> It's true that the const simplification code will generally rewrite
> most NOT(clause) to use the negator operator, but if the operator does
> not have a negator it can't do this.
> ...
> At the moment pruning does not work for this case at all. Perhaps it should?

It's hard to see why we'd expend extra effort to optimize such situations.
The right answer would invariably be to fix the inadequate operator
definition, because missing the negator link would hobble many other
cases besides this.

Now if you can show a case where the extra smarts would be useful
without presuming a badly-written opclass, it's a different matter.

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 11:22, David Rowley wrote:
> On 4 April 2018 at 09:47, David Rowley wrote:
>> I think it would be better to just have special handling in
>> get_matching_list_bound so that it knows it's performing <>
>> elimination. I'd thought about passing some other opstrategy but the
>> only safe one I thought to use was InvalidStrategy, which is already
>> used by NULL handling.
>
> I'm currently working up a patch to do this the way I think is best.
>
> I'll submit it soon and we can review and get your thoughts on it.

I've attached a rough cut version of what I think is a good solution
for this. It's based on v46, not your latest v47, sorry.

This makes get_matching_list_bounds() aware that it's performing the
not-equal pruning via the opstrategy, which allows it to not return all
partitions when there are no values in this case. Instead, we return
the NULL partition, so that we later invert that and return everything
apart from the NULL partition. A strict clause will allow us that much,
even if we can't get the actual value being compared to, at the time.

There's also a bunch of other changes in there:

1. Adding missing step_id in copyfuncs.c

2. Simplified including the default partition in a bunch of cases.

3. Made it so scan_default and scan_null are only ever set to true if
there's a partition for that.

4. Changed get_matching_*_bounds to return the entire result struct
instead of returning the Bitmapset and passing the remaining bool values
back through params. I didn't really like how you'd changed this to pass
all the bool flags back as params. There's a perfectly good struct there
to provide the entire result in a single return value. I know you've
disagreed with this already, so would be nice to get a 3rd opinion.

5. Rename get_matching_hash_bound to get_matching_hash_bounds. The
LIST and RANGE versions of this function both had a plural name. I
didn't see any reason for the hash case to be different.

Let me know what you think.
I've patched the run-time pruning v18 against this and it now passes regression. I need to do a bit more testing on this to ensure it works for all cases, but thought I'd send now as I suspect you're currently around to look. There might be another issue with the patch too, but I'll send a separate email about that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v46_fixes_drowley.patch Description: Binary data
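Change #4 in David's list is an API-shape choice: return one result struct rather than a bitmap plus several bool out-parameters. A hypothetical miniature of that shape (field and function names are illustrative, not the actual partprune.c definitions, and a plain integer bitmask stands in for the Bitmapset):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of returning one struct instead of three out-parameters. */
typedef struct StepResult
{
    uint32_t bound_offsets;   /* stand-in for the Bitmapset of bound offsets */
    bool     scan_null;       /* also scan the NULL partition? */
    bool     scan_default;    /* also scan the default partition? */
} StepResult;

StepResult
get_matching_bounds_sketch(uint32_t offsets, bool has_null, bool has_default)
{
    /* every field travels together in one return value */
    StepResult r = {offsets, has_null, has_default};
    return r;
}
```

The benefit David is arguing for: callers can't forget to collect one of the flags, and adding a new flag later changes one struct definition instead of every call site's parameter list.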
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 09:47, David Rowley wrote:
> On 4 April 2018 at 00:02, Amit Langote wrote:
>> But actually, the presence of only Params in the pruning steps should
>> result in the pruning not being invoked at all (at least for the static
>> pruning case), thus selecting all partitions containing non-null data. It
>> is better to implement that instead of a workaround like scan_all_nonnulls
>> side-channel I was talking about.
>
> I don't think this is quite true. Since we're only using strict
> clauses, a list of quals with just Params still means that NULLs can't
> match. If you skip the step altogether then won't you lose the chance
> at pruning away any NULL-only partition?
>
> I think it would be better to just have special handling in
> get_matching_list_bound so that it knows it's performing <>
> elimination. I'd thought about passing some other opstrategy but the
> only safe one I thought to use was InvalidStrategy, which is already
> used by NULL handling.

I'm currently working up a patch to do this the way I think is best.

I'll submit it soon and we can review and get your thoughts on it.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 00:02, Amit Langote wrote:
> But actually, the presence of only Params in the pruning steps should
> result in the pruning not being invoked at all (at least for the static
> pruning case), thus selecting all partitions containing non-null data. It
> is better to implement that instead of a workaround like scan_all_nonnulls
> side-channel I was talking about.

I don't think this is quite true. Since we're only using strict
clauses, a list of quals with just Params still means that NULLs can't
match. If you skip the step altogether then won't you lose the chance
at pruning away any NULL-only partition?

I think it would be better to just have special handling in
get_matching_list_bound so that it knows it's performing <>
elimination. I'd thought about passing some other opstrategy but the
only safe one I thought to use was InvalidStrategy, which is already
used by NULL handling.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
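The <> elimination David describes can be sketched with plain bitmasks (hypothetical stand-ins for PostgreSQL's Bitmapset, not the real API): even when the Param's value is unknown, a strict b <> $1 clause can never be satisfied by NULL, so the source step selects just the NULL-only partition and the enclosing inversion step then excludes it:

```c
#include <stdint.h>

/* Source step for a strict <> clause whose comparison value is unknown:
 * strictness alone lets us select the NULL-only partition as the set the
 * inversion step will exclude. */
uint32_t
ne_source_step(uint32_t null_partition_mask)
{
    return null_partition_mask;
}

/* Inversion step: scan everything except what the source step selected. */
uint32_t
invert_step(uint32_t all_partitions_mask, uint32_t source_result)
{
    return all_partitions_mask & ~source_result;
}
```

With four partitions (mask 0xF) where bit 3 marks a NULL-only partition, inverting the source step's result prunes exactly that one partition, which skipping the step entirely would have missed.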
Re: [HACKERS] path toward faster partition pruning
On 2 April 2018 at 17:18, Amit Langote wrote:
> On 2018/03/31 0:55, David Rowley wrote:
>> explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
>>            QUERY PLAN
>> 
>>  Result (actual rows=0 loops=1)
>>    One-Time Filter: false
>> (2 rows)
>
> Hmm. It is the newly added inversion step that's causing this. When
> creating a generic plan (that is when the planning happens via
> BuildCachedPlan called with boundParams set to NULL), the presence of
> Params will cause an inversion step's source step to produce
> scan-all-partitions sort of result, which the inversion step dutifully
> inverts to a scan-no-partitions result.
>
> I have tried to attack that problem by handling the
> no-values-to-prune-with case using a side-channel to propagate the
> scan-all-partitions result through possibly multiple steps. That is, a
> base pruning step will set datum_offsets in a PruneStepResult only if
> pruning is carried out by actually comparing values with the partition
> bounds. If no values were provided (like in the generic plan case), it
> will set a scan_all_nonnull flag instead and return without setting
> datum_offsets. Combine steps perform their combining duty only if
> datum_offset contains a valid value, that is, if scan_all_nonnulls is not set.

I'm afraid this is still not correct :-(

The following code is not doing the right thing:

+ case COMBINE_UNION:
+     foreach(lc1, cstep->source_stepids)
+     {
+         int step_id = lfirst_int(lc1);
+         PruneStepResult *step_result;
+
+         /*
+          * step_results[step_id] must contain a valid result,
+          * which is confirmed by the fact that cstep's step_id is
+          * greater than step_id and the fact that results of the
+          * individual steps are evaluated in sequence of their
+          * step_ids.
+          */
+         if (step_id >= cstep->step.step_id)
+             elog(ERROR, "invalid pruning combine step argument");
+         step_result = step_results[step_id];
+         Assert(step_result != NULL);
+
+         result->scan_all_nonnull = step_result->scan_all_nonnull;

The last line there is not properly performing a union, it just sets
result->scan_all_nonnull to whatever the last step's value was. At the
very least it should be |= but I don't really like this new code.

Why did you move away from just storing the matching partitions in a
Bitmapset? If you want to store all non-null partitions, then why not
just set the bits for all non-null partitions? That would cut down on
bugs like this since the combining of step results would just be
simple unions or intersects.

Also, the following code could be made a bit nicer:

+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ switch (context->strategy)
+ {
+     case PARTITION_STRATEGY_HASH:
+         result->bound_offsets = get_matching_hash_bound(context,
+                                                         opstep->opstrategy,
+                                                         values, nvalues,
+                                                         partsupfunc,
+                                                         opstep->nullkeys,
+                                                         >scan_all_nonnull);

Why not allocate the PruneStepResult inside the get_matching_*_bound,
that way you wouldn't need all those out parameters to set the bool
fields.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
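The bug David points out is a classic last-assignment-wins mistake inside a union loop. A hypothetical miniature (plain C bools standing in for the PruneStepResult flags, not the actual code) contrasting the buggy assignment with the |= accumulation he suggests:

```c
#include <stdbool.h>

/* Buggy COMBINE_UNION shape: plain assignment in the loop means only the
 * last source step's flag survives. */
bool
union_flag_buggy(const bool *steps, int nsteps)
{
    bool result = false;
    for (int i = 0; i < nsteps; i++)
        result = steps[i];      /* overwrites: last step wins */
    return result;
}

/* Fixed shape: |= accumulates, so the union is true if any step is true. */
bool
union_flag_fixed(const bool *steps, int nsteps)
{
    bool result = false;
    for (int i = 0; i < nsteps; i++)
        result |= steps[i];     /* accumulates across all source steps */
    return result;
}
```

With steps {true, false}, the buggy form reports false while the fixed form reports true, which is exactly the divergence David describes; storing per-partition bits in a Bitmapset and unioning those would make this class of bug impossible by construction.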
Re: [HACKERS] path toward faster partition pruning
On 2018/03/30 22:41, David Rowley wrote:
> On 31 March 2018 at 02:00, David Rowley wrote:
>> On 31 March 2018 at 01:18, David Rowley wrote:
>>> I've noticed that there are no outfuncs or readfuncs for all the new
>>> Step types you've added.
>>>
>>> Also, the copy func does not properly copy the step_id in the base
>>> node type. This will remain at 0 after a copyObject()
>>
>> Attaching it as it may save you some time from doing it yourself.
>> Please check it though.
>
> The attached might be slightly easier to apply. The previous version
> was based on top of some other changes I'd been making.

Thanks David. I have merged this.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On 30 March 2018 at 18:38, Amit Langote wrote:
> Please find attached the updated patches.

There's a bit of a strange case with v45 around prepared statements.
I've not debugged this yet, but in case you get there first, here's
the case:

create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);

explain select * from listp where b in(1,2) and 2<>b and 0<>b; -- this one looks fine.

                               QUERY PLAN
 Append (cost=0.00..49.66 rows=22 width=8)
   ->  Seq Scan on listp_1_1 (cost=0.00..49.55 rows=22 width=8)
         Filter: ((b = ANY ('{1,2}'::integer[])) AND (2 <> b) AND (0 <> b))
(3 rows)

prepare q1 (int,int,int,int) as select * from listp where b in($1,$2)
and $3 <> b and $4 <> b;

execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);

explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);

           QUERY PLAN
 Result (actual rows=0 loops=1)
   One-Time Filter: false
(2 rows)

My best guess is that something ate the bits out of a Bitmapset of the
matching partitions somewhere.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services