Re: [HACKERS] path toward faster partition pruning
On 09-05-2018 17:30, Alvaro Herrera wrote:
> Marina Polyakova wrote:
>> Hello everyone in this thread! I got a similar server crash as in [1] on
>> the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81
>> when the assertion fails because the second argument of ScalarArrayOpExpr
>> is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]):
>
> Hello Marina, thanks for reporting this. I have pushed all fixes derived
> from this report -- thanks to Amit and Michaël for those. I verified your
> test case no longer crashes. If you have more elaborate test cases, please
> do try these too.

Hello, thank you all very much! :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 22:43, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 2018/05/09 11:31, David Rowley wrote:
>>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>>> While looking at this code, is there any reason to not make
>>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>>> so the intention is to make it available for future patches?
>>
>> Here is a patch that does that.
>
> Pushed, thanks.

Thank you.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Wed, May 09, 2018 at 10:39:07AM -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> This removes a useless default clause in partprune.c and it got
>> forgotten in the crowd. Just attaching it again here, and it can just
>> be applied on top of the rest.
>
> Done, thanks for insisting.

Thanks!
--
Michael
Re: [HACKERS] path toward faster partition pruning
Marina Polyakova wrote:
> Hello everyone in this thread!
> I got a similar server crash as in [1] on the master branch since the commit
> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr, but
> is an ArrayCoerceExpr (see [2]):

Hello Marina, thanks for reporting this. I have pushed all fixes derived from this report -- thanks to Amit and Michaël for those. I verified your test case no longer crashes. If you have more elaborate test cases, please do try these too.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> On 2018/05/09 11:31, David Rowley wrote:
>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>> While looking at this code, is there any reason to not make
>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>> so the intention is to make it available for future patches?
>
> Here is a patch that does that.

Pushed, thanks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Michael Paquier wrote:
> Alvaro, could it be possible to consider as well the patch I posted
> here?
> https://www.postgresql.org/message-id/20180424012042.gd1...@paquier.xyz
>
> This removes a useless default clause in partprune.c and it got
> forgotten in the crowd. Just attaching it again here, and it can just
> be applied on top of the rest.

Done, thanks for insisting.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Wed, May 09, 2018 at 02:01:26PM +0900, Amit Langote wrote:
> On 2018/05/09 11:31, David Rowley wrote:
>> On 9 May 2018 at 14:29, Amit Langote wrote:
>>> On 2018/05/09 11:20, Michael Paquier wrote:
>>>> While looking at this code, is there any reason to not make
>>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>>> so the intention is to make it available for future patches?
>>>
>>> Yeah, making it static might be a good idea. I had made it externally
>>> visible, because I was under the impression that the runtime pruning
>>> related code would want to call it from elsewhere within the planner.
>>> But, instead it introduced a make_partition_pruneinfo() which in turn
>>> calls get_partprune_steps.
>>
>> Yeah. Likely left over from when run-time pruning was generating the
>> steps during execution rather than during planning.
>
> Here is a patch that does that.

Thanks, Amit. Alvaro, could it be possible to consider as well the patch I posted here?
https://www.postgresql.org/message-id/20180424012042.gd1...@paquier.xyz

This removes a useless default clause in partprune.c and it got forgotten in the crowd. Just attaching it again here, and it can just be applied on top of the rest.
--
Michael

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f8844ef2eb..cbbb4c1827 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2950,10 +2950,6 @@ perform_pruning_combine_step(PartitionPruneContext *context,
 			}
 		}
 		break;
-
-	default:
-		elog(ERROR, "invalid pruning combine op: %d",
-			 (int) cstep->combineOp);
 	}

 	return result;
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 11:31, David Rowley wrote:
> On 9 May 2018 at 14:29, Amit Langote wrote:
>> On 2018/05/09 11:20, Michael Paquier wrote:
>>> While looking at this code, is there any reason to not make
>>> gen_partprune_steps static? This is only used in partprune.c for now,
>>> so the intention is to make it available for future patches?
>>
>> Yeah, making it static might be a good idea. I had made it externally
>> visible, because I was under the impression that the runtime pruning
>> related code would want to call it from elsewhere within the planner.
>> But, instead it introduced a make_partition_pruneinfo() which in turn
>> calls get_partprune_steps.
>
> Yeah. Likely left over from when run-time pruning was generating the
> steps during execution rather than during planning.

Here is a patch that does that.

Thanks,
Amit

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f1f7b2dea9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -116,6 +116,8 @@ typedef struct PruneStepResult
 } PruneStepResult;

+static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
+				 bool *contradictory);
 static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
 				 RelOptInfo *rel, List *clauses,
 				 bool *contradictory);
@@ -355,7 +357,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
  * If the clauses in the input list are contradictory or there is a
  * pseudo-constant "false", *contradictory is set to true upon return.
  */
-List *
+static List *
 gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory)
 {
 	GeneratePruningStepsContext context;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c9fe95dc30..3d114b4c71 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -67,7 +67,5 @@ extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 					  List *pruning_steps);
-extern List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
-				 bool *contradictory);

 #endif							/* PARTPRUNE_H */
Re: [HACKERS] path toward faster partition pruning
On 9 May 2018 at 14:29, Amit Langote wrote:
> On 2018/05/09 11:20, Michael Paquier wrote:
>> While looking at this code, is there any reason to not make
>> gen_partprune_steps static? This is only used in partprune.c for now,
>> so the intention is to make it available for future patches?
>
> Yeah, making it static might be a good idea. I had made it externally
> visible, because I was under the impression that the runtime pruning
> related code would want to call it from elsewhere within the planner.
> But, instead it introduced a make_partition_pruneinfo() which in turn
> calls get_partprune_steps.

Yeah. Likely left over from when run-time pruning was generating the steps during execution rather than during planning.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018/05/09 11:20, Michael Paquier wrote:
> While looking at this code, is there any reason to not make
> gen_partprune_steps static? This is only used in partprune.c for now,
> so the intention is to make it available for future patches?

Yeah, making it static might be a good idea. I had made it externally visible, because I was under the impression that the runtime pruning related code would want to call it from elsewhere within the planner. But, instead it introduced a make_partition_pruneinfo() which in turn calls get_partprune_steps.

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
On Tue, May 08, 2018 at 07:05:46PM -0300, Alvaro Herrera wrote:
> The reason for this crash is that gen_partprune_steps_internal() is
> unable to generate any steps for the clause -- which is natural, since
> the operator is not in a btree opclass. There are various callers
> of gen_partprune_steps_internal that are aware that it could return an
> empty set of steps, but this one in match_clause_to_partition_key for
> the ScalarArrayOpExpr case was being a bit too optimistic.

Indeed. While looking at this code, is there any reason to not make gen_partprune_steps static? This is only used in partprune.c for now, so the intention is to make it available for future patches?
--
Michael
Re: [HACKERS] path toward faster partition pruning
Hi.

On 2018/05/09 7:05, Alvaro Herrera wrote:
> So I found that this query also crashed (using your rig),
>
> create table coercepart (a varchar) partition by list (a);
> create table coercepart_ab partition of coercepart for values in ('ab');
> create table coercepart_bc partition of coercepart for values in ('bc');
> create table coercepart_cd partition of coercepart for values in ('cd');
> explain (costs off) select * from coercepart where a ~ any ('{ab}');
>
> The reason for this crash is that gen_partprune_steps_internal() is
> unable to generate any steps for the clause -- which is natural, since
> the operator is not in a btree opclass. There are various callers
> of gen_partprune_steps_internal that are aware that it could return an
> empty set of steps, but this one in match_clause_to_partition_key for
> the ScalarArrayOpExpr case was being a bit too optimistic.

Yeah, good catch!

That fixes the crash, but looking around that code a bit, it seems that we shouldn't even have gotten up to the point you're proposing to fix. If saop_op is not in the partitioning opfamily, it should have bailed out much sooner, that is, here:

		/*
		 * In case of NOT IN (..), we get a '<>', which we handle if list
		 * partitioning is in use and we're able to confirm that its negator
		 * is a btree equality operator belonging to the partitioning operator
		 * family.
		 */
		if (!op_in_opfamily(saop_op, partopfamily))
		{
			negator = get_negator(saop_op);
			if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
			{
			}
+			else
+				/* otherwise, unsupported! */
+				return PARTCLAUSE_UNSUPPORTED;

Let me propose that we also have this along with the rest of the changes you made in that part of the function. So, attached is an updated patch.
Thanks,
Amit

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..be9ea8a6db 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
  * For BoolExpr clauses, we recursively generate steps for each argument, and
  * return a PartitionPruneStepCombine of their results.
  *
- * The generated steps are added to the context's steps list.  Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list.  Each step is assigned a step identifier, unique
+ * even across recursive calls.
  *
  * If we find clauses that are mutually contradictory, or a pseudoconstant
  * clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		List	   *elem_exprs,
 				   *elem_clauses;
 		ListCell   *lc1;
+		bool		contradictory;

 		if (IsA(leftop, RelabelType))
 			leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		 * Only allow strict operators.  This will guarantee nulls are
 		 * filtered.
 		 */
-		if (!op_strict(saop->opno))
+		if (!op_strict(saop_op))
 			return PARTCLAUSE_UNSUPPORTED;

 		/* Useless if the array has any volatile functions. */
@@ -1650,6 +1652,9 @@ match_clause_to_partition_key(RelOptInfo *rel,
 			if (strategy != BTEqualStrategyNumber)
 				return PARTCLAUSE_UNSUPPORTED;
 		}
+		else
+			/* otherwise, unsupported! */
+			return PARTCLAUSE_UNSUPPORTED;
 	}

 	/*
@@ -1690,7 +1695,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 				elem_exprs = lappend(elem_exprs, elem_expr);
 			}
 		}
-		else
+		else if (IsA(rightop, ArrayExpr))
 		{
 			ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);

@@ -1704,6 +1709,11 @@ match_clause_to_partition_key(RelOptInfo *rel,

 			elem_exprs = arrexpr->elements;
 		}
+		else
+		{
+			/* Give up on any other clause types. */
+			return PARTCLAUSE_UNSUPPORTED;
+		}

 		/*
 		 * Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1732,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
 	}

 	/*
-	 * Build a combine step as if for an OR clause or add the clauses to
-
Re: [HACKERS] path toward faster partition pruning
So I found that this query also crashed (using your rig),

create table coercepart (a varchar) partition by list (a);
create table coercepart_ab partition of coercepart for values in ('ab');
create table coercepart_bc partition of coercepart for values in ('bc');
create table coercepart_cd partition of coercepart for values in ('cd');
explain (costs off) select * from coercepart where a ~ any ('{ab}');

The reason for this crash is that gen_partprune_steps_internal() is unable to generate any steps for the clause -- which is natural, since the operator is not in a btree opclass. There are various callers of gen_partprune_steps_internal that are aware that it could return an empty set of steps, but this one in match_clause_to_partition_key for the ScalarArrayOpExpr case was being a bit too optimistic.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f8aaccfa18 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
  * For BoolExpr clauses, we recursively generate steps for each argument, and
  * return a PartitionPruneStepCombine of their results.
  *
- * The generated steps are added to the context's steps list.  Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list.  Each step is assigned a step identifier, unique
+ * even across recursive calls.
  *
  * If we find clauses that are mutually contradictory, or a pseudoconstant
  * clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		List	   *elem_exprs,
 				   *elem_clauses;
 		ListCell   *lc1;
+		bool		contradictory;

 		if (IsA(leftop, RelabelType))
 			leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		 * Only allow strict operators.  This will guarantee nulls are
 		 * filtered.
 		 */
-		if (!op_strict(saop->opno))
+		if (!op_strict(saop_op))
 			return PARTCLAUSE_UNSUPPORTED;

 		/* Useless if the array has any volatile functions. */
@@ -1690,7 +1692,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 				elem_exprs = lappend(elem_exprs, elem_expr);
 			}
 		}
-		else
+		else if (IsA(rightop, ArrayExpr))
 		{
 			ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);

@@ -1704,6 +1706,11 @@ match_clause_to_partition_key(RelOptInfo *rel,

 			elem_exprs = arrexpr->elements;
 		}
+		else
+		{
+			/* Give up on any other clause types. */
+			return PARTCLAUSE_UNSUPPORTED;
+		}

 		/*
 		 * Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1729,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
 		}

 		/*
-		 * Build a combine step as if for an OR clause or add the clauses to
-		 * the end of the list that's being processed currently.
+		 * If we have an ANY clause and multiple elements, first turn the list
+		 * of clauses into an OR expression.
 		 */
 		if (saop->useOr && list_length(elem_clauses) > 1)
-		{
-			Expr	   *orexpr;
-			bool		contradictory;
+			elem_clauses = list_make1(makeBoolExpr(OR_EXPR, elem_clauses, -1));

-			orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
-			*clause_steps =
-				gen_partprune_steps_internal(context, rel, list_make1(orexpr),
-											 &contradictory);
-			if (contradictory)
-				return PARTCLAUSE_MATCH_CONTRADICT;
-
-			Assert(list_length(*clause_steps) == 1);
-			return PARTCLAUSE_MATCH_STEPS;
-		}
-		else
-		{
-			bool		contradictory;
-
-			*clause_steps =
-				gen_partprune_steps_internal(context, rel, elem_clauses,
-
Re: [HACKERS] path toward faster partition pruning
Michael Paquier wrote:
> So the problem appears when an expression needs to use
> COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
> which requires an execution state to be able to build the list of
> elements. The clause matching happens at planning state, so while there
> are surely cases where this could be improved depending on the
> expression type, I propose to just discard all clauses which do not
> match OpExpr and ArrayExpr for now, as per the attached. It would be
> definitely a good practice to have a default code path returning
> PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
>
> Thoughts?

I found a related crash and I'm investigating it further.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Suspect SPAM] Re: [HACKERS] path toward faster partition pruning
On Tue, May 08, 2018 at 04:07:41PM +0900, Amit Langote wrote:
> I have to agree to go with this conservative approach for now. Although
> we might be able to evaluate the array elements by applying the coercion
> specified by ArrayCoerceExpr, let's save that as an improvement to be
> pursued later.

Thanks for confirming. Yes, non-volatile functions would be actually safe, and we'd need to be careful about NULL handling as well, but that's definitely out of scope for v11.

> FWIW, constraint exclusion wouldn't prune in this case either (that is, if
> you try this example with PG 10 or using HEAD as of the parent of
> 9fdb675fc5), but it doesn't crash like the new pruning code does.

Yeah, I have noticed that.
--
Michael
Re: [Suspect SPAM] Re: [HACKERS] path toward faster partition pruning
Thank you Marina for the report and Michael for following up.

On 2018/05/07 16:56, Michael Paquier wrote:
> On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote:
>> On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
>>> I got a similar server crash as in [1] on the master branch since the commit
>>> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
>>> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
>>> but is an ArrayCoerceExpr (see [2]):
>>
>> Indeed, I can see the crash. I have been playing with this stuff and I
>> am in the middle of writing the patch, but let's track this properly for
>> now.
>
> So the problem appears when an expression needs to use
> COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
> which requires an execution state to be able to build the list of
> elements. The clause matching happens at planning state, so while there
> are surely cases where this could be improved depending on the
> expression type, I propose to just discard all clauses which do not
> match OpExpr and ArrayExpr for now, as per the attached. It would be
> definitely a good practice to have a default code path returning
> PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
>
> Thoughts?

I have to agree to go with this conservative approach for now. Although we might be able to evaluate the array elements by applying the coercion specified by ArrayCoerceExpr, let's save that as an improvement to be pursued later.

FWIW, constraint exclusion wouldn't prune in this case either (that is, if you try this example with PG 10 or using HEAD as of the parent of 9fdb675fc5), but it doesn't crash like the new pruning code does.

Thanks again.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote: > On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote: > > I got a similar server crash as in [1] on the master branch since the commit > > 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because > > the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is > > an ArrayCoerceExpr (see [2]): > > Indeed, I can see the crash. I have been playing with this stuff and I > am in the middle of writing the patch, but let's track this properly for > now. So the problem appears when an expression needs to use COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another, which requires an execution state to be able to build the list of elements. The clause matching happens at planning state, so while there are surely cases where this could be improved depending on the expression type, I propose to just discard all clauses which do not match OpExpr and ArrayExpr for now, as per the attached. It would be definitely a good practice to have a default code path returning PARTCLAUSE_UNSUPPORTED where the element list cannot be built. Thoughts? -- Michael diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index f954b92a6b..2d2f88e880 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -1690,7 +1690,7 @@ match_clause_to_partition_key(RelOptInfo *rel, elem_exprs = lappend(elem_exprs, elem_expr); } } - else + else if (IsA(rightop, ArrayExpr)) { ArrayExpr *arrexpr = castNode(ArrayExpr, rightop); @@ -1704,6 +1704,16 @@ match_clause_to_partition_key(RelOptInfo *rel, elem_exprs = arrexpr->elements; } + else + { + /* + * Ignore all other clause types. It could be possible here + * to reach this code path with a type coercion from an + * array type to another with ArrayCoerceExpr which depends on + * per-element execution for the conversion. 
+ */ + return PARTCLAUSE_UNSUPPORTED; + } /* * Now generate a list of clauses, one for each array element, of the diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index e0cc5f3393..86dcd62d55 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1073,6 +1073,48 @@ explain (costs off) select * from boolpart where a is not unknown; Filter: (a IS NOT UNKNOWN) (7 rows) +-- type coercion from one array type to another, no pruning +create table coercepart (a varchar) partition by list (a); +create table coercepart_ab partition of coercepart for values in ('ab'); +create table coercepart_cd partition of coercepart for values in ('cd'); +create table coercepart_ef_gh partition of coercepart for values in ('ef', 'gh'); +explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999')); + QUERY PLAN +-- + Append + -> Seq Scan on coercepart_ab + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_cd + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_ef_gh + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[])) +(7 rows) + +explain (costs off) select * from coercepart where a in ('ab', NULL, to_char(125, '999')); + QUERY PLAN +--- + Append + -> Seq Scan on coercepart_ab + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_cd + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[])) + -> Seq Scan on coercepart_ef_gh + Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character 
varying, (to_char(125, '999'::text))::character varying])::text[])) +(7 rows) + +explain (costs off) select * from coercepart where a in ('ef', 'gh', to_char(125, '999')); + QUERY PLAN
Re: [HACKERS] path toward faster partition pruning
On 07-05-2018 4:37, Michael Paquier wrote:
> On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
>> I got a similar server crash as in [1] on the master branch since the commit
>> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
>> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
>> but is an ArrayCoerceExpr (see [2]):
>
> Indeed, I can see the crash. I have been playing with this stuff and I
> am in the middle of writing the patch, but let's track this properly for
> now.

Thank you very much!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
> I got a similar server crash as in [1] on the master branch since the commit
> 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
> the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr,
> but is an ArrayCoerceExpr (see [2]):

Indeed, I can see the crash. I have been playing with this stuff and I am in the middle of writing the patch, but let's track this properly for now.
--
Michael
Re: [HACKERS] path toward faster partition pruning
Hello everyone in this thread!

I got a similar server crash as in [1] on the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81, when the assertion fails because the second argument of ScalarArrayOpExpr is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]):

=# create table list_parted ( a varchar ) partition by list (a);
=# create table part_ab_cd partition of list_parted for values in ('ab', 'cd');
=# CREATE OR REPLACE FUNCTION public.x_stl_text_integer ( )
RETURNS text STABLE AS
$body$
BEGIN
	RAISE NOTICE 's text integer';
	RETURN 1::text;
END;
$body$
LANGUAGE 'plpgsql';
=# explain (costs off) select * from list_parted where a in ('ab', 'cd', x_stl_text_integer());
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

[1] https://www.postgresql.org/message-id/CAKcux6nCsCmu9oUnnuKZkeBenYvUFbU2Lt4q2MFNEb7QErzn8w%40mail.gmail.com
[2] partprune.c, function match_clause_to_partition_key:

	if (IsA(rightop, Const))
	{
		...
	}
	else
	{
		ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);   # fails here
		...
	}

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 2018/04/21 0:58, Alvaro Herrera wrote:
> Amit Langote wrote:
>> PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
>> that there are hash partitioning related tests in create_table,
>> foreign_key, and partition_join files as well. Do we want to use the
>> custom opclass in those files as well?
>
> By the way, let me suggest 'git grep -i' instead.

Ah, thanks.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
> that there are hash partitioning related tests in create_table,
> foreign_key, and partition_join files as well. Do we want to use the
> custom opclass in those files as well?

By the way, let me suggest 'git grep -i' instead.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 13, 2018 at 10:50 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea. Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times? This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> I don't think that's necessary to fix the problem that
> partition_prune_hash.sql file has two expected output files. If you
> want to propose such a reorganization, feel free, but let's not hijack
> the patch at hand. For the record, I'm not a fan of the idea.

Fair enough. I don't think I'm hijacking the thread much more than it was already hijacked; and it was just a thought. I haven't really studied the tests well enough to have a really clear idea what a better organization would look like. It was just that, for example, the commit that added hash partitioning added tests to 5 different files, and some things had to be duplicated as a result. It sounds like what you've already done improves that, but I was wondering if there's a way to do better. I don't feel super-strongly about it though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
Robert Haas wrote:
> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>> Here's an idea. Why don't we move the function/opclass creation lines
>> to insert.sql, without the DROPs, and use the same functions/opclasses
>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>> partition_prune.sql, i.e. not recreate what are essentially the same
>> objects three times? This also leaves them around for the pg_upgrade
>> test, which is not a bad thing.
>
> That sounds good, but maybe we should go further and move the
> partitioning tests out of generically-named things like insert.sql
> altogether and have test names that actually mention partitioning.

I don't think that's necessary to fix the problem that partition_prune_hash.sql file has two expected output files. If you want to propose such a reorganization, feel free, but let's not hijack the patch at hand. For the record, I'm not a fan of the idea.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 13, 2018 at 7:45 AM, Amit Langote wrote:
> On 2018/04/13 1:47, Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea. Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times? This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> Do you mean to do that for *all* files that have tests exercising some
> partitioning code, even if it's just one test? I can see that tests in at
> least some of them could be put into their own partition_ file as follows:
>
> partition_insert (including tests in insert_conflict)
> partition_update
> partition_triggers
> partition_indexing (indexing.sql added when partitioned indexes went in)
> partition_ddl (for the tests in create_table and alter_table)

We haven't generally created test files specific to a table type, for example temporary tables or unlogged tables; instead we have created files by SQL command. But then that's not true for indexes; we have separate files for indexes, and we also have a separate file for materialized views and also for various data types. So, our test file organization seems to have cut across SQL commands and object types. But partitioning seems an area large enough to have files of its own; we already have partition_join and partition_aggregate.

Do we want to move to a directory-based organization for tests also, where sql/ and expected/ will have directories within them for various types of objects like partitioned tables, indexes, regular tables, datatypes etc.
and each of those will have files organized by sql commands? An immediate question arises as to where to add the files which exercises a mixture of objects; may be in a directory corresponding to the primary object like materialized views over partitioned tables, would fit materialized view (or just views?) directory. Whatever organization we want to use, it should be easy to find testcases for relevant functionality e.g. all tests for partitioned tables or all alter table command tests. > What about the tests in inherit.sql that start with: > > -- > -- Check that constraint exclusion works correctly with partitions using > -- implicit constraints generated from the partition bound information. > -- > > Maybe, just move all of them to partition_prune.sql, because we no longer > use constraint exclusion for pruning? I think we need to have some testcases somwhere to test constraint exclusion on partitions and partitioned tables, those do not necessarily fit partition pruning. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 13 April 2018 at 14:15, Amit Langote wrote:
> On 2018/04/13 1:47, Robert Haas wrote:
>> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>>> Here's an idea.  Why don't we move the function/opclass creation lines
>>> to insert.sql, without the DROPs, and use the same functions/opclasses
>>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>>> partition_prune.sql, i.e. not recreate what are essentially the same
>>> objects three times?  This also leaves them around for the pg_upgrade
>>> test, which is not a bad thing.
>>
>> That sounds good, but maybe we should go further and move the
>> partitioning tests out of generically-named things like insert.sql
>> altogether and have test names that actually mention partitioning.
>
> Do you mean to do that for *all* files that have tests exercising some
> partitioning code, even if it's just one test?  I can see that tests in at
> least some of them could be put into their own partition_ file as follows:

Wouldn't it be best to just move hash partition tests into hash_part?
Leave all the other stuff where it is?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018/04/13 1:47, Robert Haas wrote:
> On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
>> Here's an idea.  Why don't we move the function/opclass creation lines
>> to insert.sql, without the DROPs, and use the same functions/opclasses
>> in the three tests insert.sql, alter_table.sql, hash_part.sql and
>> partition_prune.sql, i.e. not recreate what are essentially the same
>> objects three times?  This also leaves them around for the pg_upgrade
>> test, which is not a bad thing.
>
> That sounds good, but maybe we should go further and move the
> partitioning tests out of generically-named things like insert.sql
> altogether and have test names that actually mention partitioning.

Do you mean to do that for *all* files that have tests exercising some
partitioning code, even if it's just one test?  I can see that tests in at
least some of them could be put into their own partition_ file as follows:

partition_insert (including tests in insert_conflict)
partition_update
partition_triggers
partition_indexing (indexing.sql added when partitioned indexes went in)
partition_ddl (for the tests in create_table and alter_table)

That leaves:

cluster
create_index (one test here could be moved to partition_indexing?)
foreign_data (could be moved to partition_ddl?)
foreign_key (could be moved to partition_ddl?)
hash_part (leave alone, because already contains 'part' in the name?)
identity
join
plancache
plpgsql
publication
rowsecurity
rules
stats_ext
tablesample
truncate
updatable_views
vacuum

What about the tests in inherit.sql that start with:

--
-- Check that constraint exclusion works correctly with partitions using
-- implicit constraints generated from the partition bound information.
--

Maybe, just move all of them to partition_prune.sql, because we no longer
use constraint exclusion for pruning?

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera wrote:
> Here's an idea.  Why don't we move the function/opclass creation lines
> to insert.sql, without the DROPs, and use the same functions/opclasses
> in the three tests insert.sql, alter_table.sql, hash_part.sql and
> partition_prune.sql, i.e. not recreate what are essentially the same
> objects three times?  This also leaves them around for the pg_upgrade
> test, which is not a bad thing.

That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
On 2018/04/11 21:35, Alvaro Herrera wrote: > Here's an idea. Why don't we move the function/opclass creation lines > to insert.sql, without the DROPs, and use the same functions/opclasses > in the three tests insert.sql, alter_table.sql, hash_part.sql and > partition_prune.sql, i.e. not recreate what are essentially the same > objects three times? This also leaves them around for the pg_upgrade > test, which is not a bad thing. > > (This would require a few updates to insert.sql because the definitions > there are different, but it shouldn't be a problem coverage-wise.) OK, I've tried doing that. Needed adjustments to hash_part.sql as well. The hash function for int4 was defined differently in insert.sql, alter_table.sql, and hash_part.sql. I went with the definition in insert.sql, which although slightly different from the one alter_table.sql, didn't affect the latter's output in any way. Since the definition in hash_part.sql was totally different, a couple of tests needed adjusting after starting to use hash opclasses defined in insert.sql. Attached updated patch. PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates that there are hash partitioning related tests in create_table, foreign_key, and partition_join files as well. Do we want to use the custom opclass in those files as well? Thanks, Amit From 5a01d81aa7e90ef130b245c5e38b02fe9be5e8d7 Mon Sep 17 00:00:00 2001 From: amitDate: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v4] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. 
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/alter_table.out | 15 +- src/test/regress/expected/hash_part.out| 23 +-- src/test/regress/expected/insert.out | 32 +++- src/test/regress/expected/partition_prune.out | 191 + src/test/regress/expected/partition_prune_hash.out | 189 .../regress/expected/partition_prune_hash_1.out| 187 src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/alter_table.sql | 15 +- src/test/regress/sql/hash_part.sql | 24 +-- src/test/regress/sql/insert.sql| 36 +++- src/test/regress/sql/partition_prune.sql | 44 - src/test/regress/sql/partition_prune_hash.sql | 41 - 13 files changed, 305 insertions(+), 495 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 63845910a6..50b9443e2d 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3662,20 +3662,13 @@ CREATE TABLE quuux2 PARTITION OF quuux FOR VALUES IN (2); INFO: updated partition constraint for default partition "quuux_default1" is implied by existing constraints DROP TABLE quuux; -- check validation when attaching hash partitions --- The default hash functions as they exist today aren't portable; they can --- return different results on different machines. Depending upon how the --- values are hashed, the row may map to different partitions, which result in --- regression failure. To avoid this, let's create a non-default hash function --- that just returns the input value unchanged. 
-CREATE OR REPLACE FUNCTION dummy_hashint4(a int4, seed int8) RETURNS int8 AS -$$ BEGIN RETURN (a + 1 + seed); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; -CREATE OPERATOR CLASS custom_opclass FOR TYPE int4 USING HASH AS -OPERATOR 1 = , FUNCTION 2 dummy_hashint4(int4, int8); +-- Use hand-rolled hash functions and operator class to get predictable result +-- on different matchines. part_test_int4_ops is defined in insert.sql. -- check that the new partition won't overlap with an existing partition CREATE TABLE hash_parted ( a int, b int -) PARTITION BY HASH (a custom_opclass); +) PARTITION BY HASH (a part_test_int4_ops); CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0); CREATE TABLE fail_part (LIKE hpart_1); ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4); @@ -3840,8 +3833,6 @@ SELECT * FROM list_parted; DROP TABLE list_parted, list_parted2, range_parted; DROP TABLE fail_def_part; DROP TABLE hash_parted; -DROP OPERATOR CLASS custom_opclass USING HASH; -DROP FUNCTION
Re: [HACKERS] path toward faster partition pruning
Here's an idea.  Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times?  This also leaves them around for the pg_upgrade
test, which is not a bad thing.

(This would require a few updates to insert.sql because the definitions
there are different, but it shouldn't be a problem coverage-wise.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Wed, Apr 11, 2018 at 2:52 PM, Amit Langote wrote:
>> I've attached a delta patch that applies to your v2 which does this.
>> Do you think it's worth doing?
>
> We can check by inspection that individual values are in appropriate
> partitions, which is the point of having the inserts and the select just
> above the actual pruning related tests.  So, I'm not sure if adding
> satisfies_hash_partition against each pruning test adds much.

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
Hi David.

Thanks for the review.

On 2018/04/11 17:59, David Rowley wrote:
> On 11 April 2018 at 18:04, Amit Langote wrote:
>> Updated patch attached.
>
> Thanks for the updated patch.
>
> The only thing I'm not sure about is the changes you've made to the
> COALESCE function.
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
>
> Why does one default to the seed and the other to an empty string?
> Shouldn't they both do the same thing?  If you were to copy
> hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
> special reason not to do that?

Oops, so I hadn't actually restored it to the way it is in hash_part.sql.
Also, Ashutosh was talking about the custom hashing function used in
insert.sql, not hash_part.sql, which I based my revision upon.  Fixed it
now.

> Also just wondering if it's worth adding some verification that we've
> actually eliminated the correct partitions by backing the tests up
> with a call to satisfies_hash_partition.
>
> I've attached a delta patch that applies to your v2 which does this.
> Do you think it's worth doing?

We can check by inspection that individual values are in appropriate
partitions, which is the point of having the inserts and the select just
above the actual pruning related tests.  So, I'm not sure if adding
satisfies_hash_partition against each pruning test adds much.

Attached revised patch.
Thanks, Amit From 4685448a7eb2eaf5feceea2206d136c135b2dea7 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v3] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 237 + src/test/regress/expected/partition_prune_hash.out | 189 .../regress/expected/partition_prune_hash_1.out| 187 src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 70 +- src/test/regress/sql/partition_prune_hash.sql | 41 7 files changed, 307 insertions(+), 420 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index df3fca025e..d13389b9c2 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1332,6 +1332,243 @@ explain (costs off) select * from rparted_by_int2 where a > 100; drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2; -- +-- Test Partition pruning for HASH partitioning +-- We roll our own operator classes to use for tests, because depending on the +-- platform-provided hashing functions makes tests unreliable +-- +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS +$$SELECT coalesce($1, 0)::int8$$ LANGUAGE sql IMMUTABLE; +CREATE OPERATOR CLASS 
pp_test_int4_ops FOR TYPE int4 USING HASH AS +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8); +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS +$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE; +CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS +OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8); +create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops); +create table hp0 partition of hp for values with (modulus 4, remainder 0); +create table hp3 partition of hp for values with (modulus 4, remainder 3); +create table hp1 partition of hp for values with (modulus 4, remainder 1); +create table hp2 partition of hp for values with (modulus 4, remainder 2); +insert into hp values (null, null); +insert into hp values (1, null); +insert into hp values (1, 'xxx'); +insert into hp values (null, 'xxx'); +insert into hp values (2, 'xxx'); +insert into hp values (1, 'abcde'); +select tableoid::regclass, * from hp order by 1; +
Re: [HACKERS] path toward faster partition pruning
On 11 April 2018 at 18:04, Amit Langote wrote:
> Updated patch attached.

Thanks for the updated patch.

The only thing I'm not sure about is the changes you've made to the
COALESCE function.

+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;

Why does one default to the seed and the other to an empty string?
Shouldn't they both do the same thing?  If you were to copy
hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
special reason not to do that?

Also just wondering if it's worth adding some verification that we've
actually eliminated the correct partitions by backing the tests up
with a call to satisfies_hash_partition.

I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

add_satisfies_hash_partition_verification.patch
Description: Binary data
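The question above has a concrete behavioral edge: the two definitions agree on every non-NULL input and differ only in which bucket a NULL key hashes into. A minimal Python model of the two candidate SQL support functions (the Python names and the seed value are illustrative stand-ins, not anything from the patch):

```python
# Models of the two candidate opclass support functions discussed above.
# noop_hash_seed_default models:  SELECT coalesce($1, $2)::int8
# noop_hash_zero_default models:  SELECT coalesce($1, 0)::int8
# PostgreSQL passes a 64-bit seed as the second argument; the value below
# is only a stand-in for illustration.

SEED = 0x7A5B22367996DCFD

def noop_hash_seed_default(value, seed):
    # NULL (None) keys hash to the seed
    return value if value is not None else seed

def noop_hash_zero_default(value, seed):
    # NULL (None) keys hash to zero
    return value if value is not None else 0

# Identical for non-NULL keys...
assert noop_hash_seed_default(42, SEED) == noop_hash_zero_default(42, SEED) == 42
# ...but NULL keys get different hash inputs, so a row with a NULL key can
# land in different partitions under the two definitions.
assert noop_hash_seed_default(None, SEED) != noop_hash_zero_default(None, SEED)
```

Either choice yields deterministic, platform-independent output, which is all the regression tests need; the disagreement here is only about consistency with hash_part.sql.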
Re: [HACKERS] path toward faster partition pruning
Thanks for the review.

On 2018/04/10 21:02, David Rowley wrote:
> On 10 April 2018 at 20:56, Amit Langote wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea.  I think it makes sense and also agree that the
>> alternate outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> way.  It creates two hash operator classes, one for int4 and another for
>> text type, and uses them to create the hash partitioned table to be used
>> in the tests, like done in the existing tests in hash_part.sql.  Since
>> that makes tests (hopefully) reliably return the same result always, I no
>> longer see the need to keep them in a separate partition_prune_hash.sql.
>> The reasoning behind having the separate file was to keep the alternative
>> output file small, as David explained in [1].
>>
>> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.

It might as well be the combination of both that's sensitive to
endianness.  I too am not sure exactly which part.  They're both used in
succession in compute_hash_value:

    /*
     * Compute hash for each datum value by calling respective
     * datatype-specific hash functions of each partition key
     * attribute.
     */
    hash = FunctionCall2(&partsupfunc[i], values[i], seed);

    /* Form a single 64-bit hash value */
    rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));

> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
> Why coalesce here?  Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument.  Plus the function is
> strict, so it's not really doing anything even if you added a default.

After reading Ashutosh's comment, I realized I didn't really mean to add
the STRICT to those function definitions.  As these are not operators, but
support (hash) procedures, it's insignificant to the pruning code whether
they are STRICT or not, unlike clause operators where it is.  Also, I've
adopted the coalesce-based hashing function from hash_part.sql, albeit
with some unnecessary tweaks.  I've not read anywhere about why the
coalesce was used in the first place, but it's insignificant for our
purpose here anyway.

> I know this one was there before, but I only just noticed it:
>
> +-- pruning should work if non-null values are provided for all the keys
> +explain (costs off) select * from hp where a is null and b is null;
>
> The comment is a bit misleading given the first test below it is
> testing for nulls.  Maybe it can be changed to
>
> +-- pruning should work if values or is null clauses are provided for
> all partition keys.

I have adjusted the comments.  Updated patch attached.
Thanks, Amit From 5a6d00d4d9d6aa8bb84dc9699646ee5c4fa77719 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v2] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 201 + src/test/regress/expected/partition_prune_hash.out | 189 --- .../regress/expected/partition_prune_hash_1.out| 187 --- src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 58 +- src/test/regress/sql/partition_prune_hash.sql |
Re: [HACKERS] path toward faster partition pruning
Thanks for the comment.

On 2018/04/10 21:11, Ashutosh Bapat wrote:
> On Tue, Apr 10, 2018 at 5:32 PM, David Rowley wrote:
>> Apart from that confusion, looking at the patch:
>>
>> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
>> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
>> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
>> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
>> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>>
>> Why coalesce here?  Maybe I've not thought of something, but coalesce
>> only seems useful to me if there's > 1 argument.  Plus the function is
>> strict, so it's not really doing anything even if you added a default.
>
> I think Amit Langote wanted to write coalesce($1, $2), $2 being the
> seed for the hash function.  See how hash operator class functions are
> defined in sql/insert.sql.

Actually, I referenced functions and operator classes defined in
hash_part.sql, not insert.sql.  Although as you point out, I didn't think
very hard about the significance of $2 passed to coalesce in those
functions.  I will fix that and add it back, along with some other changes
that make them almost identical with the definitions in hash_part.sql.

> May be we should just use the same
> functions or even the same tables.

Because the hash_part.sql and partition_prune.sql tests run in parallel,
I've decided to rename the functions, operator classes, and the tables in
partition_prune.sql.  It seems like a good idea in any case.  Also, since
the existing pruning tests were written with that table, I decided not to
change that.

Will post an updated patch after addressing David's comment.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On Tue, Apr 10, 2018 at 5:32 PM, David Rowley wrote:
> On 10 April 2018 at 20:56, Amit Langote wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea.  I think it makes sense and also agree that the
>> alternate outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> way.  It creates two hash operator classes, one for int4 and another for
>> text type, and uses them to create the hash partitioned table to be used
>> in the tests, like done in the existing tests in hash_part.sql.  Since
>> that makes tests (hopefully) reliably return the same result always, I no
>> longer see the need to keep them in a separate partition_prune_hash.sql.
>> The reasoning behind having the separate file was to keep the alternative
>> output file small, as David explained in [1].
>>
>> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.
>
> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
> Why coalesce here?  Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument.  Plus the function is
> strict, so it's not really doing anything even if you added a default.

I think Amit Langote wanted to write coalesce($1, $2), $2 being the
seed for the hash function.  See how hash operator class functions are
defined in sql/insert.sql.  May be we should just use the same
functions or even the same tables.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 10 April 2018 at 20:56, Amit Langote wrote:
> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>
> Thanks for the idea.  I think it makes sense and also agree that the
> alternate outputs approach is not perfectly reliable and maintainable.
>
>> +1.
>
> Attached find a patch that rewrites hash partition pruning tests that
> way.  It creates two hash operator classes, one for int4 and another for
> text type, and uses them to create the hash partitioned table to be used
> in the tests, like done in the existing tests in hash_part.sql.  Since
> that makes tests (hopefully) reliably return the same result always, I no
> longer see the need to keep them in a separate partition_prune_hash.sql.
> The reasoning behind having the separate file was to keep the alternative
> output file small, as David explained in [1].
>
> [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com

I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.

Apart from that confusion, looking at the patch:

+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;

Why coalesce here?  Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument.  Plus the function is
strict, so it's not really doing anything even if you added a default.

I know this one was there before, but I only just noticed it:

+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;

The comment is a bit misleading given the first test below it is
testing for nulls.  Maybe it can be changed to

+-- pruning should work if values or is null clauses are provided for
all partition keys.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
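Whichever step turns out to be endianness-sensitive, the combining step itself is plain modular integer arithmetic. Here is a rough Python model of the fold that compute_hash_value performs over the per-key hashes; the magic constant is what hash_combine64 is believed to use (boost-style combine, 64-bit variant), and it should be treated as an assumption to verify against src/include/utils/hashutils.h rather than a quotation of it:

```python
# Rough model of 64-bit hash combining for hash partitioning.
# The constant below is an assumption (believed to match hash_combine64);
# everything here is a sketch, not PostgreSQL source code.

MASK64 = (1 << 64) - 1

def hash_combine64(a: int, b: int) -> int:
    # a ^= b + magic + (a << 54) + (a >> 7), all modulo 2^64
    a ^= (b + 0x49A0F4DD15E5A8E3 + ((a << 54) & MASK64) + (a >> 7)) & MASK64
    return a & MASK64

def partition_index(per_key_hashes, modulus: int) -> int:
    # Mirrors the loop shape of compute_hash_value: fold each key's hash
    # into a running 64-bit row hash, then take the remainder by modulus.
    row_hash = 0
    for h in per_key_hashes:
        row_hash = hash_combine64(row_hash, h & MASK64)
    return row_hash % modulus

# With a noop support function, the per-key "hash" is just the key value
# (or a length, for the text opclass), so the partition a row lands in is
# pure integer arithmetic -- reproducible by hand on any platform.
idx = partition_index([1, 3], 4)
assert 0 <= idx < 4
assert idx == partition_index([1, 3], 4)  # deterministic
```

This is the intuition behind Robert's suggestion below: once the per-key hashes are trivially predictable, the whole partition assignment is.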
Re: [HACKERS] path toward faster partition pruning
On 2018/04/10 13:27, Ashutosh Bapat wrote:
> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
>> On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
>>> David Rowley writes:
>>>> Sounds like you're saying that if we have too many alternative files
>>>> then there's a chance that one could pass by luck.
>>>
>>> Yeah, exactly: it passed, but did it pass for the right reason?
>>>
>>> If there's just two expected-files, it's likely not a big problem,
>>> but if you have a bunch it's something to worry about.
>>>
>>> I'm also wondering how come we had hash partitioning before and
>>> did not have this sort of problem.  Is it just that we added a
>>> new test that's more sensitive to the details of the hashing
>>> (if so, could it be made less so)?  Or is there actually more
>>> platform dependence now than before (and if so, why is that)?
>>
>> The existing hash partitioning tests did have some dependencies on the
>> hash function, but they took care not to use the built-in hash
>> functions.  Instead they did stuff like this:
>>
>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>> CREATE TABLE mchash (a int, b text, c jsonb)
>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> I think that this approach should also be used for the new tests.
>> Variant expected output files are a pain to maintain, and you
>> basically just have to take whatever output you get as the right
>> answer, because nobody knows what output a certain built-in hash
>> function should produce for a given input except by running the code.
>> If you do the kind of thing shown above, though, then you can easily
>> see by inspection that you're getting the right answer.

Thanks for the idea.  I think it makes sense and I also agree that the
alternate outputs approach is not perfectly reliable and maintainable.
> +1. Attached find a patch that rewrites hash partition pruning tests that away. It creates two hash operator classes, one for int4 and another for text type and uses them to create hash partitioned table to be used in the tests, like done in the existing tests in hash_part.sql. Since that makes tests (hopefully) reliably return the same result always, I no longer see the need to keep them in a separate partition_prune_hash.sql. The reasoning behind having the separate file was to keep the alternative output file small as David explained in [1]. However, I noticed that there is a bug in RelationBuildPartitionKey that causes a crash when using a SQL function as partition support function as the revised tests do, so please refer to and apply the patches I posted here before running the revised tests: https://www.postgresql.org/message-id/3041e853-b1dd-a0c6-ff21-7cc5633bffd0%40lab.ntt.co.jp Thanks, Amit [1] https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com From c1508fc715a7783108f626c67c76fcc1f2303719 Mon Sep 17 00:00:00 2001 From: amit Date: Tue, 10 Apr 2018 16:06:33 +0900 Subject: [PATCH v1] Rewrite hash partition pruning tests to use custom opclass Relying on platform-provided hashing functions makes tests unreliable as shown by buildfarm recently. This adds adjusted tests to partition_prune.sql itself and hence partition_prune_hash.sql is deleted along with two expected output files. 
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com --- src/test/regress/expected/partition_prune.out | 202 - src/test/regress/expected/partition_prune_hash.out | 189 --- .../regress/expected/partition_prune_hash_1.out| 187 --- src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 - src/test/regress/sql/partition_prune.sql | 59 +- src/test/regress/sql/partition_prune_hash.sql | 41 - 7 files changed, 259 insertions(+), 422 deletions(-) delete mode 100644 src/test/regress/expected/partition_prune_hash.out delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out delete mode 100644 src/test/regress/sql/partition_prune_hash.sql diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index df3fca025e..935e7dc79b 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -1330,7 +1330,207 @@ explain (costs off) select * from rparted_by_int2 where a > 100; Filter: (a > '100'::bigint) (3 rows) -drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
Re: [HACKERS] path toward faster partition pruning
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas wrote:
> On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
>> David Rowley writes:
>>> Sounds like you're saying that if we have too many alternative files
>>> then there's a chance that one could pass by luck.
>>
>> Yeah, exactly: it passed, but did it pass for the right reason?
>>
>> If there's just two expected-files, it's likely not a big problem,
>> but if you have a bunch it's something to worry about.
>>
>> I'm also wondering how come we had hash partitioning before and
>> did not have this sort of problem.  Is it just that we added a
>> new test that's more sensitive to the details of the hashing
>> (if so, could it be made less so)?  Or is there actually more
>> platform dependence now than before (and if so, why is that)?
>
> The existing hash partitioning tests did have some dependencies on the
> hash function, but they took care not to use the built-in hash
> functions.  Instead they did stuff like this:
>
> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
> CREATE TABLE mchash (a int, b text, c jsonb)
> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>
> I think that this approach should also be used for the new tests.
> Variant expected output files are a pain to maintain, and you
> basically just have to take whatever output you get as the right
> answer, because nobody knows what output a certain built-in hash
> function should produce for a given input except by running the code.
> If you do the kind of thing shown above, though, then you can easily
> see by inspection that you're getting the right answer.

+1.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.
>
> I'm also wondering how come we had hash partitioning before and
> did not have this sort of problem.  Is it just that we added a
> new test that's more sensitive to the details of the hashing
> (if so, could it be made less so)?  Or is there actually more
> platform dependence now than before (and if so, why is that)?

The existing hash partitioning tests did have some dependencies on the hash function, but they took care not to use the built-in hash functions. Instead they did stuff like this:

CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);

I think that this approach should also be used for the new tests. Variant expected output files are a pain to maintain, and you basically just have to take whatever output you get as the right answer, because nobody knows what output a certain built-in hash function should produce for a given input except by running the code. If you do the kind of thing shown above, though, then you can easily see by inspection that you're getting the right answer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
Hi David.

On 2018/04/09 12:48, David Rowley wrote:
> While looking at the docs in [1], I saw that we still mention:
>
> 4. Ensure that the constraint_exclusion configuration parameter is not
> disabled in postgresql.conf. If it is, queries will not be optimized
> as desired.
>
> This is no longer true. The attached patch removed it.
>
> [1] https://www.postgresql.org/docs/10/static/ddl-partitioning.htm

Thanks. I was aware of the changes that would need to be made, but you beat me to writing the patch itself.

About the patch: while users no longer need to enable constraint_exclusion for select queries, one would still need it for update/delete queries, because the new pruning logic only gets invoked for the former. Alas...

Also, further down on that page, there is a 5.10.4 Partitioning and Constraint Exclusion sub-section. I think it would also need some tweaks due to new developments. I updated your patch to fix that. Please take a look.

Thanks,
Amit

From a4fe924936fe623ff95e6aa050b8fd7d22dbbb84 Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com"
Date: Mon, 9 Apr 2018 15:43:32 +1200
Subject: [PATCH v2] Remove mention of constraint_exclusion in partitioning docs

As of 9fdb675fc5d2, this GUC now no longer has an effect on partition pruning.
---
 doc/src/sgml/ddl.sgml | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index feb2ab7792..eed8753e24 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3194,13 +3194,14 @@ CREATE INDEX ON measurement (logdate);
 - - -Ensure that the -configuration parameter is not disabled in postgresql.conf. -If it is, queries will not be optimized as desired. - - + + + Ensure that the + configuration parameter is not disabled in postgresql.conf. + While enabling it is not required for select queries, not doing so will result + in update and delete queries to not be optimized as desired. + +
@@ -3767,10 +3768,12 @@ ANALYZE measurement;
 -Constraint exclusion is a query optimization technique -that improves performance for partitioned tables defined in the -fashion described above (both declaratively partitioned tables and those -implemented using inheritance). As an example: +Constraint exclusion is a query optimization +technique that improves performance for partitioned tables defined in the +fashion described above. While it is used only for update and delete +queries in the case of declaratively partitioned tables, it is used for all +queries in the case of table partitioning implemented using inheritance. +As an example: SET constraint_exclusion = on;
--
2.11.0
Re: [HACKERS] path toward faster partition pruning
While looking at the docs in [1], I saw that we still mention:

4. Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized
as desired.

This is no longer true. The attached patch removed it.

[1] https://www.postgresql.org/docs/10/static/ddl-partitioning.html

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0001-Remove-mention-of-constraint_exclusion-in-partitioni.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
Andres Freund wrote:
> On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > > hours ago in
> > > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
> >
> > And this, too.  I was unsure if this was because we were missing calling
> > some object access hook from the new code, or the additional pruning.
>
> That's possible. I did attempt to skim the code, that's where my
> complaint about the docs originated. There certainly isn't an
> InvokeFunctionExecuteHook() present. It's not clear to me whether
> that's an issue - we don't invoke the hooks in a significant number of
> places either, and as far as I can discern there's not much rule or
> reason about where we invoke it.

I managed to convince myself that it's not higher-level code's responsibility to invoke the execute hooks; the likelihood of bugs of omission seems just too large. But maybe I'm wrong.

There's a small number of InvokeFunctionExecuteHook calls in the executor, but I really doubt that it exhaustively covers everyplace where catalogued functions are called in the executor.

CC'ing KaiGai and Stephen Frost; they may want to chip in here.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I've also attempted to fix rhinoceros's failure I remarked upon a couple
> > hours ago in
> > https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de
>
> And this, too.  I was unsure if this was because we were missing calling
> some object access hook from the new code, or the additional pruning.

That's possible. I did attempt to skim the code; that's where my complaint about the docs originated. There certainly isn't an InvokeFunctionExecuteHook() present. It's not clear to me whether that's an issue - we don't invoke the hooks in a significant number of places either, and as far as I can discern there's not much rule or reason about where we invoke it.

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
Andres Freund wrote:
> Hi,
>
> On 2018-04-07 15:49:54 +1200, David Rowley wrote:
> > Right, I suggest we wait and see if all members go green again as a
> > result of 40e42e1024c, and if they're happy then we could maybe leave
> > it as is with the 2 alternatives output files.
>
> At least the first previously borked animal came back green (termite).

Thanks everyone for addressing this.

> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

And this, too. I was unsure if this was because we were missing calling some object access hook from the new code, or the additional pruning.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:39 PM, Tom Lane wrote:
> Amit Langote writes:
>> Given that the difference only appeared on animals that David pointed
>> out have big-endian architecture, it seems we'd only need two output
>> files.
>
> Dunno, I'm wondering whether 32 vs 64 bit will make a difference.

There was one 32-bit animal in the failing set, which apparently produces the same hashes as others (allegedly due to endianness difference).

powerpc 32-bit userspace on ppc64 host:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07

...and it has turned green since the alternative outputs fix went in:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-07%2004%3A06%3A09

Thanks,
Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote writes:
> Given that the difference only appeared on animals that David pointed
> out have big-endian architecture, it seems we'd only need two output
> files.

Dunno, I'm wondering whether 32 vs 64 bit will make a difference.

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:09 PM, Andres Freund wrote:
> Hi,
>
> On 2018-04-07 15:49:54 +1200, David Rowley wrote:
>> Right, I suggest we wait and see if all members go green again as a
>> result of 40e42e1024c, and if they're happy then we could maybe leave
>> it as is with the 2 alternatives output files.
>
> At least the first previously borked animal came back green (termite).
>
>> I don't particularly think it matters which hash partition a tuple
>> goes into, as long as the hash function spreads the values out enough
>> and most importantly, the pruning code looks for the tuple in the
>> partition that it was actually inserted into in the first place.
>> Obviously, we also want to ensure we never do anything which would
>> change the matching partition in either minor or major version
>> upgrades too.
>
> +1

+1

Given that the difference only appeared on animals that David pointed out have big-endian architecture, it seems we'd only need two output files. It does seem true that the extended hashing functions that were added to support partitioning would somehow be affected by endianness.

Thank you David for creating the patches and Andres for committing it. The buildfarm seems to be turning green where it had gone red due to the hashing differences.

> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Thanks Andres.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:18, Andres Freund wrote:
> I've pushed the two patches (collapsed). Trying to get the BF green-ish
> again...

termite has now gone green.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 16:09, Andres Freund wrote:
> I've also attempted to fix rhinoceros's failure I remarked upon a couple
> hours ago in
> https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Oh, thanks! I had just been looking at that too...

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 09:03, Andres Freund wrote:
> There's also
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-04-06%2020%3A45%3A01
>
> *** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800
> --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700
> ***
> *** 32,40
>  (6 rows)
>
>  SELECT * FROM t1p WHERE o > 50 AND p like '%64%';
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
> - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o"
>  LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p"
> --- 32,37
>
> seems you just need to remove those rows from the expected file.

Agreed. Patch attached.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

part_prune_sepgsql_fix.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
Hi,

On 2018-04-07 15:49:54 +1200, David Rowley wrote:
> Right, I suggest we wait and see if all members go green again as a
> result of 40e42e1024c, and if they're happy then we could maybe leave
> it as is with the 2 alternatives output files.

At least the first previously borked animal came back green (termite).

> I don't particularly think it matters which hash partition a tuple
> goes into, as long as the hash function spreads the values out enough
> and most importantly, the pruning code looks for the tuple in the
> partition that it was actually inserted into in the first place.
> Obviously, we also want to ensure we never do anything which would
> change the matching partition in either minor or major version
> upgrades too.

+1

I've also attempted to fix rhinoceros's failure I remarked upon a couple hours ago in https://postgr.es/m/20180406210330.wmqw42wqgiick...@alap3.anarazel.de

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:41, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.

Right, I suggest we wait and see if all members go green again as a result of 40e42e1024c, and if they're happy then we could maybe leave it as is with the 2 alternatives output files. If there are some other variations that crop up, then we can think harder about what we can do to improve the coverage.

I don't particularly think it matters which hash partition a tuple goes into, as long as the hash function spreads the values out enough and most importantly, the pruning code looks for the tuple in the partition that it was actually inserted into in the first place. Obviously, we also want to ensure we never do anything which would change the matching partition in either minor or major version upgrades too.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi,

On 2018-04-06 23:41:22 -0400, Tom Lane wrote:
> David Rowley writes:
>> Sounds like you're saying that if we have too many alternative files
>> then there's a chance that one could pass by luck.
>
> Yeah, exactly: it passed, but did it pass for the right reason?
>
> If there's just two expected-files, it's likely not a big problem,
> but if you have a bunch it's something to worry about.

There should be only two alternatives, given our current hashing implementation, right?

Greetings,

Andres Freund
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:41, Tom Lane wrote:
> I'm also wondering how come we had hash partitioning before and
> did not have this sort of problem.  Is it just that we added a
> new test that's more sensitive to the details of the hashing
> (if so, could it be made less so)?  Or is there actually more
> platform dependence now than before (and if so, why is that)?

We didn't prune HASH partitions before today. They were just all returned.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley writes:
> Sounds like you're saying that if we have too many alternative files
> then there's a chance that one could pass by luck.

Yeah, exactly: it passed, but did it pass for the right reason?

If there's just two expected-files, it's likely not a big problem, but if you have a bunch it's something to worry about.

I'm also wondering how come we had hash partitioning before and did not have this sort of problem. Is it just that we added a new test that's more sensitive to the details of the hashing (if so, could it be made less so)? Or is there actually more platform dependence now than before (and if so, why is that)?

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 8:55 AM, David Rowley wrote:
> On 7 April 2018 at 15:14, Ashutosh Bapat wrote:
>> On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
>>> Why is writing tests that produce the same output required?
>>>
>>> We have many tests with alternative outputs. Look in
>>> src/tests/regress/expected for files matching _1.out
>>
>> That's true, but we usually add such alternative output when we know
>> all the variants possible as long as "all the variants" do not cover
>> everything possible. AFAIU, that's not true here. Also, on a given
>> machine a particular row is guaranteed to fall in a given partition.
>> On a different machine it will fall in some other partition, but
>> always that partition on that machine. We don't have a way to select
>> alternate output based on the architecture. May be a better idea is to
>> use .source file, creating .out on the fly based on the architecture
>> of machine like testing the hash output for a given value to decide
>> which partition it will fall into and then crafting .out with that
>> partition's name.
>
> Sounds like you're saying that if we have too many alternative files
> then there's a chance that one could pass by luck.

Yes.

> Maybe we can just back up what's just been committed by actually
> executing the queries and ensuring that all rows that made it into the
> table make it back out again.

That's one way. But how would we make sure that they landed in the proper partition? Actually we do not know what's the proper partition for a given architecture. And how would we make sure that all rows with the same partition key land in the same partition? That's why I am suggesting to calculate the hash value, take the modulo and craft the name of the partition where the corresponding row will land on a given architecture. That way, we are sure that the tuple routing logic is correct and also the partition pruning logic.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:14, Ashutosh Bapat wrote:
> On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
>> Why is writing tests that produce the same output required?
>>
>> We have many tests with alternative outputs. Look in
>> src/tests/regress/expected for files matching _1.out
>
> That's true, but we usually add such alternative output when we know
> all the variants possible as long as "all the variants" do not cover
> everything possible. AFAIU, that's not true here. Also, on a given
> machine a particular row is guaranteed to fall in a given partition.
> On a different machine it will fall in some other partition, but
> always that partition on that machine. We don't have a way to select
> alternate output based on the architecture. May be a better idea is to
> use .source file, creating .out on the fly based on the architecture
> of machine like testing the hash output for a given value to decide
> which partition it will fall into and then crafting .out with that
> partition's name.

Sounds like you're saying that if we have too many alternative files then there's a chance that one could pass by luck.

Maybe we can just back up what's just been committed by actually executing the queries and ensuring that all rows that made it into the table make it back out again.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:18, Andres Freund wrote:
> I've pushed the two patches (collapsed). Trying to get the BF green-ish
> again...

Thanks!

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 15:04:37 +1200, David Rowley wrote:
> On 7 April 2018 at 15:00, Andres Freund wrote:
> > On 2018-04-07 14:42:53 +1200, David Rowley wrote:
> >> On 7 April 2018 at 13:31, David Rowley wrote:
> >> > Maybe the best solution is to pull those tests out of
> >> > partition_prune.sql then create partition_prune_hash and just have an
> >> > alternative .out file with the partitions which match on bigendian
> >> > machines.
> >>
> >> Here's 1 of 2. I thought it was best to get the buildfarm green again
> >> as soon as possible.
> >
> > Do you have an estimate how long it'll take you to produce patch 2? It'd
> > be cool to get this covered again soon. If you don't have access to a
> > big endian machine, we can construct the output from the buildfarm... So
> > pulling the tests out would be the only "urgent" thing, I can go on from
> > there.
>
> Attached.
>
> I've not tested on a bigendian machine, but the diff -c between the
> two output files match the diff on the failing buildfarm members.

I've pushed the two patches (collapsed). Trying to get the BF green-ish again...

- Andres
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 8:37 AM, David Rowley wrote:
> On 7 April 2018 at 15:03, Ashutosh Bapat wrote:
>> On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
>>> The only alternative would be to change all the hash functions so that
>>> they normalise their endianness. It does not sound like something that
>>> will perform very well. Plus it would break everyone's hash indexes on
>>> a pg_upgrade.
>>>
>>> pg_basebackups can't be transferred over to other architectures
>>> anyway, so I'm not so worried about tuples being routed to other
>>> partitions.
>>>
>>> Maybe someone else can see a reason why this is bad?
>>
>> I don't think the concept is bad by itself. That's expected, in fact,
>> we have added an option to pg_dump (dump through parent or some such)
>> to handle exactly this case. What Amit seems to be complaining though
>> is the regression test. We need to write regression tests so that they
>> produce the same plans, pruning same partitions by name, on all
>> architectures.
>
> Why is writing tests that produce the same output required?
>
> We have many tests with alternative outputs. Look in
> src/tests/regress/expected for files matching _1.out

That's true, but we usually add such alternative output when we know all the variants possible as long as "all the variants" do not cover everything possible. AFAIU, that's not true here. Also, on a given machine a particular row is guaranteed to fall in a given partition. On a different machine it will fall in some other partition, but always that partition on that machine. We don't have a way to select alternate output based on the architecture. May be a better idea is to use a .source file, creating .out on the fly based on the architecture of the machine, like testing the hash output for a given value to decide which partition it will fall into and then crafting .out with that partition's name.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:03, Ashutosh Bapat wrote:
> On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
>> The only alternative would be to change all the hash functions so that
>> they normalise their endianness. It does not sound like something that
>> will perform very well. Plus it would break everyone's hash indexes on
>> a pg_upgrade.
>>
>> pg_basebackups can't be transferred over to other architectures
>> anyway, so I'm not so worried about tuples being routed to other
>> partitions.
>>
>> Maybe someone else can see a reason why this is bad?
>
> I don't think the concept is bad by itself. That's expected, in fact,
> we have added an option to pg_dump (dump through parent or some such)
> to handle exactly this case. What Amit seems to be complaining though
> is the regression test. We need to write regression tests so that they
> produce the same plans, pruning same partitions by name, on all
> architectures.

Why is writing tests that produce the same output required?

We have many tests with alternative outputs. Look in src/tests/regress/expected for files matching _1.out

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 15:00, Andres Freund wrote:
> On 2018-04-07 14:42:53 +1200, David Rowley wrote:
>> On 7 April 2018 at 13:31, David Rowley wrote:
>> > Maybe the best solution is to pull those tests out of
>> > partition_prune.sql then create partition_prune_hash and just have an
>> > alternative .out file with the partitions which match on bigendian
>> > machines.
>>
>> Here's 1 of 2. I thought it was best to get the buildfarm green again
>> as soon as possible.
>
> Do you have an estimate how long it'll take you to produce patch 2? It'd
> be cool to get this covered again soon. If you don't have access to a
> big endian machine, we can construct the output from the buildfarm... So
> pulling the tests out would be the only "urgent" thing, I can go on from
> there.

Attached.

I've not tested on a bigendian machine, but the diff -c between the two output files match the diff on the failing buildfarm members.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0002-Add-HASH-partition-pruning-tests.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 7:25 AM, David Rowley wrote:
> On 7 April 2018 at 13:50, Amit Langote wrote:
>> On Sat, Apr 7, 2018 at 10:31 AM, David Rowley
>>> I looked at all the regression test diffs for each of the servers you
>>> mentioned and I verified that the diffs match on each of the 7
>>> servers.
>>>
>>> Maybe the best solution is to pull those tests out of
>>> partition_prune.sql then create partition_prune_hash and just have an
>>> alternative .out file with the partitions which match on bigendian
>>> machines.
>>>
>>> We could also keep them in the same file, but that's a much bigger
>>> alternative file to maintain and more likely to get broken if someone
>>> forgets to update it.
>>>
>>> What do you think?
>>
>> Yeah, that's an idea.
>>
>> Is it alright though that same data may end up in different hash
>> partitions depending on the architecture? IIRC, that's the way we
>> decided to go when using hash partitioning, but it would've been
>> clearer if there was already some evidence in regression tests that
>> that's what we've chosen, such as, some existing tests for tuple
>> routing.
>
> The only alternative would be to change all the hash functions so that
> they normalise their endianness. It does not sound like something that
> will perform very well. Plus it would break everyone's hash indexes on
> a pg_upgrade.
>
> pg_basebackups can't be transferred over to other architectures
> anyway, so I'm not so worried about tuples being routed to other
> partitions.
>
> Maybe someone else can see a reason why this is bad?

I don't think the concept is bad by itself. That's expected, in fact, we have added an option to pg_dump (dump through parent or some such) to handle exactly this case. What Amit seems to be complaining about though is the regression test. We need to write regression tests so that they produce the same plans, pruning the same partitions by name, on all architectures.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] path toward faster partition pruning
On 2018-04-07 14:42:53 +1200, David Rowley wrote: > On 7 April 2018 at 13:31, David Rowley wrote: > > Maybe the best solution is to pull those tests out of > > partition_prune.sql then create partition_prune_hash and just have an > > alternative .out file with the partitions which match on bigendian > > machines. > > Here's 1 of 2. I thought it was best to get the buildfarm green again > as soon as possible. Do you have an estimate how long it'll take you to produce patch 2? It'd be cool to get this covered again soon. If you don't have access to a big endian machine, we can construct the output from the buildfarm... So pulling the tests out would be the only "urgent" thing, I can go on from there. - Andres
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 13:31, David Rowley wrote: > Maybe the best solution is to pull those tests out of > partition_prune.sql then create partition_prune_hash and just have an > alternative .out file with the partitions which match on bigendian > machines. Here's 1 of 2. I thought it was best to get the buildfarm green again as soon as possible. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Remove-HASH-partition-pruning-tests.patch Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 13:50, Amit Langotewrote: > On Sat, Apr 7, 2018 at 10:31 AM, David Rowley >> I looked at all the regression test diffs for each of the servers you >> mentioned and I verified that the diffs match on each of the 7 >> servers. >> >> Maybe the best solution is to pull those tests out of >> partition_prune.sql then create partition_prune_hash and just have an >> alternative .out file with the partitions which match on bigendian >> machines. >> >> We could also keep them in the same file, but that's a much bigger >> alternative file to maintain and more likely to get broken if someone >> forgets to update it. >> >> What do you think? > > Yeah, that's an idea. > > Is it alright though that same data may end up in different hash > partitions depending on the architecture? IIRC, that's the way we > decided to go when using hash partitioning, but it would've been > clearer if there was already some evidence in regression tests that > that's what we've chosen, such as, some existing tests for tuple > routing. The only alternative would be to change all the hash functions so that they normalise their endianness. It does not sound like something that will perform very well. Plus it would break everyone's hash indexes on a pg_upgrade. pg_basebackups can't be transferred over to other architectures anyway, so I'm not so worried about tuples being routed to other partitions. Maybe someone else can see a reason why this is bad? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 10:31 AM, David Rowley wrote: > On 7 April 2018 at 12:43, David Rowley wrote: >> On 7 April 2018 at 12:35, Amit Langote wrote: >>> So this same failure occurs on (noting the architecture): >>> >>> Seems to be due to that the hashing function used in partitioning >>> gives different answer for a given set of partition key values than >>> others. >> >> They all look like bigendian CPUs. > > I looked at all the regression test diffs for each of the servers you > mentioned and I verified that the diffs match on each of the 7 > servers. > > Maybe the best solution is to pull those tests out of > partition_prune.sql then create partition_prune_hash and just have an > alternative .out file with the partitions which match on bigendian > machines. > > We could also keep them in the same file, but that's a much bigger > alternative file to maintain and more likely to get broken if someone > forgets to update it. > > What do you think? Yeah, that's an idea. Is it alright though that same data may end up in different hash partitions depending on the architecture? IIRC, that's the way we decided to go when using hash partitioning, but it would've been clearer if there was already some evidence in regression tests that that's what we've chosen, such as, some existing tests for tuple routing. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 12:43, David Rowley wrote: > On 7 April 2018 at 12:35, Amit Langote wrote: >> So this same failure occurs on (noting the architecture): >> >> Seems to be due to that the hashing function used in partitioning >> gives different answer for a given set of partition key values than >> others. > > They all look like bigendian CPUs. I looked at all the regression test diffs for each of the servers you mentioned and I verified that the diffs match on each of the 7 servers. Maybe the best solution is to pull those tests out of partition_prune.sql then create partition_prune_hash and just have an alternative .out file with the partitions which match on bigendian machines. We could also keep them in the same file, but that's a much bigger alternative file to maintain and more likely to get broken if someone forgets to update it. What do you think? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 7 April 2018 at 12:35, Amit Langote wrote: > Thank you Alvaro for rest of the cleanup and committing. +10! > So this same failure occurs on (noting the architecture): > > ppc64: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52 > > ia64: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2018-04-06%2022%3A32%3A24 > > ppc64 (POWER7): > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2018-04-06%2022%3A58%3A13 > > ppc64 (POWER7): > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-04-06%2023%3A02%3A13 > > powerpc: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-04-06%2023%3A05%3A08 > > powerpc: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-04-06%2023%3A13%3A23 > > powerpc 32-bit userspace on ppc64 host: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07 > > Seems to be due to that the hashing function used in partitioning > gives different answer for a given set of partition key values than > others. They all look like bigendian CPUs. https://en.wikipedia.org/wiki/Comparison_of_instruction_set_architectures#Endianness -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
Thank you Alvaro for rest of the cleanup and committing. On Sat, Apr 7, 2018 at 5:28 AM, Alvaro Herrerawrote: > So I pushed this 25 minutes ago, and already there's a couple of > buildfarm members complaining: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 > > Both show exactly the same diff in test partition_prune: > > *** > /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out > Fri Apr 6 15:55:08 2018 > --- > /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out >Fri Apr 6 16:01:40 2018 > *** > *** 1348,1357 > --++- >hp0 || >hp0 | 1 | > ! hp0 | 1 | xxx >hp3 | 10 | yyy > ! hp1 || xxx > ! hp2 | 10 | xxx > (6 rows) > > -- partial keys won't prune, nor would non-equality conditions > --- 1348,1357 > --++- >hp0 || >hp0 | 1 | > ! hp0 | 10 | xxx > ! hp3 || xxx >hp3 | 10 | yyy > ! 
hp2 | 1 | xxx > (6 rows) > > -- partial keys won't prune, nor would non-equality conditions > *** > *** 1460,1466 > QUERY PLAN > - >Append > !-> Seq Scan on hp0 >Filter: ((a = 1) AND (b = 'xxx'::text)) > (3 rows) > > --- 1460,1466 > QUERY PLAN > - >Append > !-> Seq Scan on hp2 >Filter: ((a = 1) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1468,1474 >QUERY PLAN > - >Append > !-> Seq Scan on hp1 >Filter: ((a IS NULL) AND (b = 'xxx'::text)) > (3 rows) > > --- 1468,1474 >QUERY PLAN > - >Append > !-> Seq Scan on hp3 >Filter: ((a IS NULL) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1476,1482 > QUERY PLAN > -- >Append > !-> Seq Scan on hp2 >Filter: ((a = 10) AND (b = 'xxx'::text)) > (3 rows) > > --- 1476,1482 > QUERY PLAN > -- >Append > !-> Seq Scan on hp0 >Filter: ((a = 10) AND (b = 'xxx'::text)) > (3 rows) > > *** > *** 1494,1504 >Append > -> Seq Scan on hp0 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > --> Seq Scan on hp2 > - Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > -> Seq Scan on hp3 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > ! (7 rows) > > -- hash partitiong pruning doesn't occur with <> operator clauses > explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; > --- 1494,1502 >Append > -> Seq Scan on hp0 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > -> Seq Scan on hp3 >Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = > 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) > ! 
(5 rows) So this same failure occurs on (noting the architecture): ppc64: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 ia64: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole=2018-04-06%2022%3A32%3A24 ppc64 (POWER7): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern=2018-04-06%2022%3A58%3A13 ppc64 (POWER7): https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2018-04-06%2023%3A02%3A13 powerpc: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2018-04-06%2023%3A05%3A08 powerpc: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2018-04-06%2023%3A13%3A23 powerpc 32-bit userspace on ppc64 host: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2023%3A40%3A07 Seems to be due to that the hashing function used in partitioning gives different answer for a given set of partition key values than others. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
Robert Haas wrote: > On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera wrote: > > I don't actually like very much the idea of putting all this code in > > optimizer/util. This morning it occurred to me that we should create a new > > src/backend/partitioning/ (and a src/include/partitioning/ to go with > > it) and drop a bunch of files there. Even your proposed new partcache.c > > will seem misplaced *anywhere*, since it contains support code to be > > used by both planner and executor; in src/{backend,include}/partitioning > > it will be able to serve both without it being a modularity wart. > > Uh, what? > > Surely partcache.c is correctly placed next to relcache.c and > syscache.c and everything else in src/backend/utils/cache. Frankly, I'm not real sure about partcache.c yet. Are you? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera wrote: > I don't actually like very much the idea of putting all this code in > optimizer/util. This morning it occurred to me that we should create a new > src/backend/partitioning/ (and a src/include/partitioning/ to go with > it) and drop a bunch of files there. Even your proposed new partcache.c > will seem misplaced *anywhere*, since it contains support code to be > used by both planner and executor; in src/{backend,include}/partitioning > it will be able to serve both without it being a modularity wart. Uh, what? Surely partcache.c is correctly placed next to relcache.c and syscache.c and everything else in src/backend/utils/cache. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] path toward faster partition pruning
On 2018-04-06 17:28:00 -0300, Alvaro Herrera wrote: > So I pushed this 25 minutes ago, and already there's a couple of > buildfarm members complaining: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 > > Both show exactly the same diff in test partition_prune: There's also https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2018-04-06%2020%3A45%3A01 *** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800 --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700 *** *** 32,40 (6 rows) SELECT * FROM t1p WHERE o > 50 AND p like '%64%'; - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" - LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)" LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p" LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o" LOG: SELinux: allowed { select } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p" --- 32,37 seems you just need to remove those rows from the expected file. - Andres
Re: [HACKERS] path toward faster partition pruning
So I pushed this 25 minutes ago, and already there's a couple of buildfarm members complaining: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka=2018-04-06%2020%3A09%3A52 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite=2018-04-06%2019%3A55%3A07 Both show exactly the same diff in test partition_prune: *** /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out Fri Apr 6 15:55:08 2018 --- /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out Fri Apr 6 16:01:40 2018 *** *** 1348,1357 --++- hp0 || hp0 | 1 | ! hp0 | 1 | xxx hp3 | 10 | yyy ! hp1 || xxx ! hp2 | 10 | xxx (6 rows) -- partial keys won't prune, nor would non-equality conditions --- 1348,1357 --++- hp0 || hp0 | 1 | ! hp0 | 10 | xxx ! hp3 || xxx hp3 | 10 | yyy ! hp2 | 1 | xxx (6 rows) -- partial keys won't prune, nor would non-equality conditions *** *** 1460,1466 QUERY PLAN - Append !-> Seq Scan on hp0 Filter: ((a = 1) AND (b = 'xxx'::text)) (3 rows) --- 1460,1466 QUERY PLAN - Append !-> Seq Scan on hp2 Filter: ((a = 1) AND (b = 'xxx'::text)) (3 rows) *** *** 1468,1474 QUERY PLAN - Append !-> Seq Scan on hp1 Filter: ((a IS NULL) AND (b = 'xxx'::text)) (3 rows) --- 1468,1474 QUERY PLAN - Append !-> Seq Scan on hp3 Filter: ((a IS NULL) AND (b = 'xxx'::text)) (3 rows) *** *** 1476,1482 QUERY PLAN -- Append !-> Seq Scan on hp2 Filter: ((a = 10) AND (b = 'xxx'::text)) (3 rows) --- 1476,1482 QUERY PLAN -- Append !-> Seq Scan on hp0 Filter: ((a = 10) AND (b = 'xxx'::text)) (3 rows) *** *** 1494,1504 Append -> Seq Scan on hp0 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) --> Seq Scan on hp2 - Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) -> Seq Scan on hp3 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS 
NULL))) ! (7 rows) -- hash partitiong pruning doesn't occur with <> operator clauses explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; --- 1494,1502 Append -> Seq Scan on hp0 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) -> Seq Scan on hp3 Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL))) ! (5 rows) -- hash partitiong pruning doesn't occur with <> operator clauses explain (costs off) select * from hp where a <> 1 and b <> 'xxx'; -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi Alvaro, On 04/06/2018 12:41 PM, Alvaro Herrera wrote: Here's my proposed patch. Idle thought: how about renaming the "constfalse" argument and variables to "contradictory" or maybe just "contradict"? Passes check-world. New directories, and variable rename seems like a good idea; either is ok. Best regards, Jesper
Re: [HACKERS] path toward faster partition pruning
On Sat, Apr 7, 2018 at 1:41 AM, Alvaro Herrera wrote: > Here's my proposed patch. > > Idle thought: how about renaming the "constfalse" argument and variables > to "contradictory" or maybe just "contradict"? Sounds fine to me. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote: > Some comments on the code reorganizing part of the patch: > > * Did you intentionally not put PartitionBoundInfoData and its accessor > macros in partition_internal.h. partprune.c would not need to include > partition.h if we do that. Not really. After pondering this some more, I decided to call the new file src/include/partition/partbounds.h; and the other new file will become src/include/partition/partprune.h. This leads naturally to the idea that PartitionBoundInfoData will be in partbounds.h. However, the typedef struct PartitionBoundInfoData *PartitionBoundInfo will have to remain in catalog/partition.h, at least for the time being. > * Also, I wonder why you left PartitionPruneContext in partition.h. Isn't > it better taken out to partprune.h? Yes. > * Why isn't gen_partprune_steps() in partprune.h? I see only > prune_append_rel_partitions() exported out of partprune.c, but the runtime > patch needs gen_partprune_steps() to be called from createplan.c. > * I don't see get_matching_partitions() exported either. Runtime pruning > patch needs that too. True -- both exported. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:38 PM, Alvaro Herrera wrote: > Yeah, there is one sentence there I didn't quite understand and would > like to add it to the rewritten version of the comment before I remove > the whole ifdeffed-out comment. > > * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition > pruning > * step(s)" generated for the clause due to it being a BoolExpr or a > * ScalarArrayOpExpr that's turned into one > > Exactly what does "ScalarArrayOpExpr that's turned into one" means? > Does it mean we turn SAOP into BoolExpr? Yes, we turn a ScalarArrayOpExpr into a BoolExpr and generate prune step for the latter. Maybe we'll have a base pruning step that can process a ScalarArrayOpExpr directly someday. We create base steps only for OpExpr's for now. > If you look at the rest of the rewritten comment, you'll notice some > things probably need more explaining. Wording suggestions welcome. When I looked at it earlier today, I thought your rewrite looked much better. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
On Fri, Apr 6, 2018 at 11:54 PM, Alvaro Herrerawrote: > Alvaro Herrera wrote: > >> Yeah. Looking at this function, I noticed it tests for BooleanTest, and >> falls back to checking "not_clause" and a few equals. Does it make >> sense if the clause is a SAOP? I added this assert: >> Assert(IsA(clause, BooleanTest) || >> IsA(clause, BoolExpr) || >> IsA(clause, RelabelType)); >> >> and it failed: >> #3 0x556cf04505db in match_boolean_partition_clause (partopfamily=424, >> clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068) >> at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159 >> 2159 Assert(IsA(clause, BooleanTest) || >> (gdb) print *clause >> $1 = {type = T_ScalarArrayOpExpr} >> >> I'm not sure whether or not this function can trust that what's incoming >> must absolutely be only those node types. > > So this is what I need for current regression tests not to crash > anymore: > > Assert(IsA(clause, BooleanTest) || >IsA(clause, BoolExpr) || >IsA(clause, RelabelType) || >IsA(clause, ScalarArrayOpExpr) || >IsA(clause, OpExpr) || >IsA(clause, Var)); > > I'm not confident in my ability to write code to handle all possible > cases right now (obviously there must be more cases that are not covered > by current regression tests), so I'll leave it without the assert since > it handles a couple of the useful cases, but I suspect it could stand > some more improvement. > > I guess the question is, how interesting is boolean partitioning? I bet > it has its uses. 
match_boolean_partition_clauses() exists to capture some cases where an OpExpr (any expression that returns a Boolean for that matter) itself is the partition key: create table boolpart (a int) partition by list ((a = 1)); create table boolpart_t partition of boolpart for values in ('true'); create table boolpart_f partition of boolpart for values in ('false'); explain select * from boolpart where a = 1; QUERY PLAN -- Append (cost=0.00..41.88 rows=13 width=4) -> Seq Scan on boolpart_t (cost=0.00..41.88 rows=13 width=4) Filter: (a = 1) (3 rows) explain select * from boolpart where a = 2; QUERY PLAN -- Append (cost=0.00..41.88 rows=13 width=4) -> Seq Scan on boolpart_f (cost=0.00..41.88 rows=13 width=4) Filter: (a = 2) (3 rows) So, it's not that we're only in position to accept certain node types in match_boolean_partition_clauses(). Before it existed, the pruning didn't work because it wasn't matched to the partition key in the special way that match_boolean_partition_clauses() does and end up in the block in match_clause_to_partition_key() where the OpExpr's are analyzed for normal (non-Boolean) situations, where we extract either the leftop or rightop and try to match it with the partition key. It might as well be: create table boolpart (a int) partition by list ((a in (1, 2))); Requiring us to be position to match an ScalarArrayOpExpr with the partition key. This resembles match_boolean_index_clause(), by the way. Thanks, Amit
Re: [HACKERS] path toward faster partition pruning
BTW, having both key_is_null and key_is_not_null output args to convey a single bit of info is a bit lame. I'm removing it. We could do the same with a single boolean, since the return value already indicates it's a matching IS [NOT] NULL clause; we only need to indicate whether the NOT is present. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley wrote: > 2. I guess this will be removed before commit. > > +#if 0 > > +#endif Yeah, there is one sentence there I didn't quite understand and would like to add it to the rewritten version of the comment before I remove the whole ifdeffed-out comment. * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning * step(s)" generated for the clause due to it being a BoolExpr or a * ScalarArrayOpExpr that's turned into one Exactly what does "ScalarArrayOpExpr that's turned into one" means? Does it mean we turn SAOP into BoolExpr? (Yes, I know "#if 0" inside a comment doesn't do anything. It's only documentation for myself.) If you look at the rest of the rewritten comment, you'll notice some things probably need more explaining. Wording suggestions welcome. > 3. This comment seems like a strange thing to write just before > testing if the clause matches the partition key. > > + /* Clause does not match this partition key. */ > + if (equal(leftop, partkey)) > + *rightop = not_clause((Node *) clause) > + ? (Expr *) makeBoolConst(false, false) > + : (Expr *) makeBoolConst(true, false); Yeah. Looking at this function, I noticed it tests for BooleanTest, and falls back to checking "not_clause" and a few equals. Does it make sense if the clause is a SAOP? I added this assert: Assert(IsA(clause, BooleanTest) || IsA(clause, BoolExpr) || IsA(clause, RelabelType)); and it failed: #3 0x556cf04505db in match_boolean_partition_clause (partopfamily=424, clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068) at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159 2159Assert(IsA(clause, BooleanTest) || (gdb) print *clause $1 = {type = T_ScalarArrayOpExpr} I'm not sure whether or not this function can trust that what's incoming must absolutely be only those node types. > 4. Comment needs removed. > > + * has_default_part - Whether the table has a default partition Done. 
> The only other thing I noted on this pass is that we could get rid of: > > + /* go check the next clause. */ > + if (unsupported_clause) > + break; > > and just "continue" instead of "break" in all cases apart from case > PARTCLAUSE_UNSUPPORTED: > > it would save a few lines and a single condition. What's there works, > but thought this might be better... Makes sense -- looking. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi. On 2018/04/06 7:35, Alvaro Herrera wrote: > I seems pretty clear that putting get_matching_partitions() in > catalog/partition.c is totally the wrong thing; it belongs wholly in > partprune. I think the reason you put it there is that it requires > access to a lot of internals that are static in partition.c. In the > attached not yet cleaned version of the patch, I have moved a whole lot > of what you added to partition.c to partprune.c; and for the functions > and struct declarations that were required to make it work, I created > catalog/partition_internal.h. Yes, I really wanted for most of the new code that this patch adds to land in the planner, especially after Robert's comments here: https://www.postgresql.org/message-id/CA%2BTgmoabi-29Vs8H0xkjtYB%3DcU%2BGVCrNwPz7okpa3KsoLmdEUQ%40mail.gmail.com It would've been nice if we'd gotten the "reorganizing partitioning code" thread resolved sooner. > I changed a lot of code also, but cosmetic changes only. > > I'll clean this up a bit more now, and try to commit shortly (or early > tomorrow); wanted to share current status now in case I have to rush > out. Some comments on the code reorganizing part of the patch: * Did you intentionally not put PartitionBoundInfoData and its accessor macros in partition_internal.h. partprune.c would not need to include partition.h if we do that. * Also, I wonder why you left PartitionPruneContext in partition.h. Isn't it better taken out to partprune.h? I have done that in the attached. * Why isn't gen_partprune_steps() in partprune.h? I see only prune_append_rel_partitions() exported out of partprune.c, but the runtime patch needs gen_partprune_steps() to be called from createplan.c. * I don't see get_matching_partitions() exported either. Runtime pruning patch needs that too. Maybe you've thought something about these two items though. 
Thanks, Amit diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index d5e91b111d..f89c99f544 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -24,7 +24,6 @@ #include "catalog/indexing.h" #include "catalog/objectaddress.h" #include "catalog/partition.h" -#include "catalog/partition_internal.h" #include "catalog/pg_collation.h" #include "catalog/pg_inherits.h" #include "catalog/pg_inherits_fn.h" diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c index b0638d5aa6..93553b5d13 100644 --- a/src/backend/optimizer/util/partprune.c +++ b/src/backend/optimizer/util/partprune.c @@ -19,8 +19,6 @@ #include "access/hash.h" #include "access/nbtree.h" -#include "catalog/partition.h" -#include "catalog/partition_internal.h" #include "catalog/pg_operator.h" #include "catalog/pg_opfamily.h" #include "catalog/pg_type.h" @@ -35,6 +33,7 @@ #include "parser/parse_coerce.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "utils/array.h" #include "utils/lsyscache.h" /* diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 0bcaa36165..1c17553917 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -13,7 +13,7 @@ #ifndef PARTITION_H #define PARTITION_H -#include "fmgr.h" +#include "catalog/partition_internal.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" #include "parser/parse_node.h" @@ -23,65 +23,6 @@ #define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD) /* - * PartitionBoundInfoData encapsulates a set of partition bounds. It is - * usually associated with partitioned tables as part of its partition - * descriptor, but may also be used to represent a virtual partitioned - * table such as a partitioned joinrel within the planner. - * - * A list partition datum that is known to be NULL is never put into the - * datums array. Instead, it is tracked using the null_index field. 
- * - * In the case of range partitioning, ndatums will typically be far less than - * 2 * nparts, because a partition's upper bound and the next partition's lower - * bound are the same in most common cases, and we only store one of them (the - * upper bound). In case of hash partitioning, ndatums will be same as the - * number of partitions. - * - * For range and list partitioned tables, datums is an array of datum-tuples - * with key->partnatts datums each. For hash partitioned tables, it is an array - * of datum-tuples with 2 datums, modulus and remainder, corresponding to a - * given partition. - * - * The datums in datums array are arranged in increasing order as defined by - * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and - * qsort_partition_hbound_cmp() for range, list and hash partitioned tables - * respectively. For range and list partitions this simply means that the - * datums in the datums array are arranged in increasing order as defined by - * the partition key's operator classes and collations. - * - * In the case of list
Re: [HACKERS] path toward faster partition pruning
On 6 April 2018 at 12:02, David Rowley wrote: > On 6 April 2018 at 10:35, Alvaro Herrera wrote: > The only other thing I noted on this pass is that we could get rid of: > > + /* go check the next clause. */ > + if (unsupported_clause) > + break; > > and just "continue" instead of "break" in all cases apart from case > PARTCLAUSE_UNSUPPORTED: I should have said remove: + if (unsupported_clause) The "break" would still be required. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 6 April 2018 at 10:35, Alvaro Herrerawrote: > I changed a lot of code also, but cosmetic changes only. > > I'll clean this up a bit more now, and try to commit shortly (or early > tomorrow); wanted to share current status now in case I have to rush > out. I made a complete pass over the patch you sent. I only noted down the following few things: 1. + * off < 0, meaning the look-up value is smaller that all bounds, that -> than 2. I guess this will be removed before commit. +#if 0 +#endif 3. This comment seems like a strange thing to write just before testing if the clause matches the partition key. + /* Clause does not match this partition key. */ + if (equal(leftop, partkey)) + *rightop = not_clause((Node *) clause) + ? (Expr *) makeBoolConst(false, false) + : (Expr *) makeBoolConst(true, false); 4. Comment needs removed. + * has_default_part - Whether the table has a default partition The only other thing I noted on this pass is that we could get rid of: + /* go check the next clause. */ + if (unsupported_clause) + break; and just "continue" instead of "break" in all cases apart from case PARTCLAUSE_UNSUPPORTED: it would save a few lines and a single condition. What's there works, but thought this might be better... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
> @@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root,
> RangeTblEntry *parentrte,
>  * parentrte already has the root partrel's updatedCols translated to match
>  * the attribute ordering of parentrel.
>  */
> - if (!*part_cols_updated)
> -     *part_cols_updated =
> + if (!root->partColsUpdated)
> +     root->partColsUpdated =
>       has_partition_attrs(parentrel, parentrte->updatedCols, NULL);

Hmm, surely this should be |= to avoid resetting a value set in a
previous call to this function? In the previous coding it wasn't
necessary because it was a local variable ...

(though, isn't it a bit odd to have this in PlannerInfo? seems like it
should be in resultRelInfo, but then you already have it there so I
suppose this one does *more*)

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Amit Langote wrote:
> >> 1. Still not sure about RelOptInfo->has_default_part. This flag is
> >> only looked at in generate_partition_pruning_steps. The RelOptInfo and
> >> the boundinfo is available to look at, it's just that the
> >> partition_bound_has_default macro is defined in partition.c rather
> >> than partition.h.
>
> Hmm, it might not be such a bad idea to bring out the
> PartitionBoundInfoData into partition.h. If we do that, we won't need the
> has_default_part that the patch adds to RelOptInfo.
>
> In the attached v50 set, 0002 does that.

After looking at this for a moment, I again come to the conclusion that
the overall layout of partitioning code and definitions is terrible.
But we already know that, and there's a patch in commitfest to improve
things. So my intention right now is to hold my nose and get this
pushed; we'll fix it afterwards.

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] path toward faster partition pruning
Hi.

On 2018/04/05 0:45, Jesper Pedersen wrote:
> Hi,
>
> On 04/04/2018 09:29 AM, David Rowley wrote:
>> Thanks for updating. I've made a pass over v49 and I didn't find very
>> much wrong with it.
>>
>> The only real bug I found was a missing IsA(rinfo->clause, Const) in
>> the pseudoconstant check inside
>> generate_partition_pruning_steps_internal.

Fixed.

>> Most of the changes are comment fixes with a few stylistic changes
>> thrown in which are pretty much all there just to try to shrink the
>> code a line or two or reduce indentation.
>>
>> I feel pretty familiar with this code now and assuming the attached is
>> included I'm happy for someone else, hopefully a committer, to take a
>> look at it.

Thank you, your changes look good to me.

>> I'll leave the following notes:
>>
>> 1. Still not sure about RelOptInfo->has_default_part. This flag is
>> only looked at in generate_partition_pruning_steps. The RelOptInfo and
>> the boundinfo is available to look at, it's just that the
>> partition_bound_has_default macro is defined in partition.c rather
>> than partition.h.

Hmm, it might not be such a bad idea to bring out the
PartitionBoundInfoData into partition.h. If we do that, we won't need
the has_default_part that the patch adds to RelOptInfo.

In the attached v50 set, 0002 does that.

>> 2. Don't really like the new isopne variable name. It's not very
>> simple to decode, perhaps something like is_not_eq is better?

isopne does sound a bit unintelligible. I propose op_is_ne so that it
sounds consistent with the preceding member of the struct that's called
opno. I want to keep "ne" and not start calling it not_eq, as a few
other places use the string "ne" to refer to a similar thing, like:

/* inequality */
Datum range_ne(PG_FUNCTION_ARGS)
Datum timestamptz_ne_date(PG_FUNCTION_ARGS)

Since the field is local to partprune.c, I guess that it's fine as the
comment where it's defined tells what it is.

>> 3.
The part of the code I'm least familiar with is
>> get_steps_using_prefix_recurse(). I admit to not having had time to
>> fully understand that and consider ways to break it.

The purpose of that code is to generate *all* needed steps to be
combined using COMBINE_INTERSECT such that the pruning will occur using
the most restrictive set of clauses in cases where the same key is
referenced in multiple restriction clauses containing non-equality
operators. So, for a range partitioned table on (a, b), for a query
like:

explain select * from foo where a <= 1 and a <= 3 and b < 5 and b <= 10

pruning steps generated to be combined with an enclosing INTERSECT step
will be as follows:

<= (1, 10)
<  (1, 5)
<= (3, 10)
<  (3, 5)

>> Marking as ready for committer.

Thank you!

> Passes check-world, and CommitFest app has been updated to reflect the
> current patch set. Trivial changes attached.

Merged these changes. Thanks again Jesper.

Attached v50.

Thanks,
Amit
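Amit's four-step list can be pictured as a cross product over the per-key clause lists. The following standalone C sketch is a hypothetical model (none of these names are PostgreSQL's actual partprune.c API) of why two non-equality clauses on each of two keys yield four steps for the enclosing INTERSECT:

```c
#include <stdio.h>

/* Hypothetical, simplified model of what get_steps_using_prefix_recurse()
 * produces for the example above: one pruning step per combination of the
 * non-equality clauses on each partition key; the caller INTERSECTs them. */
typedef struct { const char *op; int value; } KeyClause;

static int
emit_prefix_steps(const KeyClause *a, int na, const KeyClause *b, int nb)
{
    int nsteps = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
        {
            /* one step per (a-clause, b-clause) prefix combination; the
             * step's comparison operator is that of the last key */
            printf("%s (%d, %d)\n", b[j].op, a[i].value, b[j].value);
            nsteps++;
        }
    return nsteps;
}

int
example_step_count(void)
{
    /* a <= 1 and a <= 3 and b < 5 and b <= 10 */
    KeyClause a[] = {{"<=", 1}, {"<=", 3}};
    KeyClause b[] = {{"<", 5}, {"<=", 10}};
    return emit_prefix_steps(a, 2, b, 2);
}
```

Intersecting all four steps is what makes the most restrictive pair of clauses win, without having to decide up front which of the redundant clauses is tighter.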
Re: [HACKERS] path toward faster partition pruning
Hi,

On 04/04/2018 09:29 AM, David Rowley wrote:
> Thanks for updating. I've made a pass over v49 and I didn't find very
> much wrong with it.
>
> The only real bug I found was a missing IsA(rinfo->clause, Const) in
> the pseudoconstant check inside
> generate_partition_pruning_steps_internal.
>
> Most of the changes are comment fixes with a few stylistic changes
> thrown in which are pretty much all there just to try to shrink the
> code a line or two or reduce indentation.
>
> I feel pretty familiar with this code now and assuming the attached is
> included I'm happy for someone else, hopefully a committer, to take a
> look at it.
>
> I'll leave the following notes:
>
> 1. Still not sure about RelOptInfo->has_default_part. This flag is
> only looked at in generate_partition_pruning_steps. The RelOptInfo and
> the boundinfo is available to look at, it's just that the
> partition_bound_has_default macro is defined in partition.c rather
> than partition.h.
>
> 2. Don't really like the new isopne variable name. It's not very
> simple to decode, perhaps something like is_not_eq is better?
>
> 3. The part of the code I'm least familiar with is
> get_steps_using_prefix_recurse(). I admit to not having had time to
> fully understand that and consider ways to break it.
>
> Marking as ready for committer.

Passes check-world, and CommitFest app has been updated to reflect the
current patch set.

Trivial changes attached.
Best regards,
Jesper

>From 82f718579dc8e06ab77d76df4ed72f0f03ed4a4e Mon Sep 17 00:00:00 2001
From: jesperpedersen
Date: Wed, 4 Apr 2018 11:27:59 -0400
Subject: [PATCH] Trivial changes

---
 src/backend/catalog/partition.c        | 10 +-
 src/backend/optimizer/util/partprune.c |  8
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d6bce9f348..7a268e05dc 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -146,7 +146,7 @@ typedef struct PruneStepResult
 {
 	/*
 	 * This contains the offsets of the bounds in a table's boundinfo, each of
-	 * which is a bound whose corresponding partition is selected by a a given
+	 * which is a bound whose corresponding partition is selected by a given
 	 * pruning step.
 	 */
 	Bitmapset  *bound_offsets;
@@ -2026,7 +2026,7 @@ get_matching_hash_bounds(PartitionPruneContext *context, int opstrategy,
 						 Datum *values, int nvalues,
 						 FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	int		   *partindices = boundinfo->indexes;
 	int			partnatts = context->partnatts;
@@ -2093,7 +2093,7 @@ get_matching_list_bounds(PartitionPruneContext *context, int opstrategy,
 						 Datum value, int nvalues,
 						 FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	int			off,
 				minoff,
@@ -2147,7 +2147,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
 		return result;
 	}

-	/* Speical case handling of values coming from a <> operator clause. */
+	/* Special case handling of values coming from a <> operator clause. */
 	if (opstrategy == InvalidStrategy)
 	{
 		/*
@@ -2295,7 +2295,7 @@ get_matching_range_bounds(PartitionPruneContext *context, int opstrategy,
 						  Datum *values, int nvalues,
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys)
 {
-	PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+	PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
 	PartitionBoundInfo boundinfo = context->boundinfo;
 	Oid		   *partcollation = context->partcollation;
 	int			partnatts = context->partnatts;
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 75b7232f5d..2d06c1a519 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -433,8 +433,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
 		}

 		/*
-		 * Fall-through for a NOT clause, which if it's a Boolean clause
-		 * clause, will be handled in match_clause_to_partition_key(). We
+		 * Fall-through for a NOT clause, which if it's a Boolean clause,
+		 * will be handled in match_clause_to_partition_key(). We
 		 * currently don't perform any pruning for more complex NOT
 		 * clauses.
 		 */
@@ -665,7 +665,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
 	 */
 	if (match_boolean_partition_clause(partopfamily, clause, partkey, ))
 	{
-		*pc = palloc(sizeof(PartClauseInfo));
+		*pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
 		(*pc)->keyno = partkeyidx;
 		/* Do pruning with the Boolean equality operator. */
 		(*pc)->opno = BooleanEqualOperator;
@@
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 19:04, Amit Langote wrote:
> On 2018/04/04 14:42, Amit Langote wrote:
>> Attached v48.
>
> I had forgotten to remove the static_pruning parameter I had added in the
> v47, because it is no longer used. Static pruning now occurs even if a
> step contains all Params, in which case each of
> get_matching_hash/list/range_bounds() functions returns offsets of all
> non-null datums, because the Params cannot be resolved to actual values
> during static pruning.

Thanks for updating. I've made a pass over v49 and I didn't find very
much wrong with it.

The only real bug I found was a missing IsA(rinfo->clause, Const) in
the pseudoconstant check inside
generate_partition_pruning_steps_internal.

Most of the changes are comment fixes with a few stylistic changes
thrown in which are pretty much all there just to try to shrink the
code a line or two or reduce indentation.

I feel pretty familiar with this code now and assuming the attached is
included I'm happy for someone else, hopefully a committer, to take a
look at it.

I'll leave the following notes:

1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo is available to look at, it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.

2. Don't really like the new isopne variable name. It's not very
simple to decode, perhaps something like is_not_eq is better?

3. The part of the code I'm least familiar with is
get_steps_using_prefix_recurse(). I admit to not having had time to
fully understand that and consider ways to break it.

Marking as ready for committer.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

v49_fixes_drowley.patch
Description: Binary data
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 19:04, Amit Langote wrote:
> Attached v49.

Thanks for including the changes. I'll look now.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 17:42, Amit Langote wrote:
> I'm not sure about the following change in your patch:
>
> - if (!result->scan_null)
> -     result->scan_null = step_result->scan_null;
> - if (!result->scan_default)
> -     result->scan_default = step_result->scan_default;
> + result->scan_null |= step_result->scan_null;
> + result->scan_default |= step_result->scan_default;
>
> Afaik, |= does bitwise OR, which even if it might give the result we want,
> is not a logical operation. I had written the original code using the
> following definition of logical OR.
>
> a OR b = if a then true else b

Ok, no problem. I only changed that to make it more compact. For the
record, we do the same in plenty of other places over the code base.
E.g.

parse->hasSubLinks |= subquery->hasSubLinks;

/* If subquery had any RLS conditions, now main query does too */
parse->hasRowSecurity |= subquery->hasRowSecurity;

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
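Amit's concern is that |= is a bitwise operator; for C99 bool operands, which are always 0 or 1, it computes the same truth table as the explicit logical-OR accumulation it replaced. A minimal standalone check (illustrative code, not PostgreSQL's) of the two forms being swapped here:

```c
#include <stdbool.h>

/* The explicit "a OR b = if a then true else b" accumulation form... */
bool
or_accum_explicit(bool acc, bool step)
{
    if (!acc)
        acc = step;
    return acc;
}

/* ...and the compact bitwise form; because bool values are restricted to
 * 0 and 1, bitwise OR yields the same result for every input pair. */
bool
or_accum_bitwise(bool acc, bool step)
{
    acc |= step;
    return acc;
}
```

The equivalence holds only because both operands are normalized booleans; with raw ints holding arbitrary bit patterns the two forms could differ, which is presumably why Amit flagged it.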
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 16:00, Tom Lane wrote:
> David Rowley writes:
>> It's true that the const simplification code will generally rewrite
>> most NOT(clause) to use the negator operator, but if the operator does
>> not have a negator it can't do this.
>> ...
>> At the moment pruning does not work for this case at all. Perhaps it should?
>
> It's hard to see why we'd expend extra effort to optimize such situations.
> The right answer would invariably be to fix the inadequate operator
> definition, because missing the negator link would hobble many other
> cases besides this.
>
> Now if you can show a case where the extra smarts would be useful
> without presuming a badly-written opclass, it's a different matter.

Okay, well that certainly sounds like less work. In that case, the
comment which claims we handle the NOT clauses needs to be updated to
mention that we only handle boolean NOT clauses and don't optimize the
remainder.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
David Rowley writes:
> It's true that the const simplification code will generally rewrite
> most NOT(clause) to use the negator operator, but if the operator does
> not have a negator it can't do this.
> ...
> At the moment pruning does not work for this case at all. Perhaps it should?

It's hard to see why we'd expend extra effort to optimize such situations.
The right answer would invariably be to fix the inadequate operator
definition, because missing the negator link would hobble many other
cases besides this.

Now if you can show a case where the extra smarts would be useful
without presuming a badly-written opclass, it's a different matter.

regards, tom lane
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 11:22, David Rowley wrote:
> On 4 April 2018 at 09:47, David Rowley wrote:
>> I think it would be better to just have special handling in
>> get_matching_list_bound so that it knows it's performing <>
>> elimination. I'd thought about passing some other opstrategy but the
>> only safe one I thought to use was InvalidStrategy, which is already
>> used by NULL handling.
>
> I'm currently working up a patch to do this the way I think is best.
>
> I'll submit it soon and we can review and get your thoughts on it.

I've attached a rough cut version of what I think is a good solution
for this. It's based on v46, not your latest v47, sorry.

This makes get_matching_list_bounds() aware that it's performing the
not-equal pruning via the opstrategy, which allows it to not return all
partitions when there are no values in this case. Instead, we return
the NULL partition, so that we later invert that and return everything
apart from the NULL partition. A strict clause will allow us that much,
even if we can't get the actual value being compared to, at the time.

There's also a bunch of other changes in there:

1. Adding missing step_id in copyfuncs.c

2. Simplified including the default partition in a bunch of cases.

3. Made it so scan_default and scan_null are only ever set to true if
there's a partition for that.

4. Changed get_matching_*_bounds to return the entire result struct
instead of returning the Bitmapset and passing the remaining bool values
back through params. I didn't really like how you'd changed this to pass
all the bool flags back as params. There's a perfectly good struct there
to provide the entire result in a single return value. I know you've
disagreed with this already, so would be nice to get a 3rd opinion.

5. Rename get_matching_hash_bound to get_matching_hash_bounds. The
LIST and RANGE versions of this function both had a plural name. I
didn't see any reason for the hash case to be different.

Let me know what you think.
I've patched the run-time pruning v18 against this and it now passes regression. I need to do a bit more testing on this to ensure it works for all cases, but thought I'd send now as I suspect you're currently around to look. There might be another issue with the patch too, but I'll send a separate email about that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v46_fixes_drowley.patch Description: Binary data
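Change #4 in David's list is an API-shape choice: return one result struct rather than a bitmap plus several bool out-parameters. A hypothetical miniature of that shape (field and function names are illustrative, not the actual partprune.c definitions, and a plain integer bitmask stands in for the Bitmapset):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of returning one struct instead of three out-parameters. */
typedef struct StepResult
{
    uint32_t bound_offsets;   /* stand-in for the Bitmapset of bound offsets */
    bool     scan_null;       /* also scan the NULL partition? */
    bool     scan_default;    /* also scan the default partition? */
} StepResult;

StepResult
get_matching_bounds_sketch(uint32_t offsets, bool has_null, bool has_default)
{
    /* every field travels together in one return value */
    StepResult r = {offsets, has_null, has_default};
    return r;
}
```

The benefit David is arguing for: callers can't forget to collect one of the flags, and adding a new flag later changes one struct definition instead of every call site's parameter list.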
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 09:47, David Rowley wrote:
> On 4 April 2018 at 00:02, Amit Langote wrote:
>> But actually, the presence of only Params in the pruning steps should
>> result in the pruning not being invoked at all (at least for the static
>> pruning case), thus selecting all partitions containing non-null data. It
>> is better to implement that instead of a workaround like scan_all_nonnulls
>> side-channel I was talking about.
>
> I don't think this is quite true. Since we're only using strict
> clauses, a list of quals with just Params still means that NULLs can't
> match. If you skip the step altogether then won't you lose the chance
> at pruning away any NULL-only partition?
>
> I think it would be better to just have special handling in
> get_matching_list_bound so that it knows it's performing <>
> elimination. I'd thought about passing some other opstrategy but the
> only safe one I thought to use was InvalidStrategy, which is already
> used by NULL handling.

I'm currently working up a patch to do this the way I think is best.

I'll submit it soon and we can review and get your thoughts on it.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 4 April 2018 at 00:02, Amit Langote wrote:
> But actually, the presence of only Params in the pruning steps should
> result in the pruning not being invoked at all (at least for the static
> pruning case), thus selecting all partitions containing non-null data. It
> is better to implement that instead of a workaround like scan_all_nonnulls
> side-channel I was talking about.

I don't think this is quite true. Since we're only using strict
clauses, a list of quals with just Params still means that NULLs can't
match. If you skip the step altogether then won't you lose the chance
at pruning away any NULL-only partition?

I think it would be better to just have special handling in
get_matching_list_bound so that it knows it's performing <>
elimination. I'd thought about passing some other opstrategy but the
only safe one I thought to use was InvalidStrategy, which is already
used by NULL handling.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
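The <> elimination David describes can be sketched with plain bitmasks (hypothetical stand-ins for PostgreSQL's Bitmapset, not the real API): even when the Param's value is unknown, a strict b <> $1 clause can never be satisfied by NULL, so the source step selects just the NULL-only partition and the enclosing inversion step then excludes it:

```c
#include <stdint.h>

/* Source step for a strict <> clause whose comparison value is unknown:
 * strictness alone lets us select the NULL-only partition as the set the
 * inversion step will exclude. */
uint32_t
ne_source_step(uint32_t null_partition_mask)
{
    return null_partition_mask;
}

/* Inversion step: scan everything except what the source step selected. */
uint32_t
invert_step(uint32_t all_partitions_mask, uint32_t source_result)
{
    return all_partitions_mask & ~source_result;
}
```

With four partitions (mask 0xF) where bit 3 marks a NULL-only partition, inverting the source step's result prunes exactly that one partition, which skipping the step entirely would have missed.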
Re: [HACKERS] path toward faster partition pruning
On 2 April 2018 at 17:18, Amit Langote wrote:
> On 2018/03/31 0:55, David Rowley wrote:
>> explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
>>            QUERY PLAN
>> 
>>  Result (actual rows=0 loops=1)
>>    One-Time Filter: false
>> (2 rows)
>
> Hmm. It is the newly added inversion step that's causing this. When
> creating a generic plan (that is when the planning happens via
> BuildCachedPlan called with boundParams set to NULL), the presence of
> Params will cause an inversion step's source step to produce
> scan-all-partitions sort of result, which the inversion step dutifully
> inverts to a scan-no-partitions result.
>
> I have tried to attack that problem by handling the
> no-values-to-prune-with case using a side-channel to propagate the
> scan-all-partitions result through possibly multiple steps. That is, a
> base pruning step will set datum_offsets in a PruneStepResult only if
> pruning is carried out by actually comparing values with the partition
> bounds. If no values were provided (like in the generic plan case), it
> will set a scan_all_nonnull flag instead and return without setting
> datum_offsets. Combine steps perform their combining duty only if
> datum_offset contains a valid value, that is, if scan_all_nonnulls is not set.

I'm afraid this is still not correct :-(

The following code is not doing the right thing:

+ case COMBINE_UNION:
+     foreach(lc1, cstep->source_stepids)
+     {
+         int step_id = lfirst_int(lc1);
+         PruneStepResult *step_result;
+
+         /*
+          * step_results[step_id] must contain a valid result,
+          * which is confirmed by the fact that cstep's step_id is
+          * greater than step_id and the fact that results of the
+          * individual steps are evaluated in sequence of their
+          * step_ids.
+          */
+         if (step_id >= cstep->step.step_id)
+             elog(ERROR, "invalid pruning combine step argument");
+         step_result = step_results[step_id];
+         Assert(step_result != NULL);
+
+         result->scan_all_nonnull = step_result->scan_all_nonnull;

The last line there is not properly performing a union, it just sets
result->scan_all_nonnull to whatever the last step's value was. At the
very least it should be |= but I don't really like this new code.

Why did you move away from just storing the matching partitions in a
Bitmapset? If you want to store all non-null partitions, then why not
just set the bits for all non-null partitions? That would cut down on
bugs like this since the combining of step results would just be
simple unions or intersects.

Also, the following code could be made a bit nicer:

+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ switch (context->strategy)
+ {
+     case PARTITION_STRATEGY_HASH:
+         result->bound_offsets = get_matching_hash_bound(context,
+                                                         opstep->opstrategy,
+                                                         values, nvalues,
+                                                         partsupfunc,
+                                                         opstep->nullkeys,
+                                                         >scan_all_nonnull);

Why not allocate the PruneStepResult inside the get_matching_*_bound,
that way you wouldn't need all those out parameters to set the bool
fields.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
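The bug David points out is a classic last-assignment-wins mistake inside a union loop. A hypothetical miniature (plain C bools standing in for the PruneStepResult flags, not the actual code) contrasting the buggy assignment with the |= accumulation he suggests:

```c
#include <stdbool.h>

/* Buggy COMBINE_UNION shape: plain assignment in the loop means only the
 * last source step's flag survives. */
bool
union_flag_buggy(const bool *steps, int nsteps)
{
    bool result = false;
    for (int i = 0; i < nsteps; i++)
        result = steps[i];      /* overwrites: last step wins */
    return result;
}

/* Fixed shape: |= accumulates, so the union is true if any step is true. */
bool
union_flag_fixed(const bool *steps, int nsteps)
{
    bool result = false;
    for (int i = 0; i < nsteps; i++)
        result |= steps[i];     /* accumulates across all source steps */
    return result;
}
```

With steps {true, false}, the buggy form reports false while the fixed form reports true, which is exactly the divergence David describes; storing per-partition bits in a Bitmapset and unioning those would make this class of bug impossible by construction.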
Re: [HACKERS] path toward faster partition pruning
On 2018/03/30 22:41, David Rowley wrote:
> On 31 March 2018 at 02:00, David Rowley wrote:
>> On 31 March 2018 at 01:18, David Rowley wrote:
>>> I've noticed that there are no outfuncs or readfuncs for all the new
>>> Step types you've added.
>>>
>>> Also, the copy func does not properly copy the step_id in the base
>>> node type. This will remain at 0 after a copyObject()
>>
>> Attaching it as it may save you some time from doing it yourself.
>> Please check it though.
>
> The attached might be slightly easier to apply. The previous version
> was based on top of some other changes I'd been making.

Thanks David. I have merged this.

Regards,
Amit
Re: [HACKERS] path toward faster partition pruning
On 30 March 2018 at 18:38, Amit Langote wrote:
> Please find attached the updated patches.

There's a bit of a strange case with v45 around prepared statements.
I've not debugged this yet, but in case you get there first, here's
the case:

create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);

explain select * from listp where b in(1,2) and 2<>b and 0<>b; -- this one looks fine.

                               QUERY PLAN
 Append (cost=0.00..49.66 rows=22 width=8)
   ->  Seq Scan on listp_1_1 (cost=0.00..49.55 rows=22 width=8)
         Filter: ((b = ANY ('{1,2}'::integer[])) AND (2 <> b) AND (0 <> b))
(3 rows)

prepare q1 (int,int,int,int) as select * from listp where b in($1,$2)
and $3 <> b and $4 <> b;

execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);

explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);

           QUERY PLAN
 Result (actual rows=0 loops=1)
   One-Time Filter: false
(2 rows)

My best guess is that something ate the bits out of a Bitmapset of the
matching partitions somewhere.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services