Hi!
Attached a rebased version of the patch. Regards Arne ________________________________ From: Arne Roland Sent: Monday, January 31, 2022 19:14 To: Amit Langote Cc: Zhihong Yu; Alvaro Herrera; Julien Rouhaud; pgsql-hackers Subject: Re: missing indexes in indexlist with partitioned tables Hi! From: Amit Langote <amitlangot...@gmail.com> Sent: Tuesday, January 25, 2022 09:04 Subject: Re: missing indexes in indexlist with partitioned tables > [...] > "partindexlist" really made me think about a list of "partial indexes" > for some reason. I think maybe "partedindexlist" is what you are > looking for; "parted" is commonly used as short for "partitioned" when > naming variables. > > The comment only mentions "further pruning" as to what partitioned > indexes are to be remembered in RelOptInfo, but it's not clear what > that means. It may help to be more specific. Thanks for the feedback! I've changed that. The current version is attached. > Finally, I don't understand why we need a separate field to store > indexes found in partitioned base relations. AFAICS, nothing but the > sites you are interested in (relation_has_unique_index_for() and > rel_supports_distinctness()) would ever bother to look at a > partitioned base relation's indexlist. Do you think putting them into > in indexlist might break something? I have thought about that before. AFAICT there is nothing in core, which breaks. However I am not sure, I want to mix those two kinds of index nodes. First of all the structure is different, partedIndexes don't have physical attributes after all. This is technical implementation detail relating to the current promise, that entries of the indexlist are indexes we can use. And by use, I mean use for statistics or the executor. I'm more concerned about future changes regarding the order and optimization of processing harder here. The order in which we do things in the planner is a bit messy, and I wouldn't mind seeing details about that change. Looking at the current wacky order in the optimizer, I'm not convinced, that nothing will want to have a look at the indexlist, before partitioned tables are unpacked. Since it would be easy to introduce this new variable later, wouldn't mind adding it to the indexlist directly for now. But changing the underlying promise of what it contains, seems noteworthy and more intrusive to me. > > Side note: I personally think the name inhparent is mildly confusing, since > > it's not really about inheritance. I don't have a significantly better idea > > though. > > Partitioned tables are "inheritance parent", so share the same code as > what traditional inheritance parents have always used for planning. I recall that manual partitioning via inheritance, that was cumbersome. Though that minor historical detail was not, what I was referring to. There are a lot of other cases, that cause us to set inhparent. IIRC we use this flag in some ddl commands, which have nothing to do with inheritance. It essentially is used as a variant to skip the indexlist creation. If such hacks weren't there, we could simply check for the relkind and indisunique. Regards Arne
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 7d176e7b00..31cdc3a7f7 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3502,7 +3502,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, Assert(list_length(exprlist) == list_length(oprlist)); /* Short-circuit if no indexes... */ - if (rel->indexlist == NIL) + if (rel->indexlist == NIL && rel->partedIndexlist == NIL) return false; /* @@ -3547,7 +3547,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, return false; /* Examine each index of the relation ... */ - foreach(ic, rel->indexlist) + foreach(ic, list_concat(rel->indexlist, rel->partedIndexlist)) { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 337f470d58..16ce443ec9 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -23,6 +23,7 @@ #include "postgres.h" #include "nodes/nodeFuncs.h" +#include "nodes/nodes.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" #include "optimizer/optimizer.h" @@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) */ ListCell *lc; - foreach(lc, rel->indexlist) + foreach(lc, list_concat(rel->indexlist, rel->partedIndexlist)) { IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc); diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 5012bfe142..7cf85875d5 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -116,8 +116,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, { Index varno = rel->relid; Relation relation; - bool hasindex; List *indexinfos = NIL; + List *partedIndexinfos = NIL; /* * We need not lock the relation since it was already locked, either by @@ -154,17 +154,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Retrieve the parallel_workers reloption, or -1 if not set. */ rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1); - /* - * Make list of indexes. Ignore indexes on system catalogs if told to. - * Don't bother with indexes for an inheritance parent, either. - */ - if (inhparent || - (IgnoreSystemIndexes && IsSystemRelation(relation))) - hasindex = false; - else - hasindex = relation->rd_rel->relhasindex; - - if (hasindex) + /* Make list of indexes. Ignore indexes on system catalogs if told to. */ + if (!(IgnoreSystemIndexes && IsSystemRelation(relation)) && relation->rd_rel->relhasindex) { List *indexoidlist; LOCKMODE lmode; @@ -213,10 +204,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, } /* - * Ignore partitioned indexes, since they are not usable for - * queries. + * Don't add partitioned indexes to the indexlist, since they are + * not usable by the executor. If they are unique add them to the + * partedIndexlist instead, to use for further pruning. That is + * relevant for the join pruning, if the outer relation is partitioned. + * If they aren't that either, simply skip them. */ - if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)) { index_close(indexRelation, NoLock); continue; @@ -264,7 +258,40 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->indexcollations[i] = indexRelation->rd_indcollation[i]; } - info->relam = indexRelation->rd_rel->relam; + /* + * Fetch the index expressions and predicate, if any. We must + * modify the copies we obtain from the relcache to have the + * correct varno for the parent relation, so that they match up + * correctly against qual clauses. + */ + info->indexprs = RelationGetIndexExpressions(indexRelation); + info->indpred = RelationGetIndexPredicate(indexRelation); + if (info->indexprs && varno != 1) + ChangeVarNodes((Node *) info->indexprs, 1, varno, 0); + if (info->indpred && varno != 1) + ChangeVarNodes((Node *) info->indpred, 1, varno, 0); + + info->unique = index->indisunique; + info->immediate = index->indimmediate; + + /* + * Don't add partitioned indexes to the indexlist, add them to the + * partedIndexlist instead, since they are not usable by the + * executor. + */ + if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + { + index_close(indexRelation, NoLock); + partedIndexinfos = lappend(partedIndexinfos, info); + continue; + } + + info->hypothetical = false; + info->indrestrictinfo = NIL; /* set later, in indxpath.c */ + info->predOK = false; /* set later, in indxpath.c */ + + /* Build targetlist using the completed indexprs data */ + info->indextlist = build_index_tlist(root, info, relation); /* We copy just the fields we need, not all of rd_indam */ amroutine = indexRelation->rd_indam; @@ -284,6 +311,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Fetch index opclass options */ info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true); + info->relam = indexRelation->rd_rel->relam; + /* * Fetch the ordering information for the index, if any. */ @@ -370,28 +399,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->nulls_first = NULL; } - /* - * Fetch the index expressions and predicate, if any. We must - * modify the copies we obtain from the relcache to have the - * correct varno for the parent relation, so that they match up - * correctly against qual clauses. - */ - info->indexprs = RelationGetIndexExpressions(indexRelation); - info->indpred = RelationGetIndexPredicate(indexRelation); - if (info->indexprs && varno != 1) - ChangeVarNodes((Node *) info->indexprs, 1, varno, 0); - if (info->indpred && varno != 1) - ChangeVarNodes((Node *) info->indpred, 1, varno, 0); - - /* Build targetlist using the completed indexprs data */ - info->indextlist = build_index_tlist(root, info, relation); - - info->indrestrictinfo = NIL; /* set later, in indxpath.c */ - info->predOK = false; /* set later, in indxpath.c */ - info->unique = index->indisunique; - info->immediate = index->indimmediate; - info->hypothetical = false; - /* * Estimate the index size. If it's not a partial index, we lock * the number-of-tuples estimate to equal the parent table; if it @@ -441,6 +448,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, } rel->indexlist = indexinfos; + rel->partedIndexlist = partedIndexinfos; rel->statlist = get_relation_statistics(rel, relation); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 3b065139e6..b23bbe2097 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -871,6 +871,8 @@ typedef struct RelOptInfo Relids lateral_referencers; /* list of IndexOptInfo */ List *indexlist; + /* list of IndexOptInfo for partioned indexes */ + List *partedIndexlist; /* list of StatisticExtInfo */ List *statlist; /* size estimates derived from pg_class */ diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out index 03926a8413..522cf3ec94 100644 --- a/src/test/regress/expected/partition_join.out +++ b/src/test/regress/expected/partition_join.out @@ -4855,14 +4855,42 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2 CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); +CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000'); +CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000'); -- insert data INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); ANALYZE fract_t; --- verify plan; nested index only scans +INSERT INTO fract_x (id) (SELECT generate_series(0, 1999)); +ANALYZE fract_x; +SET max_parallel_workers_per_gather = 0; +SET enable_partitionwise_join = on; +-- verify partition pruning SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on; EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + QUERY PLAN +----------------------------------------------------------------- + Limit + -> Append + -> Index Only Scan using fract_x0_pkey on fract_x0 x_1 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 +(4 rows) + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + QUERY PLAN +----------------------------------------------------------------- + Limit + -> Append + -> Index Only Scan using fract_x0_pkey on fract_x0 x_1 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 +(4 rows) + +-- verify plan; nested index only scans +EXPLAIN (COSTS OFF) +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10; QUERY PLAN ----------------------------------------------------------------------- Limit @@ -4870,32 +4898,33 @@ SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; Sort Key: x.id -> Merge Left Join Merge Cond: (x_1.id = y_1.id) - -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 + -> Index Only Scan using fract_x0_pkey on fract_x0 x_1 -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 -> Merge Left Join Merge Cond: (x_2.id = y_2.id) - -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 (11 rows) EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------- Limit -> Merge Append Sort Key: x.id DESC -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1 + -> Index Only Scan Backward using fract_x0_pkey on fract_x0 x_1 -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 Index Cond: (id = x_1.id) -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2 + -> Index Only Scan Backward using fract_x1_pkey on fract_x1 x_2 -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 Index Cond: (id = x_2.id) (11 rows) -- cleanup DROP TABLE fract_t; +DROP TABLE fract_x; RESET max_parallel_workers_per_gather; RESET enable_partitionwise_join; diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql index 67f506361f..39e23911d0 100644 --- a/src/test/regress/sql/partition_join.sql +++ b/src/test/regress/sql/partition_join.sql @@ -1148,22 +1148,38 @@ CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); +CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000'); +CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000'); -- insert data INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); ANALYZE fract_t; +INSERT INTO fract_x (id) (SELECT generate_series(0, 1999)); +ANALYZE fract_x; --- verify plan; nested index only scans SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on; +-- verify partition pruning +SET max_parallel_workers_per_gather = 0; +SET enable_partitionwise_join = on; + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + +-- verify plan; nested index only scans EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10; EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10; -- cleanup DROP TABLE fract_t; +DROP TABLE fract_x; RESET max_parallel_workers_per_gather; RESET enable_partitionwise_join;