Re: RFC: Logging plan of the running query
On 28/9/2023 09:04, torikoshia wrote:
> On 2023-09-25 18:49, Andrey Lepikhov wrote:
>> On 25/9/2023 14:21, torikoshia wrote:
>>> On 2023-09-20 14:39, Lepikhov Andrei wrote:
>>> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
>>> every CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch,
>>> and then ran the following query, which did not cause any problems:
>>>
>>> ```
>>> =# CREATE TABLE test();
>>> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>>>    BEGIN
>>>      EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>>>      PERFORM pg_sleep(5);
>>>    END; $$ LANGUAGE plpgsql VOLATILE;
>>> =# SELECT ddl();
>>> ```
>>>
>>> Is this the case you're worrying about?
>>
>> I didn't find a problem either. I just feel uncomfortable if, at the
>> moment of interruption, we hold the descriptor of a different query
>> than the one that has been executing and holding resources.
>
> I think that "descriptor" here refers to ActiveQueryDesc, in which case
> it is updated at the beginning of ExecutorRun(), so I am wondering if
> the situation you're worried about would not occur.

As you can see, my example contains only DDL and no queries with plans.
In this case, postgres doesn't call ExecutorRun() simply because it has
no plan. But locks will be obtained.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: POC: GROUP BY optimization
On 20/7/2023 18:46, Tomas Vondra wrote:
> 2) estimating quicksort comparisons - This relies on ndistinct
> estimates, and I'm not sure how much more reliable we can make those.
> Probably not much :-( Not sure what to do about this, the only thing I
> can think of is to track "reliability" of the estimates and only do the
> reordering if we have high confidence in the estimates. That means
> we'll miss some optimization opportunities, but it should limit the
> risk.

On this issue, I see two options:

1. Go through the grouping column list and find the most reliable
column. If we have a unique column, or a column whose statistics show a
number of distinct values significantly higher than the ndistinct of the
other grouping columns, we can place this column first in the grouping.
That should guarantee the reliability of such a decision, shouldn't it?

2. If we have extended statistics on distinct values, and these
statistics cover some set of leading columns in the grouping list, we
can optimize these positions. It also looks reliable.

Any thoughts?

--
regards,
Andrey Lepikhov
Postgres Professional
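P.S. A minimal sketch of the kind of extended statistics meant in option 2 (all object names here are invented for illustration):

```sql
-- CREATE STATISTICS (ndistinct) gives the planner a multi-column
-- distinct-values estimate for (region, product), so a grouping list
-- whose leading columns are covered by it could be reordered with more
-- confidence than one relying on per-column ndistinct alone.
CREATE TABLE sales (region int, product int, amount numeric);
CREATE STATISTICS sales_nd (ndistinct) ON region, product FROM sales;
ANALYZE sales;

-- The collected multi-column estimates can be inspected here:
SELECT s.stxname, d.stxdndistinct
FROM pg_statistic_ext s
JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
WHERE s.stxname = 'sales_nd';
```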
Re: RFC: Logging plan of the running query
On 25/9/2023 14:21, torikoshia wrote:
> On 2023-09-20 14:39, Lepikhov Andrei wrote:
> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
> every CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch,
> and then ran the following query, which did not cause any problems:
>
> ```
> =# CREATE TABLE test();
> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>    BEGIN
>      EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>      PERFORM pg_sleep(5);
>    END; $$ LANGUAGE plpgsql VOLATILE;
> =# SELECT ddl();
> ```
>
> Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the
moment of interruption, we hold the descriptor of a different query than
the one that has been executing and holding resources.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: POC: GUC option for skipping shared buffers in core dumps
Hi,

The current approach could be better, because we want this feature to
work on Windows, macOS, and other systems as well. So, I've tried to
develop another strategy: detaching shmem and DSM blocks before
executing the abort() routine. As far as I can see, it helps and reduces
the size of the core file.

--
regards,
Andrey Lepikhov
Postgres Professional

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..4d7bf2c0e4 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 	{
 		SetProcessingMode(NormalProcessing);
 		CheckerModeMain();
-		abort();
+		pg_abort();
 	}
 
 /*
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7f..34ac874ad0 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -197,7 +197,7 @@ main(int argc, char *argv[])
 	else
 		PostmasterMain(argc, argv);
 	/* the functions above should not return */
-	abort();
+	pg_abort();
 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 54e9bfb8c4..fc32a6bb1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	ExitPostmaster(status != STATUS_OK);
 
-	abort();			/* not reached */
+	pg_abort();			/* not reached */
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e250b0567e..3123b388ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -385,7 +385,7 @@ WalSndShutdown(void)
 	whereToSendOutput = DestNone;
 	proc_exit(0);
 
-	abort();			/* keep the compiler quiet */
+	pg_abort();			/* keep the compiler quiet */
 }
 
 /*
diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c
index 719dd7b309..f422d42440 100644
--- a/src/backend/utils/error/assert.c
+++ b/src/backend/utils/error/assert.c
@@ -19,6 +19,32 @@
 #include
 #endif
 
+#include
+#include
+
+int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL;
+
+/*
+ * Remember, at the same time someone can work with shared memory, write them
+ * to disk and so on.
+ */
+void
+pg_abort(void)
+{
+	if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL)
+	{
+		if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+			core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM)
+			dsm_detach_all();
+
+		if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+			core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM)
+			PGSharedMemoryDetach();
+	}
+
+	abort();
+}
+
 /*
  * ExceptionalCondition - Handles the failure of an Assert()
  *
@@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName,
 	sleep(100);
 #endif
 
-	abort();
+	pg_abort();
 }
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e1f3e8521..f6c863ca68 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	 * children...
 	 */
 	fflush(NULL);
-	abort();
+	pg_abort();
 }
 
 /*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..95e205e8d1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -427,6 +427,14 @@ static const struct config_enum_entry debug_logical_replication_streaming_option
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry core_dump_no_shared_buffers_options[] = {
+	{"none", COREDUMP_INCLUDE_ALL, false},
+	{"shmem", COREDUMP_EXCLUDE_SHMEM, false},
+	{"dsm", COREDUMP_EXCLUDE_DSM, false},
+	{"all", COREDUMP_EXCLUDE_ALL, false},
+	{NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 2),
 				 "array length mismatch");
@@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
+	{
+		{"core_dump_no_shared_buffers", PGC_POSTMASTER, DEVELOPER_OPTIONS,
+			NULL,
+			NULL,
+			GUC_NOT_IN_SAMPLE
+		},
+		&core_dump_no_shared_buffers,
+		COREDUMP_INCLUDE_ALL, cor
Re: Postgres picks suboptimal index after building of an extended statistics
On 12/8/2021 06:26, Tomas Vondra wrote:
> On 8/11/21 2:48 AM, Peter Geoghegan wrote:
>> On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov wrote:
>>> Ivan Frolkov reported a problem with choosing a non-optimal index
>>> during query optimization. This problem appeared after building
>>> extended statistics.
>> Any thoughts on this, Tomas?
> Thanks for reminding me, I missed / forgot about this thread. I agree
> the current behavior is unfortunate, but I'm not convinced the proposed
> patch is fixing the right place - doesn't this mean the index costing
> won't match the row estimates displayed by EXPLAIN? I wonder if we
> should teach clauselist_selectivity about UNIQUE indexes, and improve
> the cardinality estimates directly, not just costing for index scans.
> Also, is it correct that the patch calculates num_sa_scans only when
> (numIndexTuples >= 0.0)?

I can't stop thinking about this issue. It is bizarre when Postgres
chooses a non-unique index while a unique index gives us proof of a
minimal scan. I don't see a reason to teach the
clauselist_selectivity() routine to estimate UNIQUE indexes: we would
add some cycles, and it would work with btree indexes only. Maybe change
compare_path_costs_fuzzily() and add a heuristic, for example: "If the
selectivity of both paths gives us no more than 1 row, prefer the unique
index, or the index with the least selectivity."

--
regards,
Andrey Lepikhov
Postgres Professional
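P.S. For illustration, a hedged sketch of the class of cases under discussion (the schema is invented; the original report used different tables):

```sql
-- t_a_uniq proves that "a = const" returns at most one row. Still,
-- after extended statistics are built, the estimates may make the
-- wider non-unique index look cheaper for "a = .. AND b = ..", so the
-- planner can pick t_a_b even though the unique index guarantees a
-- minimal scan.
CREATE TABLE t (a int, b int, payload text);
CREATE UNIQUE INDEX t_a_uniq ON t (a);
CREATE INDEX t_a_b ON t (a, b);
INSERT INTO t SELECT gs, gs % 10, 'x' FROM generate_series(1, 10000) gs;
CREATE STATISTICS t_dep (dependencies) ON a, b FROM t;
ANALYZE t;

EXPLAIN SELECT * FROM t WHERE a = 42 AND b = 2;
```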
Re: [PATCH] Add extra statistics to explain for Nested Loop
On 31/7/2022 10:49, Julien Rouhaud wrote:
> On Sat, Jul 30, 2022 at 08:54:33PM +0800, Julien Rouhaud wrote:
> Anyway, 1% is in my opinion still too much overhead for extensions
> that won't get any extra information.

I have read the whole thread and still can't understand one thing: what
valuable data could I extract from these extra statistics if the plan
contains no parameterized node at all? Also, thinking about min/max time
in the EXPLAIN output, I guess it would be needed only in rare cases.
Usually, the execution time correlates with the number of tuples
scanned, doesn't it? So maybe skip the time boundaries in the instrument
structure? In my experience, it is enough to know the total number of
tuples bubbling up from a parameterized node to decide on further
optimizations. Maybe simplify this feature down to a single total_rows
field, filled in the case of nloops > 1 and in the presence of
parameters?

And finally: if someone wants a lot of additional statistics, why not
provide them through an extension? It would only be necessary to add a
hook at the point of node explanation, plus some effort to make
instrumentation extensible. But here, honestly, I don't have code or
ideas so far.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: disfavoring unparameterized nested loops
On 29/9/2022 21:32, Benjamin Coutu wrote:
> I'd like to revamp this important discussion. As is well described in
> this fairly recent paper https://www.vldb.org/pvldb/vol9/p204-leis.pdf
> (which also looks at Postgres), "estimation errors quickly grow as the
> number of joins increases, and that these errors are usually the
> reason for bad plans" - I think we can all get behind that statement.
>
> While nested loop joins work great when cardinality estimates are
> correct, they are notoriously bad when the optimizer underestimates,
> and they degrade very fast in such cases - the good vs. bad here is
> very asymmetric. On the other hand, hash joins degrade much more
> gracefully - they are considered very robust against underestimation.
> The above-mentioned paper illustrates that all major DBMSs (including
> Postgres) tend to underestimate, and usually that underestimation
> increases drastically with the number of joins (see Figures 3+4 of the
> paper).
>
> Now, a simple approach to guarding against bad plans that arise from
> underestimation could be to use what I would call a
> nested-loop-conviction-multiplier based on the current depth of the
> join tree; e.g., for a base table that multiplier would obviously be
> 1, but it would then grow (e.g.) quadratically. That
> conviction-multiplier would *NOT* be used to skew the cardinality
> estimates themselves, but rather be applied to the overall nested loop
> join cost at each particular stage of the plan when comparing it to
> other, more robust join strategies like hash or sort-merge joins. That
> way, when we can be sure to have a good estimate at the bottom of the
> join tree, we treat all things equal, but favor nested loops less and
> less as we move up the join tree, for the sake of robustness.
>
> Also, we can expand the multiplier whenever we fall back to using the
> default cardinality constant, as surely all bets are off at that point
> - we should definitely treat nested loop joins as out of favor in this
> instance, and that could easily be incorporated by simply increasing
> the conviction-multiplier. What are your thoughts on this simple idea
> - is it perhaps too simple?

In my practice, a parameterized nested loop reduces execution time,
sometimes drastically. If your query touches a lot of tables but
extracts only a tiny part of the data, and you have good index
coverage, PNL works great. Moreover, I have pondered extending
parameterization through subqueries and groupings.

What would you say about a different approach: a hybrid join? In MS SQL
Server, they have such a feature [1], and, according to their
description, it requires low overhead. They start with a hash join and
switch to a nested loop if the inner input contains too few tuples.
That solves the issue, doesn't it?

[1] https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411

--
regards,
Andrey Lepikhov
Postgres Professional
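P.S. A minimal sketch of the point about parameterized nested loops (table names invented):

```sql
-- The inner index scan on "big" is parameterized by the current outer
-- row (Index Cond: ref = s.id), so only the few matching rows are
-- fetched instead of scanning or hashing "big" as a whole - exactly
-- the case where PNL beats the more robust join strategies.
CREATE TABLE small (id int PRIMARY KEY);
CREATE TABLE big (id int PRIMARY KEY, ref int);
CREATE INDEX big_ref_idx ON big (ref);
INSERT INTO small SELECT gs FROM generate_series(1, 100) gs;
INSERT INTO big SELECT gs, gs % 100 FROM generate_series(1, 1000000) gs;
ANALYZE small;
ANALYZE big;

EXPLAIN SELECT * FROM small s JOIN big b ON b.ref = s.id WHERE s.id < 3;
```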
Re: Oversight in reparameterize_path_by_child leading to executor crash
On 23/8/2023 12:37, Richard Guo wrote:
> If we go with the "tablesample scans can't be reparameterized"
> approach in the back branches, I'm a little concerned about what we do
> if we find more cases in the future where we need to modify RTEs for
> reparameterization. So I spent some time seeking and have managed to
> find one: there might be lateral references in a scan path's
> restriction clauses, and currently reparameterize_path_by_child fails
> to adjust them.

This may help you somehow: in [1], we designed a feature where the
partitionwise-join technique can be applied to a join of partitioned and
non-partitioned tables. Unfortunately, it is out of community
discussion, but we still support it for sharding usage - it is helpful
for the implementation of 'global' tables in a distributed
configuration. And there we were stuck on the same problem with lateral
relids adjustment. So, with this patch, you can build a more general
view of the problem.

[1] Asymmetric partition-wise JOIN
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional
Re: [PoC] Reducing planning time when tables have many partitions
On 25/8/2023 14:39, Yuya Watari wrote:
> Hello,
>
> On Wed, Aug 9, 2023 at 8:54 PM David Rowley wrote:
>> I think the best way to move this forward is to explore not putting
>> partitioned table partitions in EMs and instead see if we can
>> translate to top-level parent before lookups. This might just be too
>> complex to translate the Exprs all the time and it may add overhead
>> unless we can quickly determine somehow that we don't need to attempt
>> to translate the Expr when the given Expr is already from the
>> top-level parent. If that can't be made to work, then maybe that
>> shows the current patch has merit.
>
> Based on your suggestion, I have experimented with not putting child
> EquivalenceMembers in an EquivalenceClass. I have attached a new
> patch, v20, to this email. The following is a summary of v20.

While working on self-join removal in the nearby thread [1], I ran into
a problem that gives one more argument for this new direction, in
addition to the previous ones. With indexed positions in the list of
equivalence members, we make some optimizations, like join elimination,
more complicated: such an optimization may need to remove some clauses
and equivalence-class members. To change the lists of derives or
ec_members, we would have to go through all the index lists and fix
them, which is a non-trivial operation.

[1] https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional
Re: POC: GROUP BY optimization
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys". This reverts commit
>>> db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on
>>> fixes. ... Since we're hard up against the release deadline for v15,
>>> let's revert these changes for now. We can always try again later.
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some of the
>> latest optimizations (a29eab, 1349d27 and 8d83a5d). Now, let's repeat
>> the review and rewrite the current patch according to the reasons
>> given in the revert commit.
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive
> cases.

OK, some thoughts on this part of the task. As I see it, we don't have
that many different operators: 26 with fixed width and 16 with variable
width:

SELECT o.oid, oprcode, typname, typlen
FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname = '<') AND oprleft = oprright AND typlen > 0
ORDER BY o.oid;

SELECT o.oid, oprcode, typname, typlen
FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname = '<') AND oprleft = oprright AND typlen < 0
ORDER BY o.oid;

A benchmarking procedure for types with fixed length could look like
the following:

CREATE OR REPLACE FUNCTION pass_sort(typ regtype)
RETURNS TABLE (
    nrows integer,
    exec_time float
) AS $$
DECLARE
    data json;
    step integer;
BEGIN
    SET work_mem = '1GB';
    FOR step IN 0..3 LOOP
        SELECT pow(100, step)::integer INTO nrows;
        DROP TABLE IF EXISTS test CASCADE;
        EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
                        FROM generate_series(1,%s) AS gs;', typ, nrows);
        EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
            SELECT * FROM test ORDER BY (x) DESC INTO data;
        SELECT data->0->'Execution Time' INTO exec_time;
        RETURN NEXT;
    END LOOP;
END; $$ LANGUAGE plpgsql VOLATILE;

Execution of SELECT * FROM pass_sort('integer'); shows quite linear
growth of the execution time. So, using '2.0 * cpu_operator_cost' as
the cost for the integer type (as a basis), we can calculate costs for
the other operators. Variable-width types, I think, could require a more
complex technique to check the dependency on the length. Does this
approach look worthwhile?

--
regards,
Andrey Lepikhov
Postgres Professional
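P.S. The per-type cost factor mentioned above could then be derived from the measurements roughly like this (a sketch that assumes the pass_sort() function from this message has been created; 'numeric' is just an example probe type):

```sql
-- Scale the '2.0 * cpu_operator_cost' integer baseline by the ratio of
-- the probe type's measured sort time to the integer sort time at the
-- same (largest) row count.
WITH base AS (
    SELECT exec_time FROM pass_sort('integer') ORDER BY nrows DESC LIMIT 1
), probe AS (
    SELECT exec_time FROM pass_sort('numeric') ORDER BY nrows DESC LIMIT 1
)
SELECT 2.0 * probe.exec_time / base.exec_time AS cost_units
FROM base, probe;
```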
Re: POC: GROUP BY optimization
Hi,

Here is the patch rebased on the current master. Also, I fixed some
minor slips and one static analyzer warning. This is mostly to add the
patch to the next commitfest and to push work on it forward. One extra
difference in the newly added postgres_fdw tests is caused by this patch
- see the changes in the query plan in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional

From 33953655c9ac3f9ec64b80c9f2a2ff38bd178745 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov"
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared
when building the groups. Grouping does not imply any ordering, so we
can compare the keys in arbitrary order, and a Hash Agg leverages this.
But for Group Agg, we simply compared keys in the order specified in
the query. This commit explores alternative orderings of the keys,
trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of
the keys and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple of interesting orderings
based on both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be
better to perform more expensive comparisons (for complex data types,
etc.) last, because maybe the cheaper comparisons will be enough.
Similarly, the higher the cardinality of a key, the lower the
probability we'll need to compare more keys. The patch generates and
costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. For
example, there may be an explicit ORDER BY clause or some other
ordering-dependent operation higher up in the query, and using the same
ordering may allow using either incremental sort or even eliminating
the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various path keys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the
optimization may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   36 +-
 src/backend/optimizer/path/costsize.c         |  363 ++-
 src/backend/optimizer/path/equivclass.c       |   13 +-
 src/backend/optimizer/path/pathkeys.c         |  602 ++
 src/backend/optimizer/plan/planner.c          |  465 --
 src/backend/optimizer/util/pathnode.c         |    4 +-
 src/backend/utils/adt/selfuncs.c              |   38 +-
 src/backend/utils/misc/guc_tables.c           |   10 +
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/nodes/pathnodes.h                 |   10 +
 src/include/optimizer/cost.h                  |    4 +-
 src/include/optimizer/paths.h                 |    7 +
 src/include/utils/selfuncs.h                  |    5 +
 src/test/regress/expected/aggregates.out      |  244 ++-
 .../regress/expected/incremental_sort.out     |    2 +-
 src/test/regress/expected/join.out            |   51 +-
 src/test/regress/expected/merge.out           |   15 +-
 .../regress/expected/partition_aggregate.out  |   44 +-
 src/test/regress/expected/partition_join.out  |   75 +--
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/expected/union.out           |   60 +-
 src/test/regress/sql/aggregates.sql           |   99 +++
 src/test/regress/sql/incremental_sort.sql     |    2 +-
 23 files changed, 1771 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER = SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-
Re: Removing unneeded self joins
On 5/7/2023 21:28, Andrey Lepikhov wrote:
> Hi, During the significant code revision in v.41 I lost some
> replacement operations. Here is the fix and extra tests to check this
> in the future. Also, Tom added the JoinDomain structure five months
> ago, and I added code to replace relids for that place too. One more
> thing: I found out that we didn't replace SJs defined by
> baserestrictinfos if no self-join clause existed for the join. Now it
> is fixed, and a test has been added. To understand the changes
> readily, see the delta file in the attachment.

Here is a new patch in the attachment, rebased on the current master and
with some minor gaffes fixed.

--
regards,
Andrey Lepikhov
Postgres Professional

From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov"
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.

The Self Join Elimination (SJE) feature removes an inner join of a
plain table to itself from the query tree if it can be proved that the
join can be replaced with a scan without affecting the query result.
The self-joined inner relation is replaced with the outer one in the
query, in equivalence classes, and in planner info structures. Also,
the inner restrictlist moves to the outer one, with duplicated clauses
removed. Thus, this optimization reduces the length of the range table
list (which especially makes sense for partitioned relations), reduces
the number of restriction clauses - and hence selectivity estimations -
and potentially can improve the overall planner prediction for the
query.

The SJE proof is based on the innerrel_is_unique machinery. We can
remove a self-join when, for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the
   outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Add the baserestrictinfo of the inner table to the list above.
3. Check innerrel_is_unique() for the qual list. If it returns false,
   skip this pair of joining tables.
4. Check uniqueness, as proved by the baserestrictinfo clauses. To
   prove the possibility of the self-join elimination, the inner and
   outer clauses must have an exact match.

The relation replacement procedure is not trivial, and it is partly
combined with the one used to remove useless left joins. Tests covering
this feature have been added to join.sql. Some regression test outputs
changed due to the self-join removal logic.
---
 doc/src/sgml/config.sgml                      |   16 +
 src/backend/optimizer/path/indxpath.c         |   38 +
 src/backend/optimizer/plan/analyzejoins.c     | 1094 -
 src/backend/optimizer/plan/planmain.c         |    5 +
 src/backend/utils/misc/guc_tables.c           |   22 +
 src/include/optimizer/paths.h                 |    3 +
 src/include/optimizer/planmain.h              |    7 +
 src/test/regress/expected/equivclass.out      |   32 +
 src/test/regress/expected/join.out            |  824 -
 src/test/regress/expected/sysviews.out        |    3 +-
 src/test/regress/expected/updatable_views.out |   17 +-
 src/test/regress/sql/equivclass.sql           |   16 +
 src/test/regress/sql/join.sql                 |  360 ++
 src/tools/pgindent/typedefs.list              |    2 +
 14 files changed, 2375 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY num_sync (
+      enable_self_join_removal (boolean)
+
+      enable_self_join_removal configuration parameter
+
+
+
+        Enables or disables the query planner's optimization which
+        analyses the query tree and replaces self-joins with
+        semantically equivalent single scans. Only plain tables are
+        taken into consideration. The default is on.
+
+
+
       enable_seqscan (boolean)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 							  List *restrictlist,
 							  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+										 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ *	  if extra_clauses isn't NULL, return baserestrictinfo clauses which
+ *	  were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+							  List *re
Re: Report planning memory in EXPLAIN ANALYZE
On 14/8/2023 06:53, David Rowley wrote:
> On Thu, 10 Aug 2023 at 20:33, Ashutosh Bapat wrote:
>> My point is what's relevant here is how much net memory the planner
>> asked for.
> But that's not what your patch is reporting. All you're reporting is
> the difference in memory that's *currently* palloc'd from before and
> after the planner ran. If we palloc'd 600 exabytes then pfree'd it
> again, your metric won't change.
>
> I'm struggling a bit to understand your goals here. If your goal is to
> make a series of changes that reduces the amount of memory that's
> palloc'd at the end of planning, then your patch seems to suit that
> goal, but per the quote above, it seems you care about how many bytes
> are palloc'd during planning, and your patch does not seem to track
> that.
>
> With your patch as it is, to improve the metric you're reporting we
> could go off and do things like pfree Paths once createplan.c is done,
> but really, why would we do that? Just making the "Planning Memory"
> metric look better doesn't seem like a worthy goal.
>
> Instead, if we reported the context's mem_allocated, then it would
> give us the flexibility to make changes to the memory context code to
> have the metric look better. It might also alert us to planner
> inefficiencies and problems with new code that may cause a large spike
> in the amount of memory that gets allocated.
>
> Now, I'm not saying we should add a patch that shows mem_allocated.
> I'm just questioning if your proposed patch meets the goals you're
> trying to achieve. I just suggested that you might want to consider
> something else as a metric for your memory usage reduction work.

Really, the current approach with the final value of consumed memory
smooths out peaks of memory consumption. I recall examples like massive
million-element arrays or reparameterization with many partitions,
where the optimizer consumes a lot of additional memory during
planning. Ideally, to dive into planner issues, we should have
something like the in-progress reporting in vacuum, showing memory
consumption at each subquery and join level. But that looks like too
much for typical queries.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: Report planning memory in EXPLAIN ANALYZE
On 10/8/2023 15:33, Ashutosh Bapat wrote:
> On Wed, Aug 9, 2023 at 8:56 PM David Rowley wrote:
>> On Thu, 10 Aug 2023 at 03:12, Ashutosh Bapat wrote:
>> I guess it depends on the problem you're trying to solve. I had
>> thought you were trying to do some work to reduce the memory used by
>> the planner, so I imagined you wanted this so you could track your
>> progress and also to help ensure we don't make too many mistakes in
>> the future which make memory consumption worse again.
>>
>> Another way we might go about reducing planner memory is to make
>> changes to the allocators themselves. For example, aset rounds up to
>> the next power of 2. If we decided to do something like add more
>> freelists to double their number, so that we could add a mid-way
>> point freelist between each power of 2, then we might find it reduces
>> the planner memory by, say, 12.5%. If we just tracked what was
>> consumed by palloc(), that would appear to save us nothing, but it
>> might actually save us several malloced blocks.
>
> This won't affect just the planner but every subsystem of PostgreSQL,
> unless we devise a new context type for the planner. My point is that
> what's relevant here is how much net memory the planner asked for. How
> much memory PostgreSQL allocated internally seems relevant for the
> memory context infrastructure as such, but not so much for the planner
> alone. If you think that memory allocated is important, I will add
> both used and (net) allocated memory to the EXPLAIN output.

As a developer, I like having as much internal info in my EXPLAIN as
possible. But this command is designed mainly for users, not core
developers. The amount of memory allocated usually doesn't make sense
to users. On the other hand, developers can watch it easily in
different ways if needed. So, I vote for the 'memory used' metric only.

The patch looks good and passes the tests.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: [PoC] Reducing planning time when tables have many partitions
On 7/8/2023 19:15, Ashutosh Bapat wrote:
> On Mon, Aug 7, 2023 at 2:21 PM Andrey Lepikhov
> <a.lepik...@postgrespro.ru> wrote:
>>>> Do you think that the memory measurement patch I have shared in
>>>> those threads is useful in itself? If so, I will start another
>>>> proposal to address it.
>>> For me, who is developing the planner in this thread, the memory
>>> measurement patch is useful. However, most users do not care about
>>> memory usage, so there is room for consideration. For example,
>>> making the metrics optional in EXPLAIN ANALYZE outputs might be
>>> better.
>> +1. Any memory-related info in the output of EXPLAIN ANALYZE makes
>> tests more complex because of architecture dependency.
> As far as the tests go, the same is the case with planning time and
> execution time. They change even without changing the architecture.
> But we have tests which mask the actual values. Something similar will
> be done for the planning memory.

It is a positive thing to be able to access some planner internals from
the console, of course. My point concerns the structure of the EXPLAIN
output and is motivated by two reasons:

1. I use the EXPLAIN command daily to identify performance issues and
the optimiser's weak points. In my experience, when an 'explain
analyze' contains more than 100 lines, you start removing unnecessary
information to improve observability. It would be better to have the
possibility of seeing an EXPLAIN with different levels of output
detail. Flexibility here sometimes saves a lot of manual work.

2. When writing extensions that have an explain analyze in their
regression tests, we must create masking functions just to make the
tests more stable. That additional work can be avoided with another
option, like MEMUSAGE ON/OFF.

So, in my opinion, it would be better to introduce this new output data
guarded by an additional option.

> I will propose it as a separate patch in the next commitfest and will
> seek opinions from other hackers.

Cool, good news.

--
regards,
Andrey Lepikhov
Postgres Professional
Re: [PoC] Reducing planning time when tables have many partitions
On 7/8/2023 15:19, Yuya Watari wrote: Hello, Thank you for your reply. On Thu, Aug 3, 2023 at 10:29 PM Ashutosh Bapat wrote: If you think that the verification is important to catch bugs, you may want to encapsulate it with an #ifdef .. #endif such that the block within is not compiled by default. See OPTIMIZER_DEBUG for example. In my opinion, verifying the iteration results is only necessary to avoid introducing bugs while developing this patch. The verification is too excessive for regular development of PostgreSQL. I agree that we should avoid a significant degradation in assert enabled builds, so I will consider removing it. I should admit, these checks have helped me while backpatching this feature to PostgreSQL v13 (users badly want query planning sped up). Maybe that is a sign of a lack of tests but, in fact, they have already helped. One more thing: I think you should add comments to add_child_rel_equivalences() and add_child_join_rel_equivalences() explaining why: if (bms_is_subset(cur_em->em_relids, top_parent_relids) && !bms_is_empty(cur_em->em_relids)) and if (bms_overlap(cur_em->em_relids, top_parent_relids)) were replaced with different logic. What was changed? Some comments would help future developers understand this part of the code more easily. Do you think that the memory measurement patch I have shared in those threads is useful in itself? If so, I will start another proposal to address it. For me, who is developing the planner in this thread, the memory measurement patch is useful. However, most users do not care about memory usage, so there is room for consideration. For example, making the metrics optional in EXPLAIN ANALYZE outputs might be better. +1. Any memory-related info in the output of EXPLAIN ANALYZE makes tests more complex because of architecture dependency.
Re: [PoC] Reducing planning time when tables have many partitions
On 2/8/2023 13:40, Yuya Watari wrote: As seen from the above, verifying iteration results was the cause of the performance degradation. I agree that we should avoid such degradation because it negatively affects the development of PostgreSQL. Removing the verification when committing this patch is one possible option. You introduced list_ptr_cmp as an extern function of List, but it is used only under the USE_ASSERT_CHECKING ifdef. Maybe hide it under USE_ASSERT_CHECKING as well, or remove it altogether?
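Ashutosh's suggestion above amounts to the usual pattern of compiling an expensive cross-check only into assert-enabled builds, so production builds pay nothing for it. A self-contained sketch of that gating (the summing functions are illustrative stand-ins, not the actual patch code):

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_ASSERT_CHECKING
/* Slow reference computation, compiled only when asserts are enabled. */
static long sum_by_reference(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
#endif

/* The "fast path" whose results we want to verify during development. */
long sum_fast(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];

#ifdef USE_ASSERT_CHECKING
    /* Expensive verification vanishes entirely from production builds. */
    assert(s == sum_by_reference(a, n));
#endif
    return s;
}
```

With this arrangement the verification can even stay in the tree permanently, as OPTIMIZER_DEBUG-style code does, without slowing down regular builds.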
Re: [PoC] Reducing planning time when tables have many partitions
On 5/7/2023 16:57, Yuya Watari wrote: Hello, On Fri, Mar 10, 2023 at 5:38 PM Yuya Watari wrote: Thank you for pointing it out. I have attached the rebased version to this email. Recent commits, such as a8c09daa8b [1], have caused conflicts and compilation errors in these patches. I have attached the fixed version to this email. The v19-0004 adds an 'em_index' field representing the index within root->eq_members of the EquivalenceMember. This field is needed to delete EquivalenceMembers when iterating them using the ec_members list instead of the ec_member_indexes. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a8c09daa8bb1d741bb8b3d31a12752448eb6fb7c While examining the quality of partition pruning at the execution-initialization stage with your patch set applied, I found some dubious results showing performance degradation. Look into the test case in the attachment. It contains three queries. Execution times: 1 - 8s; 2 - 30s; 3 - 131s (with your patch set) versus 1 - 5s; 2 - 10s; 3 - 33s (current master). Maybe it is a false alarm, but on my laptop I see this degradation on every run. -- regards, Andrey Lepikhov Postgres Professional parts-problem.sql Description: application/sql
Re: POC: GROUP BY optimization
On 24/7/2023 16:56, Tomas Vondra wrote: On 7/24/23 04:10, Andrey Lepikhov wrote: On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. It may be time to restart the project. As a first step, I rebased the patch on the current master. It wasn't trivial because of some latest optimizations (a29eab, 1349d27 and 8d83a5d). Now, Let's repeat the review and rewrite the current path according to the reasons uttered in the revert commit. I think the fundamental task is to make the costing more reliable, and the commit message 443df6e2db points out a couple challenges in this area. Not sure how feasible it is to address enough of them ... 1) procost = 1.0 - I guess we could make this more realistic by doing some microbenchmarks and tuning the costs for the most expensive cases. 2) estimating quicksort comparisons - This relies on ndistinct estimates, and I'm not sure how much more reliable we can make those. Probably not much :-( Not sure what to do about this, the only thing I can think of is to track "reliability" of the estimates and only do the reordering if we have high confidence in the estimates. That means we'll miss some optimization opportunities, but it should limit the risk. I read up on the history of this thread. As I see, all the problems mentioned above can be beaten by excluding the new cost model at all. We can sort GROUP BY columns according to the 'ndistinct' value. I see the reason for introducing the cost model in [1]. The main supporting point here is that with this patch, people couldn't optimize the query by themselves, organizing the order of the columns in a more optimal way. 
But now we have at least the GUC to switch off the behaviour introduced here. Also, some extensions, like the well-known pg_hint_plan, can help with automation. I think the main concern is that if we reorder the group keys and get it wrong, it's a regression. If that happens often (due to costing based on poor stats), it's a problem. Yes, there's a GUC, but that's a rather blunt instrument, unfortunately. I see. My point here is that if the ndistinct of one column is much larger than the ndistinct of another, it is more probable that this relationship will still hold at the grouping phase. We can get some regression here, but it can be outweighed by the opportunity below. So, how about committing the undoubted part of the feature and working on the cost model in a new thread? But Tom's commit message says this: Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. so I'm not sure this really counts as "undoubted". Don't try to estimate multiple columns - just sort the columns by their ndistinct values as a heuristic. I think we should estimate the number of distinct values over multiple columns only when we have extended statistics on those columns; that could also extend the applicability of extended statistics. The suggestion above is just an attempt to gather the low-hanging fruit right now. If it is not applicable, we will take the long way. -- regards, Andrey Lepikhov Postgres Professional
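The heuristic proposed above is deliberately simple: order the GROUP BY columns by descending ndistinct, so the most selective column is compared first and the later comparisons are rarely reached. A minimal sketch (the GroupKey struct and the statistics values are illustrative assumptions, not planner code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One grouping column together with its per-column ndistinct estimate. */
typedef struct GroupKey
{
    const char *colname;
    double      ndistinct;      /* estimate from per-column statistics */
} GroupKey;

/* qsort comparator: larger ndistinct sorts first. */
static int cmp_ndistinct_desc(const void *a, const void *b)
{
    double da = ((const GroupKey *) a)->ndistinct;
    double db = ((const GroupKey *) b)->ndistinct;

    return (da < db) - (da > db);
}

/* Reorder GROUP BY keys so the most distinct column leads the sort. */
void reorder_group_keys(GroupKey *keys, size_t nkeys)
{
    qsort(keys, nkeys, sizeof(GroupKey), cmp_ndistinct_desc);
}
```

No cost model is involved: the only input is the per-column ndistinct, which is exactly why the approach sidesteps the multi-column estimation problem quoted from the revert commit.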
Re: POC: GROUP BY optimization
On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. It may be time to restart the project. As a first step, I rebased the patch on the current master. It wasn't trivial because of some latest optimizations (a29eab, 1349d27 and 8d83a5d). Now, Let's repeat the review and rewrite the current path according to the reasons uttered in the revert commit. I think the fundamental task is to make the costing more reliable, and the commit message 443df6e2db points out a couple challenges in this area. Not sure how feasible it is to address enough of them ... 1) procost = 1.0 - I guess we could make this more realistic by doing some microbenchmarks and tuning the costs for the most expensive cases. 2) estimating quicksort comparisons - This relies on ndistinct estimates, and I'm not sure how much more reliable we can make those. Probably not much :-( Not sure what to do about this, the only thing I can think of is to track "reliability" of the estimates and only do the reordering if we have high confidence in the estimates. That means we'll miss some optimization opportunities, but it should limit the risk. I read up on the history of this thread. As I see, all the problems mentioned above can be beaten by excluding the new cost model at all. We can sort GROUP BY columns according to the 'ndistinct' value. I see the reason for introducing the cost model in [1]. The main supporting point here is that with this patch, people couldn't optimize the query by themselves, organizing the order of the columns in a more optimal way. But now we have at least the GUC to switch off the behaviour introduced here. 
Also, some extensions, like the well-known pg_hint_plan, can help with automation. So, how about committing the undoubted part of the feature and working on the cost model in a new thread? [1] https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com -- regards, Andrey Lepikhov Postgres Professional
Re: POC: GROUP BY optimization
On 20/7/2023 18:46, Tomas Vondra wrote: On 7/20/23 08:37, Andrey Lepikhov wrote: On 3/10/2022 21:56, Tom Lane wrote: Revert "Optimize order of GROUP BY keys". This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. ... Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. It may be time to restart the project. As a first step, I rebased the patch on the current master. It wasn't trivial because of some latest optimizations (a29eab, 1349d27 and 8d83a5d). Now, Let's repeat the review and rewrite the current path according to the reasons uttered in the revert commit. I think the fundamental task is to make the costing more reliable, and the commit message 443df6e2db points out a couple challenges in this area. Not sure how feasible it is to address enough of them ... 1) procost = 1.0 - I guess we could make this more realistic by doing some microbenchmarks and tuning the costs for the most expensive cases. 2) estimating quicksort comparisons - This relies on ndistinct estimates, and I'm not sure how much more reliable we can make those. Probably not much :-( Not sure what to do about this, the only thing I can think of is to track "reliability" of the estimates and only do the reordering if we have high confidence in the estimates. That means we'll miss some optimization opportunities, but it should limit the risk. For me personally, the most challenging issue is: 3. Imbalance, induced by the cost_sort() changes. It may increase or decrease the contribution of sorting to the total cost compared to other factors and change choice of sorted paths. -- regards, Andrey Lepikhov Postgres Professional
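The role of ndistinct in points (2) and (3) above is easier to see with a back-of-the-envelope model: when comparing two random tuples, key i is examined only if all preceding keys compared equal, which (assuming independent, uniformly distributed keys — a strong simplification) happens with probability prod(1/ndistinct_j) over the earlier keys. A sketch, purely for illustration of why the leading key's ndistinct dominates the comparison count:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expected number of key comparisons for one pair of random tuples:
 * key 0 is always compared; key i is compared only if the tuples were
 * equal on all keys 0..i-1, i.e. with probability prod(1/ndistinct_j).
 */
double expected_key_comparisons(const double *ndistinct, size_t nkeys)
{
    double expected = 0.0;
    double p_equal_so_far = 1.0;    /* P(all earlier keys compared equal) */

    for (size_t i = 0; i < nkeys; i++)
    {
        expected += p_equal_so_far;     /* key i reached with this prob. */
        p_equal_so_far /= ndistinct[i]; /* both tuples match on key i */
    }
    return expected;
}
```

Placing a high-ndistinct key first drives the expected comparison count toward 1 per pair, while a low-ndistinct leading key forces most pairs to fall through to the later keys — which is precisely what a reliable cost_sort() would need to capture.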
Re: POC: GROUP BY optimization
On 3/10/2022 21:56, Tom Lane wrote: > Revert "Optimize order of GROUP BY keys". > > This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and > several follow-on fixes. > ... > Since we're hard up against the release deadline for v15, let's > revert these changes for now. We can always try again later. It may be time to restart the project. As a first step, I rebased the patch on the current master. It wasn't trivial because of some recent optimizations (a29eab, 1349d27 and 8d83a5d). Now, let's repeat the review and rewrite the current patch according to the reasons stated in the revert commit. -- regards, Andrey Lepikhov Postgres Professional From 913d55ee887dccfeba360f5f44ed347dd1ba9044 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 14 Jul 2023 10:29:36 +0700 Subject: [PATCH] When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information. When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough.
Similarly, the higher the cardinality of a key, the lower the probability we’ll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely. The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights. This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed. --- .../postgres_fdw/expected/postgres_fdw.out| 15 +- src/backend/optimizer/path/costsize.c | 363 ++- src/backend/optimizer/path/equivclass.c | 13 +- src/backend/optimizer/path/pathkeys.c | 602 ++ src/backend/optimizer/plan/planner.c | 467 -- src/backend/optimizer/util/pathnode.c | 4 +- src/backend/utils/adt/selfuncs.c | 38 +- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/pathnodes.h | 10 + src/include/optimizer/cost.h | 4 +- src/include/optimizer/paths.h | 7 + src/include/utils/selfuncs.h | 5 + src/test/regress/expected/aggregates.out | 244 ++- .../regress/expected/incremental_sort.out | 2 +- src/test/regress/expected/join.out| 51 +- src/test/regress/expected/merge.out | 15 +- .../regress/expected/partition_aggregate.out | 44 +- src/test/regress/expected/partition_join.out | 75 +-- src/test/regress/expected/sysviews.out| 3 +- src/test/regress/expected/union.out | 60 +- src/test/regress/sql/aggregates.sql | 99 +++ src/test/regress/sql/incremental_sort.sql 
| 2 +- 23 files changed, 1760 insertions(+), 374 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 852b5b4707..3f3c181ac8 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9593,13 +9593,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J -- left outer join + nullable clause EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and
Re: POC, WIP: OR-clause support for indexes
On 10/7/2023 15:38, Alena Rybakina wrote: I agreed with the changes. Thank you for your work. I updated the patch and added you to the authors. I specified Ranier Vilela as a reviewer. This patch looks much better than before, but it definitely needs some test coverage. As a first simple approximation, you can see here the result of the regression tests with the transformation limit set to 0; see in the attachment some test changes induced by these diffs. Also, I see some impact of the transformation on other queries: create_view.out: (NOT x > z) to (x <= z) inherit.out: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[]))) to (((a)::text = ANY ('{NULL,cd}'::text[])) OR ((a)::text = 'ab'::text)) The transformations mentioned above are correct, of course, but they may be a sign of potentially unstable behavior. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 0ddcb880ef..3d9b385c1f 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -43,6 +43,7 @@ /* GUC parameters */ bool Transform_null_equals = false; +int or_transform_limit = 500; static Node *transformExprRecurse(ParseState *pstate, Node *expr); @@ -104,8 +105,6 @@ typedef struct OrClauseGroupEntry Expr *expr; } OrClauseGroupEntry; -static int const_transform_or_limit = 500; - static Node * transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) { @@ -115,7 +114,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) /* If this is not an 'OR' expression, skip the transformation */ if (expr_orig->boolop != OR_EXPR || - list_length(expr_orig->args) < const_transform_or_limit) + list_length(expr_orig->args) < or_transform_limit) return transformBoolExpr(pstate, (BoolExpr *) expr_orig); foreach(lc, expr_orig->args) diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index c14456060c..c7ac73ebe0 100644 ---
a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2046,6 +2046,16 @@ struct config_int ConfigureNamesInt[] = 100, 1, MAX_STATISTICS_TARGET, NULL, NULL, NULL }, + { + {"or_transform_limit", PGC_USERSET, QUERY_TUNING_OTHER, + gettext_noop("Transform a sequence of OR clauses to an IN expression."), + gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .." +"to the clause 'x IN (c1,c2,...)'") + }, + &or_transform_limit, + 500, 0, INT_MAX, + NULL, NULL, NULL + }, { {"from_collapse_limit", PGC_USERSET, QUERY_TUNING_OTHER, gettext_noop("Sets the FROM-list size beyond which subqueries " diff --git a/src/include/parser/parse_expr.h b/src/include/parser/parse_expr.h index 7d38ca75f7..891e6a462b 100644 --- a/src/include/parser/parse_expr.h +++ b/src/include/parser/parse_expr.h @@ -17,6 +17,7 @@ /* GUC parameters */ extern PGDLLIMPORT bool Transform_null_equals; +extern PGDLLIMPORT int or_transform_limit; extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind); diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index acfd9d1f4f..60e053d217 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -1883,6 +1883,46 @@ SELECT count(*) FROM tenk1 10 (1 row) +SET or_transform_limit = 0; +EXPLAIN (COSTS OFF) +SELECT * FROM tenk1 + WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); + QUERY PLAN +-- + Index Scan using tenk1_thous_tenthous on tenk1 + Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[]))) +(2 rows) + +SELECT * FROM tenk1 + WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); + unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 +-+-+-+--+-++-+--+-+---+--+-+--+--+--+- + 42 |5530 | 0 |2 | 2 | 2 | 42 | 42 | 42 |42 | 42 | 84 | 85 | QB | SEIAAA | xx
+(1 row) + +EXPLAIN (COSTS OFF) +SELECT count(*) FROM tenk1 + WHERE hundred = 42
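As a toy model of the transformation exercised by the tests above: a chain of OR'ed equality clauses with a common non-const side, e.g. tenthous = 1 OR tenthous = 3 OR tenthous = 42, is grouped by that common side so it can be rewritten as tenthous = ANY ('{1,3,42}'). The real patch works on parse-tree nodes; this linear-scan grouping over plain strings is only an illustration of the bookkeeping:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 16

typedef struct OrGroup
{
    const char *lhs;        /* stand-in for the non-const expression */
    int         nconsts;    /* constants collected for the ANY array */
} OrGroup;

/*
 * Group OR'ed clauses of the form (lhs[i] = const) by equal lhs text.
 * Returns the number of groups; each group counts its constants.
 */
size_t group_or_clauses(const char **lhs, size_t nclauses, OrGroup *groups)
{
    size_t ngroups = 0;

    for (size_t i = 0; i < nclauses; i++)
    {
        size_t g;

        for (g = 0; g < ngroups; g++)
            if (strcmp(groups[g].lhs, lhs[i]) == 0)
                break;
        if (g == ngroups)       /* first time we see this expression */
        {
            groups[g].lhs = lhs[i];
            groups[g].nconsts = 0;
            ngroups++;
        }
        groups[g].nconsts++;
    }
    return ngroups;
}
```

The linear scan over groups is the same O(N * G) shape the patch's TODO comment worries about; a hash table would be the obvious next step if many distinct non-const sides ever occur in practice.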
Re: Generating code for query jumbling through gen_node_support.pl
On 11/7/2023 12:35, Michael Paquier wrote: On Tue, Jul 11, 2023 at 12:29:29PM +0700, Andrey Lepikhov wrote: I vote for only one method based on a query tree structure. Noted BTW, did you think about different algorithms of queryId generation? Not really, except if you are referring to the possibility of handling different portions of the nodes differently, depending on a context given by the callers willing to do a query jumbling computation. (For example, choose to *not* silence the Const nodes, etc.) Yes, I have had two requests for different queryId algorithms: 1. With suppressed Const nodes. 2. With Oids replaced by full object names, to give a chance of seeing the same queryId on different instances for the same query. It is quite trivial to implement, but not so easy to maintain. Auto-generated queryId code can open the way for extensions to support custom queryIds easily. Extensions can control that to some extent, already. -- Michael -- regards, Andrey Lepikhov Postgres Professional
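Request (1) above can be illustrated with a very rough sketch: two queries that differ only in their constants get the same ID once the constants are suppressed before hashing. PostgreSQL jumbles the post-parse-analysis Query tree, not the SQL text; the string-level normalization and the FNV-1a hash below are purely for demonstration:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hash a query string with integer literals collapsed to a single '?'
 * placeholder, so "x = 1" and "x = 42" jumble identically.
 * Uses 64-bit FNV-1a for the demonstration.
 */
uint64_t query_id_no_consts(const char *sql)
{
    uint64_t h = 14695981039346656037ULL;       /* FNV-1a offset basis */

    for (size_t i = 0; sql[i] != '\0'; )
    {
        if (isdigit((unsigned char) sql[i]))
        {
            h = (h ^ '?') * 1099511628211ULL;   /* whole literal -> '?' */
            while (isdigit((unsigned char) sql[i]))
                i++;
        }
        else
        {
            h = (h ^ (unsigned char) sql[i]) * 1099511628211ULL;
            i++;
        }
    }
    return h;
}
```

Request (2) would amount to a similar substitution at the tree level: jumble object names rather than Oids, at the cost of keeping the name resolution stable across instances.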
Re: Generating code for query jumbling through gen_node_support.pl
On 11/7/2023 05:35, Michael Paquier wrote: On Mon, Jan 30, 2023 at 11:48:45AM +0100, Peter Eisentraut wrote: On 27.01.23 03:59, Michael Paquier wrote: At the end, that would be unnoticeable for the average user, I guess, but here are the numbers I get on my laptop :) Personally, I think we do not want the two jumble methods in parallel. Maybe there are other opinions. (Thanks Jonathan for the poke.) Now that we are in mid-beta for 16, it would be a good time to conclude on this open item: "Reconsider a utility_query_id GUC to control if query jumbling of utilities can go through the past string-only mode and the new mode?" In Postgres ~15, utility commands used a hash of the query string to compute their query ID. The current query jumbling code uses a Query instead, like any other queries. I have registered this open item as a self-reminder, mostly in case there would be an argument to have a GUC where users could switch from one mode to another. See here as well for some computation times for each method (table is in ns, with millions of iterations): https://www.postgresql.org/message-id/y9eeyindb1acp...@paquier.xyz I still don't think that we need both methods based on these numbers, but there may be more opinions about that? Are people OK if this open item is discarded? I vote for only one method, based on the query tree structure. BTW, have you thought about different algorithms for queryId generation? Auto-generated queryId code could open the way for extensions to support custom queryIds easily. -- regards, Andrey Lepikhov Postgres Professional
Re: POC, WIP: OR-clause support for indexes
On 7/7/2023 15:20, Alena Rybakina wrote: because we will provide similar manipulation in this: foreach(l, gentry->consts) { Node *rexpr = (Node *) lfirst(l); rexpr = coerce_to_common_type(pstate, rexpr, scalar_type, "IN"); aexprs = lappend(aexprs, rexpr); } I'm not sure that it should be replaced. In attachment - a bit more corrections to the patch. The most important change - or_list contains already transformed expression subtree. So, I think we don't need to free it at all. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 961ca3e482..f0fd63f05c 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -112,9 +112,6 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) List *or_list = NIL; List *groups_list = NIL; ListCell *lc; - Node *result; - BoolExpr *expr; - boolchange_apply = false; /* If this is not an 'OR' expression, skip the transformation */ if (expr_orig->boolop != OR_EXPR || @@ -131,16 +128,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) OrClauseGroupEntry *gentry; boolconst_is_left = true; - /* -* The first step: checking that the expression consists only of equality. -* We can only do this here, while arg is still row data type, namely A_Expr. -* After applying transformExprRecurce, we already know that it will be OpExr type, -* but checking the expression for equality is already becoming impossible for us. -* Sometimes we have the chance to devide expression into the groups on -* equality and inequality. This is possible if all list items are not at the -* same level of a single BoolExpr expression, otherwise all of them cannot be converted. -*/ - + /* At first, transform the arg and evaluate constant expressions. 
*/ orqual = transformExprRecurse(pstate, (Node *) arg); orqual = coerce_to_boolean(pstate, orqual, "OR"); orqual = eval_const_expressions(NULL, orqual); @@ -151,7 +139,13 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) continue; } - /* Detect constant side of the clause */ + /* +* Detect the constant side of the clause. Recall non-constant +* expression can be made not only with Vars, but also with Params, +* which is not bonded with any relation. Thus, we detect the const +* side - if another side is constant too, the orqual couldn't be +* an OpExpr. +*/ if (IsA(get_leftop(orqual), Const)) const_is_left = true; else if (IsA(get_rightop(orqual), Const)) @@ -175,26 +169,14 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) } /* -* The next step: transform all the inputs, and detect whether -* any contain Vars. -*/ - if (!contain_vars_of_level(orqual, 0)) - { - /* Again, it's not the expr we can transform */ - or_list = lappend(or_list, orqual); - continue; - } - - ; - - /* - * At this point we definitely have a transformable clause. - * Classify it and add into specific group of clauses, or create new - * group. - * TODO: to manage complexity in the case of many different clauses - * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something - * like a hash table (htab key ???). - */ + * At this point we definitely have a transformable clause. + * Classify it and add into specific group of clauses, or create new + * group. + * TODO: to manage complexity in the case of many different clauses + * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something + * like a hash table. But also we believe, that the case of many + * different variable sides is very rare. + */ foreach(lc_groups, groups_list) { OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups); @@ -220,20 +202
Re: POC, WIP: OR-clause support for indexes
On 6/7/2023 03:06, Alena Rybakina wrote: The test was performed on the same benchmark database generated by 2 billion values. I corrected this constant in the patch. In an attempt to resolve some of the issues mentioned in my previous letter, I used op_mergejoinable() to detect the mergejoinability of a clause. The constant side of an expression is detected by calling eval_const_expressions() and checking each side for a Const node. See the 'diff to diff' in the attachment. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 26648b0876..67d6f85010 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -111,38 +111,27 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) { List *or_list = NIL; List *groups_list = NIL; - ListCell *lc_eargs; + ListCell *lc; Node *result; BoolExpr *expr; boolchange_apply = false; - /* If this is not expression "Or", then will do it the old way. */ + /* If this is not an 'OR' expression, skip the transformation */ if (expr_orig->boolop != OR_EXPR || list_length(expr_orig->args) < const_transform_or_limit) - return transformBoolExpr(pstate, (BoolExpr *)expr_orig); + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); expr = (BoolExpr *)copyObject(expr_orig); - /* - * NOTE: - * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains - * a list of sub-restrictinfo args, and rinfo->clause - which is the - * same expression, made from bare clauses. To not break selectivity - * caches and other optimizations, use both: - * - use rinfos from orclause if no transformation needed - * - use bare quals from rinfo->clause in the case of transformation, - * to create new RestrictInfo: in this case we have no options to avoid - * selectivity estimation procedure.
- */ - foreach(lc_eargs, expr->args) + foreach(lc, expr->args) { - A_Expr *arg = (A_Expr *) lfirst(lc_eargs); - Node *bare_orarg; + Node *arg = lfirst(lc); + Node *orqual; Node *const_expr; - Node *non_const_expr; + Node *nconst_expr; ListCell *lc_groups; OrClauseGroupEntry *gentry; - boolallow_transformation; + boolconst_is_left = true; /* * The first step: checking that the expression consists only of equality. @@ -154,33 +143,51 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) * same level of a single BoolExpr expression, otherwise all of them cannot be converted. */ - if (!arg) - break; + orqual = transformExprRecurse(pstate, (Node *) arg); + orqual = coerce_to_boolean(pstate, orqual, "OR"); + orqual = eval_const_expressions(NULL, orqual); - allow_transformation = (arg->type == T_A_Expr && (arg)->kind == AEXPR_OP && - list_length((arg)->name) >=1 && strcmp(strVal(linitial((arg)->name)), "=") == 0 - ); + if (!IsA(orqual, OpExpr)) + { + or_list = lappend(or_list, orqual); + continue; + } + /* Detect constant side of the clause */ + if (IsA(get_leftop(orqual), Const)) + const_is_left = true; + else if (IsA(get_rightop(orqual), Const)) + const_is_left = false; + else + { + or_list = lappend(or_list, orqual); + continue; + } - bare_orarg = transformExprRecurse(pstate, (Node *)arg); - bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR"); + /* Get pointers to constant and expression sides of the qual */ + nconst_expr = (const_is_left) ? get_rightop(orqual) : + get_leftop(orqual); + const_expr = (const_is_left) ? get_leftop(orqual
Re: POC, WIP: OR-clause support for indexes
On 6/7/2023 03:06, Alena Rybakina wrote: I corrected this constant in the patch. The patch doesn't apply cleanly: it contains some trailing spaces. Also, a quick glance at the code shows some weak points: 1. transformBoolExprOr should take a BoolExpr as its input type. 2. You can avoid the switch statement at the beginning of the function, because you only need one option. 3. Stale comments: RestrictInfos definitely do not exist at this point. 4. I don't know whether you really need to copy the expr, but it is better to do so as late as possible. 5. You assume that leftop is non-constant and rightop is constant. Why? 6. I doubt the equality-operator check: someone can invent a custom '=' operator with semantics different from the usual one. Maybe it is better to check mergejoinability? 7. I don't know how to confidently identify constant expressions at this level, so I guess you can only merge expressions like "F(X)=Const" here, not 'F(X)=ConstExpression'. See delta.diff with the mentioned changes in the attachment. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index c9193d826f..26648b0876 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -107,41 +107,22 @@ typedef struct OrClauseGroupEntry static int const_transform_or_limit = 500; static Node * -transformBoolExprOr(ParseState *pstate, Expr *expr_orig) +transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) { List *or_list = NIL; List *groups_list = NIL; ListCell *lc_eargs; Node *result; - BoolExpr *expr = (BoolExpr *)copyObject(expr_orig); - const char *opname; boolchange_apply = false; - boolor_statement = false; - - Assert(IsA(expr, BoolExpr)); /* If this is not expression "Or", then will do it the old way. 
*/ - switch (expr->boolop) - { - case AND_EXPR: - opname = "AND"; - break; - case OR_EXPR: - opname = "OR"; - or_statement = true; - break; - case NOT_EXPR: - opname = "NOT"; - break; - default: - elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop); - opname = NULL; /* keep compiler quiet */ - break; - } - - if (!or_statement || list_length(expr->args) < const_transform_or_limit) + if (expr_orig->boolop != OR_EXPR || + list_length(expr_orig->args) < const_transform_or_limit) return transformBoolExpr(pstate, (BoolExpr *)expr_orig); + expr = (BoolExpr *)copyObject(expr_orig); + /* * NOTE: * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains @@ -176,15 +157,13 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig) if (!arg) break; - allow_transformation = ( - or_statement && - arg->type == T_A_Expr && (arg)->kind == AEXPR_OP && + allow_transformation = (arg->type == T_A_Expr && (arg)->kind == AEXPR_OP && list_length((arg)->name) >=1 && strcmp(strVal(linitial((arg)->name)), "=") == 0 ); bare_orarg = transformExprRecurse(pstate, (Node *)arg); - bare_orarg = coerce_to_boolean(pstate, bare_orarg, opname); + bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR"); /* * The next step: transform all the inputs, and detect whether any contain @@ -357,14 +336,10 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig) } } - if (!change_apply) - { - or_statement = false; - } list_free_deep(groups_list); } - if (or_statement) + if (change_apply) { /* One more trick: assemble correct clause */ expr = list_length(or_list) > 1 ? makeBoolExpr(OR_EXPR, list_copy(or_list), -1) : linitial(or_list); @@ -376,9 +351,8 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig) * transformBoolExpr function */ else - { result = transformB
Re: Removing unneeded self joins
Hi, During the significant code revision in v.41 I lost some replacement operations. Here is the fix, plus extra tests to check this in the future. Also, Tom added the JoinDomain structure five months ago, and I added code to replace relids for that place too. One more thing: I found out that we didn't remove self-joins proven removable only through baserestrictinfo clauses when no self-join clause existed for the join. Now it is fixed, and a test has been added. To grasp the changes readily, see the delta file in the attachment. -- regards, Andrey Lepikhov Postgres Professional From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Tue, 4 Jul 2023 16:07:50 +0700 Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to check it. Add one more cleaning procedure on JoinDomain relids which was introduced recently with commit 3bef56e. Fix the corner case when we haven't removed SJ if the selfjoinquals list was empty. --- src/backend/optimizer/plan/analyzejoins.c | 15 ++- src/test/regress/expected/join.out | 26 --- src/test/regress/expected/updatable_views.out | 17 +--- src/test/regress/sql/join.sql | 9 +++ 4 files changed, 53 insertions(+), 14 deletions(-) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index a93e4ce05c..15234b7a3b 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel, sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst); sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst); sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst); + + replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst); } /* @@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel, /* ph_needed might or might not become empty */ phv->phrels = 
replace_relid(phv->phrels, relid, subst); phv->phrels = replace_relid(phv->phrels, ojrelid, subst); + phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst); + phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst); Assert(!bms_is_empty(phv->phrels)); Assert(phv->phnullingrels == NULL); /* no need to adjust */ } @@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to) } em->em_relids = replace_relid(em->em_relids, from, to); + em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to); /* We only process inner joins */ replace_varno((Node *) em->em_expr, from, to); @@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids) */ restrictlist = generate_join_implied_equalities(root, joinrelids, inner->relids, - outer, 0); + outer, NULL); /* * Process restrictlist to seperate the self join quals out of @@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids) split_selfjoin_quals(root, restrictlist, , , inner->relid, outer->relid); + /* +* To enable SJE for the only degenerate case without any self join +* clauses at all, add baserestrictinfo to this list. +* Degenerate case works only if both sides have the same clause. So +* doesn't matter which side to add. +*/ + selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo); + /* * Determine if the inner table can duplicate outer rows. We must * bypass the unique rel cache here since we're possibly using a diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index b1f43f6ff8..027c356bcc 100644 --- a/src/test/regress/expected/joi
Re: Removing unneeded self joins
On 4/4/2023 02:30, Gregory Stark (as CFM) wrote: On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek wrote: Hi All, I just wanted to ask about the status and plans for this patch. I can see it being stuck at “Waiting for Author” status in several commitfests. Sadly it seems to now be badly in need of a rebase. There are large hunks failing in the guts of analyzejoins.c as well as minor failures elsewhere and lots of offsets which need to be reviewed. I think given the lack of activity it's out of time for this release at this point. I'm moving it ahead to the next CF. Hi, Version 41 is a heavy rework of the feature: 1. In previous versions, I tried to reuse remove_rel_from_query() for both left-join and self-join removal. But now I realize that these are somewhat different procedures treating different operations. In this patch, only the common stages of the PlannerInfo fixing process are united in one function. 2. Transferring clauses from the removed table to the kept one is more transparent now and contains comments. 3. The equivalence-class update procedure was changed according to David's commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the last v.40 version, and it looks pretty similar to the update_eclass routine in this patch. It passes the regression tests, but some questions are still open: 1. Should we look for duplicated or redundant clauses (the same for eclasses) during the clause transfer procedure? On the one hand, we reduce the length of restrict lists, which can impact planning or execution time; additionally, we improve the accuracy of cardinality estimation. On the other hand, it is one more place that can make planning time much longer in specific cases. It would have been better to avoid calling the equal() function here, but it's the only way to detect duplicated inequality expressions. 2. Could we reuse ChangeVarNodes instead of sje_walker(), and merge remove_rel_from_restrictinfo with replace_varno? 3. 
Also, I'm still not finished with split_selfjoin_quals: some improvements could be made. -- regards, Andrey Lepikhov Postgres Professional From 4a342b9789f5be209318c13fb7ec336fcbd2aee5 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 15 May 2023 09:04:51 +0500 Subject: [PATCH] Remove self-joins. A Self Join Elimination (SJE) feature removes inner join of plain table to itself in a query tree if can be proved that the join can be replaced with a scan. The proof based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clause. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use the next approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. 3. If uniqueness of outer relation is deduced from baserestrictinfo clause like 'x=const' on unique field(s), check what the inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. 
--- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 38 + src/backend/optimizer/plan/analyzejoins.c | 1077 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc_tables.c | 22 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |7 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 798 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 351 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2319 insertions(+), 51 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6262cb7bb2..68215e1093 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5419,6 +5419,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + + Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. + + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 0065c8992b..57bdc6811f 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3491,6 +3491,21
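The four-step identification procedure in the commit message above can be sketched as a toy check (illustrative Python, not the C implementation; the qual representation, function name, and inputs are invented for the example):

```python
def self_join_removable(joinqual_cols, outer_restrict, inner_restrict,
                        unique_keys, outer_rowmark, inner_rowmark):
    # Step 1 happened earlier: joinqual_cols holds the column names that
    # appear in mergejoinable quals of the form a.x = b.x.
    # Step 2: those columns must cover some unique key, so each outer row
    # matches at most one inner row -- and it is the same physical row.
    keeps_unique = any(set(key) <= joinqual_cols for key in unique_keys)
    # Step 3: any 'x = const' restrictions proving uniqueness must appear
    # identically on both sides, otherwise the two scans filter
    # different sets of rows.
    same_filters = (outer_restrict == inner_restrict)
    # Step 4: row marks must have the same strength on both sides.
    same_marks = (outer_rowmark == inner_rowmark)
    return keeps_unique and same_filters and same_marks

# t JOIN t USING (id), with a unique index on (id): removable.
print(self_join_removable({"id"}, set(), set(), [("id",)], None, None))  # True
```

The check is deliberately conjunctive: failing any one of the steps keeps the join, which mirrors how the feature must be conservative to stay correct.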
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
On 24/6/2023 17:23, Tomas Vondra wrote: I really hope what I just wrote makes at least a little bit of sense. Throw in one more example: SELECT i AS id INTO l FROM generate_series(1,10) i; CREATE TABLE r (id int8, v text); INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f'); ANALYZE l,r; EXPLAIN ANALYZE SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL; Here you can see the same kind of underestimation: Hash Left Join (... rows=500 width=14) (... rows=9 ...) So the eqjoinsel_unmatch_left() function should be modified for the case where nd1 -- regards, Andrey Lepikhov Postgres Professional
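The shape of the underestimation can be sketched with a toy model (Python, with invented inputs; the real estimator works from nullfrac/ndistinct statistics rather than these simplified parameters):

```python
def isnull_rows_naive(outer_rows, match_sel, inner_nullfrac):
    # Treat "r.v IS NULL" as an ordinary restriction using the column's
    # nullfrac, ignoring NULLs manufactured by the outer join itself.
    return outer_rows * inner_nullfrac

def isnull_rows_join_aware(outer_rows, match_sel, inner_nullfrac):
    # Outer rows without an inner match come out with v IS NULL no matter
    # what the column's nullfrac says; count them separately.
    matched = outer_rows * match_sel
    return (outer_rows - matched) + matched * inner_nullfrac

# 10 outer rows, only 10% find a join partner, r.v itself is never NULL:
print(isnull_rows_naive(10, 0.1, 0.0))       # 0.0 -- ignores join-made NULLs
print(isnull_rows_join_aware(10, 0.1, 0.0))  # 9.0 -- the unmatched rows
```

The point is that the IS NULL selectivity after a left join is dominated by the unmatched-row fraction, which is exactly the quantity the current per-column statistics do not capture.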
Re: eqjoinsel_semi still sucks ...
On 2/5/2012 20:34, Tom Lane wrote: On reflection I think that the idea of clamping ndistinct beforehand is just wrong, and what we ought to do instead is apply a multiplier to the selectivity estimate afterwards. In the case of a base rel we could just multiply by the selectivity of its baserestrictinfo list. For join rels it's a bit harder to guess how much a given input relation might have been decimated, but if the join's estimated size is smaller than the output size of the base rel the correlation var came from, we could multiply by that ratio (on top of whatever correction came from the base rel's restriction clauses). I got stuck on some cases where (due to a tree of filters) the planner underestimates the JOIN just because ndistinct feeds a huge number into the selectivity estimation formula, even though the estimates of both input relations are made correctly and are limited. I've tried to understand the logic through commits 0d3b231eebf, 97930cf578e and 7f3eba30c9d, but it is still not clear to me. So, why is the idea of clamping ndistinct terrible in general? Could you explain your reasons a bit more? -- regards, Andrey Lepikhov Postgres Professional
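The difference between the two strategies can be shown numerically (a grossly simplified Python model of a semi-join selectivity formula; the real eqjoinsel_semi also involves MCV matching, so the numbers are only illustrative):

```python
def sel_clamp_beforehand(nd_outer, nd_inner, inner_rows, inner_raw_rows):
    # Clamp ndistinct to the filtered relation size, then use it directly
    # in the (toy) selectivity formula.
    nd_inner = min(nd_inner, inner_rows)
    return min(1.0, nd_inner / nd_outer)

def sel_multiply_afterwards(nd_outer, nd_inner, inner_rows, inner_raw_rows):
    # Keep the raw ndistinct in the formula, then scale the result by the
    # fraction of the inner relation that survived its restriction clauses.
    sel = min(1.0, nd_inner / nd_outer)
    return sel * (inner_rows / inner_raw_rows)

# Outer side: 1000 distinct values. Inner side: 10000 distinct values in
# 10000 raw rows, but filters keep only 100 rows.
print(sel_clamp_beforehand(1000, 10000, 100, 10000))     # 0.1
print(sel_multiply_afterwards(1000, 10000, 100, 10000))  # 0.01
```

Both approaches react to the filtering, but clamping changes the value inside the formula (and can dominate it), while the multiplier scales the unfiltered estimate smoothly, which is what the quoted proposal argues for.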
MergeJoin beats HashJoin in the case of multiple hash clauses
Hi, all. Some of my clients use JOINs with three or four clauses. Quite frequently, I see complaints about an unreasonable switch of the JOIN algorithm to Merge Join instead of Hash Join. Quick research has shown one weak place: the estimation of the average bucket size in final_cost_hashjoin (see q2.sql in the attachment) uses a very conservative strategy. Unlike the estimation of groups, here we take the smallest bucket-size fraction across all hash clauses (i.e., only the clause with the largest ndistinct counts) instead of multiplying the per-clause fractions (or trying multivariate analysis). That works fine for the case of one clause. But if we have many clauses, each with a high ndistinct, we overestimate the average size of a bucket and, as a result, prefer Merge Join. As the example in the attachment shows, this leads to a worse plan than possible, sometimes drastically worse. I assume this is done out of fear of functional dependencies between the hash clause components. But in my opinion, we should go the same way here as in the estimation of groups. The attached patch shows a sketch of the solution. -- regards, Andrey Lepikhov Postgres Professional q2.sql Description: application/sql diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index ef475d95a1..26f26d6a40 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -4033,11 +4033,12 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, thismcvfreq = restrictinfo->left_mcvfreq; } - if (innerbucketsize > thisbucketsize) - innerbucketsize = thisbucketsize; - if (innermcvfreq > thismcvfreq) - innermcvfreq = thismcvfreq; + innerbucketsize *= thisbucketsize; + innermcvfreq *= thismcvfreq; } + + if (innerbucketsize < 1.0 / virtualbuckets) + innerbucketsize = 1.0 / virtualbuckets; } /*
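The estimation change can be sketched numerically (a Python toy model with invented ndistinct values, not the actual final_cost_hashjoin code):

```python
def bucketsize_conservative(clause_fractions):
    # Current behaviour: take the smallest per-clause bucket fraction,
    # i.e. trust only the single most selective hash clause.
    return min(clause_fractions)

def bucketsize_multiplied(clause_fractions, virtualbuckets):
    # Sketched behaviour: treat the clauses as independent and multiply
    # their fractions, the same way group counts are estimated, with a
    # floor of one virtual bucket's share (a fraction cannot drop below
    # 1/nbuckets in a hash table with nbuckets buckets).
    est = 1.0
    for f in clause_fractions:
        est *= f
    return max(est, 1.0 / virtualbuckets)

# Two hash clauses, each with ndistinct ~ 100 -> per-clause fraction ~ 0.01.
fractions = [0.01, 0.01]
print(bucketsize_conservative(fractions))       # 0.01
print(bucketsize_multiplied(fractions, 16384))  # ~0.0001
```

With the conservative rule the estimated bucket holds 1% of the inner rows; multiplying drops that to 0.01%, which is what makes the Hash Join cost competitive again when several independent clauses are present.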
Re: Removing unneeded self joins
On 3/6/23 10:30, Michał Kłeczek wrote: > Hi All, > > I just wanted to ask about the status and plans for this patch. > I can see it being stuck at “Waiting for Author” status in several > commitfests. > > I think this patch would be really beneficial for us as we heavily use > views to structure our code. > Each view is responsible for providing some calculated values and they > are joined in a query to retrieve the full set of information. > > Not sure how the process works and how I could help (I am absolutely > not capable of helping with coding I am afraid - but could sponsor a > (small :) ) bounty to speed things up). Yes, I am still working on this feature. Because of the significant changes in the optimizer code that Tom & Richard have been making in recent months, I didn't touch it for a while. But now this work can be continued. The current patch is rebased onto current master. Because of the recently introduced nullable_rels logic, ojrelids have spread widely across planner bitmapsets, so the join-elimination logic was changed. But now I'm less happy with the code. It seems we need to refactor it: 1. According to reports from some performance engineers, the feature can cause an overhead of ~0.5% on trivial queries without joins at all. We should examine the patch and find a way to return quickly and cheaply if the statement contains no joins or, maybe stronger, no self-joins. 2. During join elimination we replace clauses like 'x=x' with 'x IS NOT NULL'. It is a weak point because we change the clause's semantics (mergejoinable to non-mergejoinable, in this example) and could forget to consistently change some RestrictInfo fields. 3. In previous versions we changed the remove_rel_from_query routine, trying to use it in both the 'useless outer join' and 'self join' elimination optimizations. Now, because of the 'ojrelid' field, it looks too complicated. Do we need to split this routine again? 
-- Regards Andrey Lepikhov Postgres Professional From cb4340577dab0e8cf5531e9934f5734fda178490 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 15 May 2023 09:04:51 +0500 Subject: [PATCH] Remove self-joins. A Self Join Elimination (SJE) feature removes inner join of plain table to itself in a query tree if can be proved that the join can be replaced with a scan. The proof based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clause. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use the next approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. 3. If uniqueness of outer relation is deduced from baserestrictinfo clause like 'x=const' on unique field(s), check what the inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. 
--- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 38 + src/backend/optimizer/plan/analyzejoins.c | 1093 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc_tables.c | 22 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |7 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 774 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 340 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2313 insertions(+), 38 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5da74b3c40..b47d11759e 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5437,6 +5437,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + + Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. + + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 1436dbc2f2..1b1aa9083c 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3491,6 +3491,21 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist,
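Regarding point 1 in the message above (a quick, cheap return on trivial queries), one possible shape of the check is a single scan of the range table for a repeated relation OID before any self-join-elimination work starts. A Python sketch with an invented range-table representation:

```python
def has_candidate_self_join(rtable):
    # rtable: list of (rtekind, relid) pairs, a stand-in for Query.rtable.
    # SJE can only fire if some plain relation appears at least twice,
    # so one linear scan lets trivial queries skip the whole machinery.
    seen = set()
    for rtekind, relid in rtable:
        if rtekind != 'RTE_RELATION':
            continue
        if relid in seen:
            return True
        seen.add(relid)
    return False

# A query over a single table: no candidate, bail out immediately.
print(has_candidate_self_join([('RTE_RELATION', 16384)]))  # False
```

This over-approximates (two references to the same table are necessary but not sufficient for removal), which is exactly what a guard clause needs: it may let some non-removable cases through to the full analysis, but never skips a removable one.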
Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes
On 30/3/2023 12:57, Julien Rouhaud wrote: Extensions could need to pass some query-related data through all stages of query planning and execution. As a trivial example, pg_stat_statements uses queryid at the end of execution to save some statistics. One more reason: extensions now conflict on the queryid value and the logic of changing it. With this patch, it can be managed. I just had a quick look at the patch, and IIUC it doesn't really help on that side, as there's still a single official "queryid" that's computed, stored everywhere and later used by pg_stat_statements (which then stores it additionally to that official queryid). Thank you for the attention! This patch doesn't try to solve the problem of the oneness of queryId. In this patch we change pg_stat_statements so that it doesn't set the queryId field to 0 according to its internal logic. Any other extension should do the same: use queryId as it wants, but not erase it - erase only its private copy in the ext_field. A ruthless pgbench benchmark shows that we got some overhead: 1.6% in default mode, 4% in prepared mode, ~0.1% in extended mode. That's a quite significant overhead. But the only reason to accept such a change is to actually use it to store additional data, so it would be interesting to see what the overhead is like once you store at least 2 different values there. Yeah, but as I said earlier, it can be reduced to a much smaller value with a simple optimization. Here I intentionally avoid it to discuss the core concept. - Do we need to invent a registration procedure to do away with the names of entries and use some compact integer IDs? Note that the patch as proposed doesn't have any defense against two extensions trying to register something with the same name, or update a stored value, as AddExtensionDataToNode() simply prepends the new value to the list. 
You can actually update the value by just storing the new value, but it will add a significant overhead to every other extension that wants to read another value. Thanks a lot! The patch in the attachment implements such an idea: an extension can allocate some entries and use these private IDs to add entries. I hope an extension would prefer to use only one entry for all its data to manage the overhead, but still. The API is also quite limited as each stored value has a single identifier. What if your extension needs to store multiple things? Since it's all based on Node you can't really store some custom struct, so you probably have to end up with things like "my_extension.my_val1", "my_extension.my_val2" which isn't great. The main idea here: if an extension holds a custom struct and wants to pass it along through all planning and execution stages, it should use an extensible node with custom read/write/copy routines. -- regards, Andrey Lepikhov Postgres Professional From ab101322330684e9839e46c26f70ad5462e40dac Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Thu, 30 Mar 2023 21:49:32 +0500 Subject: [PATCH 2/2] Add ExtendedData entry registration routines to use entryId instead of symbolic name. 
--- .../pg_stat_statements/pg_stat_statements.c | 7 +- src/backend/commands/extension.c | 69 +++ src/include/commands/extension.h | 6 +- src/include/nodes/parsenodes.h| 2 +- 4 files changed, 64 insertions(+), 20 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index fc47661c86..5b21163365 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -109,11 +109,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; !IsA(n, DeallocateStmt)) #define GET_QUERYID(node) \ - (Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements") + (Bigint *) GetExtensionData(node->ext_field, extendedEntryID) #define INSERT_QUERYID(node, queryid) \ castNode(Bigint, AddExtensionDataToNode((Node *) node, \ - "pg_stat_statements", \ + extendedEntryID, \ (Node *) makeBigint((int64) queryid), \ true)) /* @@ -298,6 +298,7 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static int
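The entry-ID idea in the patch above (register once at load time to obtain a compact integer ID, then avoid string comparisons on every per-node lookup) can be sketched like this; a Python stand-in, with a dict playing the role of the node's ext_field list:

```python
class ExtensionDataRegistry:
    # Hand out a small integer ID per extension name, once, at load time.
    # Re-registering the same name yields the same ID, so two extensions
    # picking one name collide visibly instead of silently shadowing
    # each other's entries.
    def __init__(self):
        self._ids = {}

    def register(self, name):
        return self._ids.setdefault(name, len(self._ids))

registry = ExtensionDataRegistry()

def add_extension_data(ext_field, entry_id, value):
    ext_field[entry_id] = value      # storing again updates in place

def get_extension_data(ext_field, entry_id):
    return ext_field.get(entry_id)   # None if nothing was stored

pgss = registry.register("pg_stat_statements")
ext_field = {}
add_extension_data(ext_field, pgss, 12345)   # e.g. a queryId value
print(get_extension_data(ext_field, pgss))   # 12345
```

Keyed storage with update-in-place also sidesteps the prepend-only problem noted in the review: repeated stores do not grow the list that every other extension has to scan.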
[POC] Allow an extension to add data into Query and PlannedStmt nodes
Hi, Previously, we read in this mailing list some controversial opinions on queryid generation and the jumbling technique. Here we don't intend to solve these problems, but to help extensions at least not conflict with each other on the queryId value. Extensions could need to pass some query-related data through all stages of query planning and execution. As a trivial example, pg_stat_statements uses queryid at the end of execution to save some statistics. One more reason: extensions now conflict on the queryid value and the logic of changing it. With this patch, it can be managed. This patch introduces the structure 'ExtensionData', which allows managing a list of entries with a couple of interface functions, addExtensionDataToNode() and GetExtensionData(). Keep in mind the possible future hiding of this structure from the public interface. An extension should invent a symbolic key to identify its data. It may invent as many additional keys as it wants, but the best option is no more than one entry per extension. Usage of this machinery is demonstrated by the pg_stat_statements example: here we introduce a Bigint node just to store the queryId value natively. A ruthless pgbench benchmark shows that we got some overhead: 1.6% in default mode, 4% in prepared mode, ~0.1% in extended mode. An optimization that avoids copying queryId, by storing it in the node pointer field directly, allows keeping this overhead within 0.5% for all these modes but increases complexity. So here we demonstrate the non-optimized variant. Some questions still cause doubts: - QueryRewrite: should we copy extension fields from the parent parsetree to the rewritten ones? - Do we need to invent a registration procedure to do away with the names of entries and use some compact integer IDs? - Do we need to optimize this structure to avoid a copy for simple data types, for example, inventing something like A_Const? 
All in all, in our opinion, this issue will tend to grow with an increasing number of extensions that utilize planner and executor hooks for various purposes. So, any thoughts will be useful. -- Regards Andrey Lepikhov Postgres Professional From 944ce61d7ff934727240d90ee7620bfb69ad3a5a Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Wed, 22 Mar 2023 16:59:30 +0500 Subject: [PATCH] Add one more field into Query and PlannedStmt structures to allow an extension to pass some data across all the execution stages of specific instance of the query. --- .../pg_stat_statements/pg_stat_statements.c | 57 -- src/backend/commands/extension.c | 103 ++ src/backend/executor/execParallel.c | 1 + src/backend/nodes/value.c | 12 ++ src/backend/optimizer/plan/planner.c | 1 + src/backend/rewrite/rewriteHandler.c | 6 + src/backend/tcop/postgres.c | 2 + src/include/commands/extension.h | 4 + src/include/nodes/parsenodes.h | 23 src/include/nodes/plannodes.h | 2 + src/include/nodes/value.h | 10 ++ 11 files changed, 209 insertions(+), 12 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 5285c3f7fa..fc47661c86 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "commands/extension.h" #include "common/hashfn.h" #include "executor/instrument.h" #include "funcapi.h" @@ -107,6 +108,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; !IsA(n, PrepareStmt) && \ !IsA(n, DeallocateStmt)) +#define GET_QUERYID(node) \ + (Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements") + +#define INSERT_QUERYID(node, queryid) \ + castNode(Bigint, AddExtensionDataToNode((Node *) node, \ + "pg_stat_statements", \ + (Node *) makeBigint((int64) queryid), \ + true)) /* * Extension version number, for supporting older extension versions' objects */ @@ -821,6 +830,13 
@@ error: static void pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) { + Bigint *queryId; + + if ((queryId = GET_QUERYID(query)) == NULL) + queryId = INSERT_QUERYID(query, query->queryId); + + Assert(queryId); + if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); @@ -837,7 +853,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) { if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt)) { - query->queryId = UINT64CONST(0); + queryId->val = UINT64CONST(0); return; } } @@ -851,7 +867,7
Re: [PoC] Reducing planning time when tables have many partitions
On 2/6/23 06:47, Yuya Watari wrote: Of course, I'm not sure if my approach in v16-0003 is ideal, but it may help solve your concern above. Since simple_rel_array[0] is no longer necessary with my patch, I removed the setup_append_rel_entry() function in v16-0004. However, to make the patch work, I needed to change some assertions in v16-0005. For more details, please see the commit message of v16-0005. After these works, the attached patches passed all regression tests in my environment. Instead of my approach, imitating the following change to get_eclass_indexes_for_relids() is also a possible solution. Ignoring NULL RelOptInfos enables us to avoid the segfault, but we have to adjust EquivalenceMemberIterator to match the result, and I'm not sure if this idea is correct. As I see it, you moved the indexes from RelOptInfo to PlannerInfo. Might it be better to move them into RangeTblEntry instead? -- Regards Andrey Lepikhov Postgres Professional
Re: [PATCH] random_normal function
On 1/19/23 11:01, Tom Lane wrote: Andrey Lepikhov writes: On 1/9/23 23:52, Tom Lane wrote: BTW, if this does bring the probability of failure down to the one-in-a-billion range, I think we could also nuke the whole "ignore:" business, simplifying pg_regress and allowing the random test to be run in parallel with others. We have used the pg_sleep() function to interrupt a query at a certain execution phase. But on some platforms, especially in containers, query execution time can vary so widely that the pg_sleep() timeout, required to get rid of the dependency on query execution time, has become unacceptable. So, the "ignore" option was the best choice. But does such a test have any actual value? If your test infrastructure ignores the result, what makes you think you'd notice if the test did indeed detect a problem? Yes, it is good for catching SEGFAULTs and assertion failures, which may be frequent because of the logic complexity in the case of timeouts. I think "ignore:" was a kluge we put in twenty-plus years ago when our testing standards were a lot lower, and it's way past time we got rid of it. OK, I will try to invent an alternative way for deep (and stable) testing of timeouts. Thank you for the answer. -- Regards Andrey Lepikhov Postgres Professional
Re: [PATCH] random_normal function
On 1/9/23 23:52, Tom Lane wrote: BTW, if this does bring the probability of failure down to the one-in-a-billion range, I think we could also nuke the whole "ignore:" business, simplifying pg_regress and allowing the random test to be run in parallel with others. With the 'ignore' option we have been able to cover with tests some time-dependent features, such as "statement_timeout", "idle_in_transaction_session_timeout", the usage of user timeouts in extensions, and so on. We have used the pg_sleep() function to interrupt a query at a certain execution phase. But on some platforms, especially in containers, query execution time can vary so widely that the pg_sleep() timeout, required to get rid of the dependency on query execution time, has become unacceptable. So, the "ignore" option was the best choice. For now, do we only have the "isolation tests" option to create stable execution-time-dependent tests? Or am I not aware of some test machinery? -- Regards Andrey Lepikhov Postgres Professional
Re: POC, WIP: OR-clause support for indexes
On 12/26/15 23:04, Teodor Sigaev wrote: I'd like to present OR-clause support for indexes. Although OR-clauses can be supported by a bitmapOR index scan, it isn't very effective and such a scan loses any order existing in the index. We (with Alexander Korotkov) presented results at Vienna's conference this year. In short, it provides a performance improvement: EXPLAIN ANALYZE SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000; ... The problems on the way which I see for now: 1. Calculating cost. Right now it's just a simple transformation of the costs computed for a BitmapOr path. I'd like to hope that's possible, so that the indexes' estimation functions could be left untouched and could still believe that all clauses are implicitly ANDed. 2. I'd like to add such support to btree, but it seems that it should be a separate patch. The btree search algorithm doesn't use any kind of stack of pages, and the algorithm to walk over the btree isn't clear to me for now. 3. I could have missed some places which still assume an implicitly-ANDed list of clauses, although the regression tests pass fine. I support such a cunning approach. But the specific case you demonstrated above could be optimized independently at an earlier stage. If we convert (F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2) to F(A) IN (ConstStableExpr_1, ConstStableExpr_2), we can see a significant execution speedup. For example, using demo.sql to estimate the maximum positive effect, we see about a 40% execution and 100% planning speedup. To avoid unnecessary overhead induced by the optimization, such a transformation may be made at the planning stage (when we have cardinality estimates and have pruned partitions) but before the creation of relation scan paths. This way we can avoid planning overhead and a non-optimal BitmapOr in the case of many ORs, possibly aggravated by many indexes on the relation. For example, such an operation can be executed in create_index_paths() before passing rel->indexlist. 
-- Regards Andrey Lepikhov Postgres Professional demo.sql Description: application/sql
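The proposed transformation (merging OR'ed equality clauses on a common expression into one IN-list) can be sketched as a grouping pass over the OR arguments. This is an illustrative Python model of the clause tree, not the parser code; the tuple encoding is invented for the example:

```python
def group_or_args(or_args):
    # or_args: list of ("=", lhs_expr, const) triples plus arbitrary other
    # clauses (kept as-is). Equality clauses sharing the same left-hand
    # expression are merged into a single IN-list, e.g.
    #   F(A) = c1 OR F(A) = c2   ->   F(A) IN (c1, c2)
    groups, passthrough = {}, []
    for arg in or_args:
        if isinstance(arg, tuple) and arg[0] == "=":
            _, lhs, const = arg
            groups.setdefault(lhs, []).append(const)
        else:
            passthrough.append(arg)
    result = []
    for lhs, consts in groups.items():
        if len(consts) > 1:
            result.append(("IN", lhs, consts))
        else:
            result.append(("=", lhs, consts[0]))   # nothing to merge
    return result + passthrough

clauses = [("=", "id", 5), ("=", "id", 500), ("=", "id", 5000), "other_qual"]
print(group_or_args(clauses))
# -> [('IN', 'id', [5, 500, 5000]), 'other_qual']
```

The real transformation would additionally have to verify that all grouped clauses use the same '=' operator family and that the right-hand sides are stable constants, per the caveats discussed in the review of the parser-level patch.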
Re: Optimization issue of branching UNION ALL
On 22/12/2022 06:50, Tom Lane wrote: 2. Iterative passes along the append_rel_list for replacing vars in the translated_vars field. I can't grasp the real necessity of walking the entire append_rel_list during the flattening of a UNION ALL leaf subquery. No one can reference this leaf, can they? After thinking about that for awhile, I believe we can go further: the containing_appendrel is actually the *only* part of the upper query that needs to be adjusted. So we could do something like the attached. This passes check-world, but I don't have quite enough confidence in it to just commit it. Thanks, I wrote that letter because of some doubts too. The only weak point I could imagine is if someday the SQL standard changes. Your code looks better than the previous attempt. -- regards, Andrey Lepikhov Postgres Professional
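A toy operation-count model shows why adjusting only the containing_appendrel matters for 1E4 branches (Python, with invented cost units; one "unit" stands for one appendrel entry visited):

```python
def work_walk_whole_list(n_branches):
    # Old behaviour: pulling up branch i walks the append_rel_list built
    # so far (~i entries) to substitute relids -> quadratic total work.
    return sum(range(1, n_branches + 1))

def work_containing_only(n_branches):
    # Adjusting only the containing appendrel is constant work per
    # branch -> linear total work.
    return n_branches

n = 10_000
print(work_walk_whole_list(n))   # 50005000
print(work_containing_only(n))   # 10000
```

A 5000x difference in visited entries (and the node copies each visit implies) is consistent with the reported blow-up from one second of planning to gigabytes of RAM as branch counts reach 1E4.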
Optimization issue of branching UNION ALL
Hi, A client report on a corner case has shown up a possible minor non-optimality in the procedure for transforming a simple UNION ALL statement tree. The complaint is about an auto-generated query with 1E4 simple UNION ALLs (see t.sh to generate a demo script). The reason: in REL_11_STABLE it is planned and executed in a second, but REL_12_STABLE and beyond make matters worse: planning of such a query needs tons of gigabytes of RAM. A superficial study revealed possibly unnecessary operations that could be avoided: 1. Walking across a query by calling substitute_phv_relids() even if lastPHId shows that no PHV is present. 2. Iterative passes along the append_rel_list for replacing vars in the translated_vars field. I can't grasp the real necessity of passing over all of the append_rel_list during flattening of a UNION ALL leaf subquery. No one can reference this leaf, can they? In the attachment you can see a sketch that reduces the number of planner cycles/copyings. -- Regards Andrey Lepikhov Postgres Professional t.sh Description: application/shellscript diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 08a73fb9d86..3739e3fe7ba 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -131,7 +131,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root, Node *node, int varno); static void substitute_phv_relids(Node *node, int varno, Relids subrelids); -static void fix_append_rel_relids(List *append_rel_list, int varno, +static void fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids); static Node *find_jointree_node_for_rel(Node *jtnode, int relid); @@ -1156,8 +1156,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, Relids subrelids; subrelids = get_relids_in_jointree((Node *) subquery->jointree, false); - substitute_phv_relids((Node *) parse, varno, subrelids); - fix_append_rel_relids(root->append_rel_list, varno, 
subrelids); + if (root->glob->lastPHId != 0) + substitute_phv_relids((Node *) parse, varno, subrelids); + fix_append_rel_relids(root, varno, subrelids); } /* @@ -2064,17 +2065,25 @@ perform_pullup_replace_vars(PlannerInfo *root, * use PHVs for safety. (This analysis could be made tighter but it seems * unlikely to be worth much trouble.) */ - foreach(lc, root->append_rel_list) + if (containing_appendrel) { - AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc); - bool save_need_phvs = rvcontext->need_phvs; + bool save_need_phvs = rvcontext->need_phvs; - if (appinfo == containing_appendrel) - rvcontext->need_phvs = false; - appinfo->translated_vars = (List *) - pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext); + rvcontext->need_phvs = false; + containing_appendrel->translated_vars = (List *) + pullup_replace_vars((Node *) containing_appendrel->translated_vars, +rvcontext); rvcontext->need_phvs = save_need_phvs; } + else + { + foreach(lc, root->append_rel_list) + { + AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc); + appinfo->translated_vars = (List *) +pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext); + } + } /* * Replace references in the joinaliasvars lists of join RTEs. @@ -3273,7 +3282,7 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc) subrelids = get_relids_in_jointree(newjtloc, false); Assert(!bms_is_empty(subrelids)); substitute_phv_relids((Node *) root->parse, varno, subrelids); - fix_append_rel_relids(root->append_rel_list, varno, subrelids); + fix_append_rel_relids(root, varno, subrelids); } /* @@ -3492,7 +3501,7 @@ substitute_phv_relids(Node *node, int varno, Relids subrelids) * We assume we may modify the AppendRelInfo nodes in-place. 
*/ static void -fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids) +fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids) { ListCell *l; int subvarno = -1; @@ -3503,7 +3512,7 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids) * AppendRelInfo nodes refer to it. So compute it on first use. Note that * bms_singleton_member will complain if set is not singleton. */ - foreach(l, append_rel_list) + foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); @@ -3518,8 +3527,9 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids) } /* Also fix up any PHVs in its translated vars */ - substitute_phv_relids((Node *) appinfo->translated_vars, - varno, subrelids); + if (root->glob->lastPHId != 0) + substitute_phv_relids((Node *) appinfo->translated_vars, + varno, subrelids); } }
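For readers who want to reproduce the complaint without the t.sh attachment, a minimal stand-in generator can be sketched as follows (the exact contents of t.sh are an assumption here):

```python
# Hypothetical stand-in for the t.sh demo script mentioned above: build a
# query consisting of N trivial UNION ALL branches.  Planning such a query
# is reported to regress badly after REL_11_STABLE.
def make_union_all(n):
    return " UNION ALL ".join(f"SELECT {i} AS x" for i in range(n))

query = make_union_all(10_000)  # the 1E4 branches from the report
print(query[:40])               # first few branches of the generated query
```

Feeding the generated statement to EXPLAIN on the affected branches should show the planning-time and memory blow-up described above.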
Re: Removing unneeded self joins
On 12/6/22 23:46, Andres Freund wrote: This doesn't pass the main regression tests due to a plan difference. https://cirrus-ci.com/task/5537518245380096 https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 + @@ -5806,7 +5806,7 @@ Nested Loop Join Filter: (sj_t3.id = sj_t1.id) -> Nested Loop - Join Filter: (sj_t3.id = sj_t2.id) + Join Filter: (sj_t2.id = sj_t3.id) -> Nested Loop Semi Join -> Nested Loop -> HashAggregate This change in the test behaviour is induced by commit a5fc4641, "Avoid making commutatively-duplicate clauses in EquivalenceClasses." Nothing special, as far as I can see. The attached patch fixes this. -- Regards Andrey Lepikhov Postgres Professional From 3e546637561bf4c6d195bc7c95b1e05263e797e2 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 5 Oct 2022 16:58:34 +0500 Subject: [PATCH] Remove self-joins. The Self Join Removal (SJR) feature removes an inner join of a plain table to itself in a query plan if it can be proved that the join can be replaced with a scan. The proof is based on the innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then the outer row matches only the same row from the inner relation. 3. 
If uniqueness of outer relation deduces from baserestrictinfo clause like 'x=const' on unique field(s), check that inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 38 + src/backend/optimizer/plan/analyzejoins.c | 1046 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc_tables.c | 22 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |7 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 774 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 340 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2278 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8e4145979d..2f9948d5f8 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5311,6 +5311,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + +Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. 
+ + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 914bfd90bc..8d57c68b1f 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3494,6 +3494,21 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, List *exprlist, List *oprlist) +{ + return relation_has_unique_index_ext(root, rel, restrictlist, + exprlist, oprlist, NULL); +} + +/* + * relation_has_unique_index_ext + * if extra_clauses isn't NULL, return baserestrictinfo clauses which were + * used to derive uniqueness. + */ +bool +relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel, + List *restrictlist, + List *exprlist, List *oprlist, + List **extra_clauses) { ListCell *ic; @@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + List *exprs = NIL; /* * If the index is not uni
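The identification steps from the commit message can be sketched in miniature (an illustrative model, not the patch's C code; `unique_keys` stands in for what innerrel_is_unique() would derive from the table's unique indexes):

```python
# Toy model of self-join detection.  A join qual "a.x = b.x" is represented
# as a pair of (relation, column) references.
def is_removable_self_join(quals, outer, inner, unique_keys):
    """quals: mergejoinable equalities as ((rel, col), (rel, col)) pairs.
    unique_keys: frozensets of columns covered by a unique index."""
    # Step 1: collect quals joining outer and inner on the same column.
    cols = set()
    for (r1, c1), (r2, c2) in quals:
        if {r1, r2} == {outer, inner} and c1 == c2:
            cols.add(c1)
    # Step 2: the collected columns must cover some unique key, so each
    # outer row can match at most the physically identical inner row.
    return any(key <= cols for key in unique_keys)

# "t AS a JOIN t AS b ON a.x = b.x" with a unique index on t(x):
print(is_removable_self_join([(("a", "x"), ("b", "x"))],
                             "a", "b", [frozenset({"x"})]))  # True
```

Steps 3 and 4 from the commit message (matching 'x = const' baserestrictinfo clauses and comparing row-mark strength) are omitted from this sketch.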
Re: [PoC] Reducing planning time when tables have many partitions
On 2/11/2022 15:27, Yuya Watari wrote: I noticed that the previous patch does not apply to the current HEAD. I attached the rebased version to this email. Looking into the find_em_for_rel() changes, I see that you replaced if (bms_is_subset(em->em_relids, rel->relids)) with an assertion. According to get_ecmember_indexes(), the em_relids field of the returned equivalence members can contain relids not mentioned in the relation. I don't understand why this works now. For example, we can sort by t1.x but have an expression t1.x = t1.y * t2.z. Or have I missed something? If it is not a mistake, maybe add a comment explaining why the assertion doesn't fail here? -- regards, Andrey Lepikhov Postgres Professional
Re: [PoC] Reducing planning time when tables have many partitions
On 2/11/2022 15:27, Yuya Watari wrote: Hello, I noticed that the previous patch does not apply to the current HEAD. I attached the rebased version to this email. I'm still reviewing your patch. For the most part it seems OK, but do you really need both the eq_sources and eq_derives lists now? As far as I can see, every access to these lists is guided by eclass_source_indexes and eclass_derive_indexes respectively. Maybe merge them? -- regards, Andrey Lepikhov Postgres Professional
Re: explain analyze rows=%.0f
On 22/7/2022 16:47, Amit Kapila wrote: I feel the discussion has slightly deviated, which makes it unclear whether this patch is required or not? After a quick review I want to express my thoughts. First, we have been waiting for this feature for years. Clients often give us an explain in which we see something like "rows=0, loops=100". Without verbose mode, I can't even tell whether this node produces any rows or not. So I think this feature is useful mostly for parameterized plans. Also, printing two decimal digits or even three isn't meaningful - sometimes we have a plan where the number of loops is about 1E6 or even 1E7, but the real number of rows is equal to 100 or 1000. To overcome this issue, I see two options: 1. Show the exact number of tuples without dividing by loops (a fair approach, but invasive and bad for automation tools). 2. Show rows in a scientific format like X.XXEXX. I vote for the second option. -- regards, Andrey Lepikhov Postgres Professional
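A quick illustration of the problem and of option 2, using plain Python formatting just to show the numbers (the counter names are made up):

```python
# A node that produced 1000 tuples in total across 1E7 loops.
ntuples, nloops = 1000, 10_000_000
avg = ntuples / nloops

# Current behaviour: rows divided by loops and printed with %.0f --
# the fact that the node produced tuples at all is lost.
print(f"rows={avg:.0f}")    # rows=0

# Option 2: scientific notation keeps the magnitude visible.
print(f"rows={avg:.2E}")    # rows=1.00E-04
```

Even with two or three decimal digits of fixed-point output, the average would still round to zero here, which is why the scientific format is the more informative choice.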
Re: A new strategy for pull-up correlated ANY_SUBLINK
On 2/11/2022 09:02, Andy Fan wrote: In the past we pulled up the ANY-sublink in 2 steps: the first step is to pull up the sublink as a subquery, and the next step is to pull up the subquery if it is allowed. The benefits of this method are obvious: pulling up the subquery has more requirements, and even if we can only finish the first step, we still get huge benefits. However, the bad stuff happens if varlevelsup = 1 is involved; things fail at step 1. convert_ANY_sublink_to_join ... if (contain_vars_of_level((Node *) subselect, 1)) return NULL; In this patch we distinguish the above case and try to pull it up within one step if it is helpful. It looks to me that what we need to do is just transform it to an EXISTS-sublink. Maybe the code at [1] would be useful for your purposes/tests. We implemented flattening of correlated subqueries for the simple N-J case, but found out that in some cases the flattening isn't obviously the best solution - at that stage we have no cardinality/cost estimations and can do worse. I guess for a more complex flattening procedure (with an aggregate function in the target list of the correlated subquery) the situation can be even worse. Maybe your idea has such corner cases too? [1] https://www.postgresql.org/message-id/flat/CALNJ-vTa5VgvV1NPRHnypdnbx-fhDu7vWp73EkMUbZRpNHTYQQ%40mail.gmail.com -- regards, Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 28/10/2022 16:12, Etsuro Fujita wrote: On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita wrote: I have committed the patch after tweaking comments a little bit further. I think there is another patch that improves performance of COPY FROM for foreign tables using COPY FROM STDIN, but if Andrey (or anyone else) wants to work on it again, I think it would be better to create a new CF entry for it (and start a new thread for it). So I plan to close this in the November CF unless they think otherwise. Anyway, thanks for the patch, Andrey! Thanks for reviewing, Ian and Zhihong! Thanks. I studied the performance of this code in comparison with bulk INSERTs. The patch seems to improve insertion speed by about 20%. However, the patch is very invasive, so I don't have any plans to work on it now. -- regards, Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 10/12/22 07:56, Etsuro Fujita wrote: On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov wrote: I reviewed the patch one more time. Only one question: bistate and ri_FdwRoutine are strongly coupled. Maybe add an assertion on (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future. You mean the bistate member of CopyMultiInsertBuffer? Yes. We do not use that member at all for foreign tables, so the patch avoids initializing that member in CopyMultiInsertBufferInit() when called for a foreign table. So we have bistate = NULL for foreign tables (and bistate != NULL for plain tables), as you mentioned above. I think it is a good idea to add such assertions. How about adding them to CopyMultiInsertBufferFlush() and CopyMultiInsertBufferCleanup() like the attached? In the attached I updated comments a bit further as well. Yes, that is quite enough. -- Regards Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 10/7/22 11:18, Etsuro Fujita wrote: On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita wrote: I will review the patch a bit more, but I feel that it is in good shape. One thing I noticed is this bit added to CopyMultiInsertBufferFlush() to run triggers on the foreign table. + /* Run AFTER ROW INSERT triggers */ + if (resultRelInfo->ri_TrigDesc != NULL && + (resultRelInfo->ri_TrigDesc->trig_insert_after_row || +resultRelInfo->ri_TrigDesc->trig_insert_new_table)) + { + Oid relid = RelationGetRelid(resultRelInfo->ri_RelationDesc); + + for (i = 0; i < inserted; i++) + { + TupleTableSlot *slot = rslots[i]; + + /* +* AFTER ROW Triggers might reference the tableoid column, +* so (re-)initialize tts_tableOid before evaluating them. +*/ + slot->tts_tableOid = relid; + + ExecARInsertTriggers(estate, resultRelInfo, +slot, NIL, +cstate->transition_capture); + } + } Since foreign tables cannot have transition tables, we have trig_insert_new_table=false. So I simplified the if test and added an assertion ensuring trig_insert_new_table=false. Attached is a new version of the patch. I tweaked some comments a bit as well. I think the patch is committable. So I plan on committing it next week if there are no objections. I reviewed the patch one more time. Only one question: bistate and ri_FdwRoutine are strongly coupled. Maybe add an assertion on (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future. -- Regards Andrey Lepikhov Postgres Professional
Re: document the need to analyze partitioned tables
On 10/5/22 13:37, Laurenz Albe wrote: On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote: I've pushed the last version, and backpatched it to 10 (not sure I'd call it a bugfix, but I certainly agree with Justin it's worth mentioning in the docs, even on older branches). I'd like to suggest an improvement to this. The current wording could be read to mean that dead tuples won't get cleaned up in partitioned tables. By the way, where are the statistics of a partitioned table used? The actual tables scanned are always the partitions, and in the execution plans that I have seen, the optimizer always used the statistics of the partitions. For example, they are used to estimate the selectivity of a join clause: CREATE TABLE test (id integer, val integer) PARTITION BY hash (id); CREATE TABLE test_0 PARTITION OF test FOR VALUES WITH (modulus 2, remainder 0); CREATE TABLE test_1 PARTITION OF test FOR VALUES WITH (modulus 2, remainder 1); INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q); VACUUM ANALYZE test; INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q); VACUUM ANALYZE test_0,test_1; EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF) SELECT * FROM test t1, test t2 WHERE t1.id = t2.val; VACUUM ANALYZE test; EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF) SELECT * FROM test t1, test t2 WHERE t1.id = t2.val; Here, without actual statistics on the parent table, we make a wrong prediction. -- Regards Andrey Lepikhov Postgres Professional
Re: Removing unneeded self joins
New version, rebased onto current master. Nothing special, just rebase. -- regards, Andrey Lepikhov Postgres Professional From 03aab7a2431032166c9ea5f52fbcccaf7168abec Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 5 Oct 2022 16:58:34 +0500 Subject: [PATCH] Remove self-joins. A Self Join Removal (SJR) feature removes inner join of plain table to itself in a query plan if can be proved that the join can be replaced with a scan. The proof based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. 3. If uniqueness of outer relation deduces from baserestrictinfo clause like 'x=const' on unique field(s), check that inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. 
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 38 + src/backend/optimizer/plan/analyzejoins.c | 1046 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc_tables.c | 22 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |7 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 774 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 340 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2278 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index d750290f13..5ce2d4d2fa 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5290,6 +5290,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + +Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. + + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index c31fcc917d..51f672a65c 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3513,6 +3513,21 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, List *exprlist, List *oprlist) +{ + return relation_has_unique_index_ext(root, rel, restrictlist, + exprlist, oprlist, NULL); +} + +/* + * relation_has_unique_index_ext + * if extra_clauses isn't NULL, return baserestrictinfo clauses which were + * used to derive uniqueness. 
+ */ +bool +relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel, + List *restrictlist, + List *exprlist, List *oprlist, + List **extra_clauses) { ListCell *ic; @@ -3568,6 +3583,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + List *exprs = NIL; /* * If the index is not unique, or not immediately enforced, or if it's @@ -3616,6 +3632,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, if (match_index_to_operand(rexpr, c, ind)) { matched = true; /* column is unique */ + + if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) + { + MemoryContext oldMemCtx = + MemoryContext
Re: [POC] Allow flattening of subquery with a link to upper query
On 5/10/2022 02:45, Zhihong Yu wrote: Hi, For contain_placeholders(): + if (IsA(node, Query)) + return query_tree_walker((Query *) node, contain_placeholders, context, 0); + else if (IsA(node, PlaceHolderVar)) Fixed The `else` is not needed. For correlated_t struct, it would be better if the fields have comments. Ok, I've added some comments. + * (for grouping, as an example). So, revert its status to + * a full valued entry. full valued -> fully valued Fixed -- regards, Andrey Lepikhov Postgres Professional From da570914e53bc34e6bf3291649725a041a8b929c Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 28 Sep 2022 13:42:12 +0300 Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary join query. With many restrictions, check each subquery and pull up expressions, references upper query block. Works for operators '=' and 'IN'. Machinery of converting ANY subquery to JOIN stays the same with minor changes in walker function. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pass a lower outer join through the pullup_sublink recursive procedure to find out some situations when qual references outer side of an outer join. [1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469. 
--- src/backend/optimizer/plan/subselect.c| 328 +++- src/backend/optimizer/prep/prepjointree.c | 71 ++-- src/backend/optimizer/util/tlist.c| 2 +- src/backend/optimizer/util/var.c | 8 + src/backend/utils/misc/guc_tables.c | 10 + src/include/optimizer/optimizer.h | 1 + src/include/optimizer/subselect.h | 3 +- src/include/optimizer/tlist.h | 1 + src/test/regress/expected/prepare.out | 18 + src/test/regress/expected/subselect.out | 438 ++ src/test/regress/sql/prepare.sql | 7 + src/test/regress/sql/subselect.sql| 162 src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1013 insertions(+), 37 deletions(-) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 92e3338584..be19d85524 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -32,6 +32,7 @@ #include "optimizer/planner.h" #include "optimizer/prep.h" #include "optimizer/subselect.h" +#include "optimizer/tlist.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/builtins.h" @@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context } inline_cte_walker_context; +bool optimize_correlated_subqueries = true; + static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, List *plan_params, SubLinkType subLinkType, int subLinkId, @@ -1229,6 +1232,296 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) return expression_tree_walker(node, inline_cte_walker, context); } +static bool +contain_placeholders(Node *node, inline_cte_walker_context *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Query)) + return query_tree_walker((Query *) node, contain_placeholders, context, 0); + if (IsA(node, PlaceHolderVar)) + return true; + + return expression_tree_walker(node, contain_placeholders, context); +} + +/* + * To be pullable all clauses of flattening correlated subquery should be + * anded and mergejoinable (XXX: really necessary?) 
+ */ +static bool +quals_is_pullable(Node *quals) +{ + if (!contain_vars_of_level(quals, 1)) + return true; + + if (quals && IsA(quals, OpExpr)) + { + OpExpr *expr = (OpExpr *) quals; + Node *leftarg; + + /* Contains only one expression */ + leftarg = linitial(expr->args); + if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */ + return false; + + if (contain_placeholders(quals, NULL)) + return false; + + return true; + } + else if (is_andclause(quals)) + { + ListCell *l; + + foreach(l, ((BoolExpr *) quals)->args) + { + Node *andarg = (Node *) lfirst(l); + + if (!IsA(andarg, OpExpr)) + return false; + if (!quals_is_pullable(andarg)) + return false; + } + + return true; + } + + return false; +} + +/* + * Mutator context used
Re: [POC] Allow flattening of subquery with a link to upper query
On 9/13/22 16:40, Andrey Lepikhov wrote: On 5/9/2022 12:22, Richard Guo wrote: On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov wrote: To resolve both issues, the lower outer join is passed through pull_sublinks_* into the flattening routine (see attachment). I've added these cases to subselect.sql In the attachment is a new version of the patch, rebased onto current master. -- Regards Andrey Lepikhov Postgres Professional From 6462578f8789cb831f45ebfad65308d6afabb833 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 28 Sep 2022 13:42:12 +0300 Subject: [PATCH] Transform correlated subquery of type N-J [1] into an ordinary join query. With many restrictions, check each subquery and pull up expressions referencing the upper query block. Works for the operators '=' and 'IN'. The machinery of converting an ANY subquery to a JOIN stays the same, with minor changes in the walker function. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pass a lower outer join through the pullup_sublink recursive procedure to find out situations when a qual references the outer side of an outer join. [1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469. 
--- src/backend/optimizer/plan/subselect.c| 320 +++- src/backend/optimizer/prep/prepjointree.c | 71 ++-- src/backend/optimizer/util/tlist.c| 2 +- src/backend/optimizer/util/var.c | 8 + src/backend/utils/misc/guc_tables.c | 10 + src/include/optimizer/optimizer.h | 1 + src/include/optimizer/subselect.h | 3 +- src/include/optimizer/tlist.h | 1 + src/test/regress/expected/prepare.out | 18 + src/test/regress/expected/subselect.out | 438 ++ src/test/regress/sql/prepare.sql | 7 + src/test/regress/sql/subselect.sql| 162 12 files changed, 1004 insertions(+), 37 deletions(-) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 92e3338584..29da43bb4d 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -32,6 +32,7 @@ #include "optimizer/planner.h" #include "optimizer/prep.h" #include "optimizer/subselect.h" +#include "optimizer/tlist.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/builtins.h" @@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context } inline_cte_walker_context; +bool optimize_correlated_subqueries = true; + static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, List *plan_params, SubLinkType subLinkType, int subLinkId, @@ -1229,6 +1232,288 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) return expression_tree_walker(node, inline_cte_walker, context); } +static bool +contain_placeholders(Node *node, inline_cte_walker_context *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Query)) + return query_tree_walker((Query *) node, contain_placeholders, context, 0); + else if (IsA(node, PlaceHolderVar)) + { + return true; + } + + return expression_tree_walker(node, contain_placeholders, context); +} + +/* + * To be pullable all clauses of flattening correlated subquery should be + * anded and mergejoinable (XXX: really necessary?) 
+ */ +static bool +quals_is_pullable(Node *quals) +{ + if (!contain_vars_of_level(quals, 1)) + return true; + + if (quals && IsA(quals, OpExpr)) + { + OpExpr *expr = (OpExpr *) quals; + Node *leftarg; + + /* Contains only one expression */ + leftarg = linitial(expr->args); + if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */ + return false; + + if (contain_placeholders(quals, NULL)) + return false; + + return true; + } + else if (is_andclause(quals)) + { + ListCell *l; + + foreach(l, ((BoolExpr *) quals)->args) + { + Node *andarg = (Node *) lfirst(l); + + if (!IsA(andarg, OpExpr)) +return false; + if (!quals_is_pullable(andarg)) +return false; + } + + return true; + } + + return false; +} + +typedef struct +{ + Query *subquery; + int newvarno; + List *pulling_quals; + bool varlevel_up; +} correlated_t; + +static Node * +pull_subquery_clauses_mutator(Node *node, correlated_t *ctx) +{ + if (node == NULL) + return NULL; + + if (IsA(node, OpExpr) && !ctx->varlevel_up) + { + if (!contain_vars_of_level(node, 1)) + return node; + + /* + * The expression contains links to upper relation. It will be pulled to + * uplevel. All links into vars of upper levels must be changed. + */ + + ctx->varlevel_up = true; + ctx->pulling_quals = + lappend(ctx->pulling_quals, + expression_tree_mutator(node, + pull_subquery_clauses_mutator, + (void *) ctx)); + ctx->
Re: [POC] Allow flattening of subquery with a link to upper query
On 5/9/2022 12:22, Richard Guo wrote: On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov wrote: > Hmm, I'm not sure this patch works correctly in all cases. It seems to > me this patch pulls up the subquery without checking the constraints > imposed by lateral references. If its quals contain any lateral > references to rels outside a higher outer join, we would need to > postpone quals from below an outer join to above it, which is probably > incorrect. Yeah, it's not an easy problem to solve. If I understand the code correctly, to fix this problem we must implement the same logic as pull_up_subqueries (lowest_outer_join/safe_upper_varnos). Yeah, I think we'd have to consider the restrictions from lateral references to guarantee correctness when we pull up subqueries. We need to avoid the situation where quals need to be postponed past an outer join. However, even if we have taken care of that, there may be other issues with flattening a directly-correlated ANY SubLink. The constraints imposed by LATERAL references may make it impossible for us to find any legal join orders, as discussed in [1]. To resolve both issues, the lower outer join is passed through pull_sublinks_* into the flattening routine (see attachment). I've added these cases to subselect.sql -- regards, Andrey Lepikhov Postgres Professional From 883f9586acb5482bb88d5ca5123195c0efdd9cc7 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 5 Sep 2022 16:40:48 +0500 Subject: [PATCH] Pass a lower outer join through the pullup_sublink recursive procedure to find out situations when a qual references the outer side of an outer join. 
--- src/backend/optimizer/plan/subselect.c| 19 -- src/backend/optimizer/prep/prepjointree.c | 71 ++- src/include/optimizer/subselect.h | 3 +- src/test/regress/expected/subselect.out | 52 + src/test/regress/sql/subselect.sql| 18 ++ 5 files changed, 131 insertions(+), 32 deletions(-) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 0e7ddc4a4e..29da43bb4d 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1442,7 +1442,8 @@ pull_subquery_clauses_mutator(Node *node, correlated_t *ctx) } static List * -pull_correlated_clauses(PlannerInfo *root, SubLink *sublink) +pull_correlated_clauses(PlannerInfo *root, SubLink *sublink, + JoinExpr *lowest_outer_join) { Query *parse = root->parse; Query *subselect = (Query *) sublink->subselect; @@ -1451,6 +1452,7 @@ pull_correlated_clauses(PlannerInfo *root, SubLink *sublink) .newvarno = list_length(parse->rtable) + 1, /* Looks like a hack */ .pulling_quals = NIL, .varlevel_up = false}; + Relids safe_upper_varnos = NULL; Assert(IsA(subselect, Query)); @@ -1489,7 +1491,16 @@ pull_correlated_clauses(PlannerInfo *root, SubLink *sublink) f = subselect->jointree; if (!f || !f->quals || !quals_is_pullable(f->quals)) - /* Degenerate case */ + return NULL; + + if (lowest_outer_join) + safe_upper_varnos = get_relids_in_jointree( + (lowest_outer_join->jointype == JOIN_RIGHT) ? 
+ lowest_outer_join->larg : lowest_outer_join->rarg, true); + + if (safe_upper_varnos && + !bms_is_subset(pull_varnos_of_level(root, f->quals, 1), + safe_upper_varnos)) return NULL; /* @@ -1540,7 +1551,7 @@ pull_correlated_clauses(PlannerInfo *root, SubLink *sublink) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + Relids available_rels, JoinExpr *lowest_outer_join) { JoinExpr *result; Query *parse = root->parse; @@ -1586,7 +1597,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * pulled_clauses list, their places in the quals replaces by "true" value. */ if (contain_vars_of_level((Node *) subselect, 1) && - (pclauses = pull_correlated_clauses(root, sublink)) == NIL) + (pclauses = pull_correlated_clauses(root, sublink, lowest_outer_join)) == NIL) return NULL; /* Create a dummy ParseState for addRangeTableEntryForSubquery */ diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/
Re: [POC] Allow flattening of subquery with a link to upper query
On 9/5/22 12:22, Richard Guo wrote: On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov Yeah, it's not easy-to-solve problem. If I correctly understand the code, to fix this problem we must implement the same logic, as pull_up_subqueries (lowest_outer_join/safe_upper_varnos). Yeah, I think we'd have to consider the restrictions from lateral references to guarantee correctness when we pull up subqueries. We need to avoid the situation where quals need to be postponed past outer join. However, even if we have taken care of that, there may be other issues with flattening direct-correlated ANY SubLink. The constraints imposed by LATERAL references may make it impossible for us to find any legal join orders, as discussed in [1]. [1] https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4kgheqcfhpzcxs1gv...@mail.gmail.com As I understand it, the problem you mention under this link is about an inefficient query plan. That is a real concern, especially if we think about more complex pull-ups of subqueries, e.g. with aggregate functions in the target list. I see that problem as the next step: we already have a precedent in the machinery of alternative plans. It could be solved that way, or by a GUC, as usual. -- Regards Andrey Lepikhov Postgres Professional
Re: [POC] Allow flattening of subquery with a link to upper query
On 9/1/22 17:24, Richard Guo wrote: On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov mailto:a.lepik...@postgrespro.ru>> wrote: Before flattening procedure we just look through the quals of subquery, pull to the upper level OpExpr's containing variables from the upper relation and replace their positions in the quals with true expression. Further, the flattening machinery works as usual. Hmm, I'm not sure this patch works correctly in all cases. It seems to me this patch pulls up the subquery without checking the constraints imposed by lateral references. If its quals contain any lateral references to rels outside a higher outer join, we would need to postpone quals from below an outer join to above it, which is probably incorrect. Yeah, it's not an easy problem to solve. If I understand the code correctly, to fix this we must implement the same logic as pull_up_subqueries (lowest_outer_join/safe_upper_varnos). It looks ugly. But, more importantly, does this feature deserve such effort and such changes? -- Regards Andrey Lepikhov Postgres Professional
Re: [HACKERS] PoC: custom signal handler for extensions
On 1/12/18 20:51, Teodor Sigaev wrote: In perspective, this mechanism provides the low-level instrument to define remote procedure call on extension side. The simple RPC that defines effective userid on remote backend (remote_effective_user function) is attached for example. 7) Suppose, API allows to use different handlers in different processes for the same reason, it's could be reason of confusion. I suggest to restrict Register/Unregister call only for shared_preload_library, ie only during startup. +1 8) I'd like to see an example of usage this API somewhere in contrib in exsting modules. Ideas? I imagine auto_explain could demonstrate usage of the API by implementing logging of the current query state, triggered by a signal. Most of the necessary code already exists there. Of course, this option would only be available if auto_explain is loaded at startup. But maybe that breaks the main concept of the auto_explain extension? -- Regards Andrey Lepikhov Postgres Professional
[POC] Allow flattening of subquery with a link to upper query
Hi, One of the most annoying things in the planner, for me, is the unnesting of correlated queries [1]. A number of papers on this subject have been published since the 1980s, but only trivial optimizations exist in core. This costs us performance, especially when a subquery uses foreign tables. In this patch I propose a sketch of a solution. Before the flattening procedure we look through the quals of the subquery, pull up to the upper level the OpExprs containing variables from the upper relation, and replace their positions in the quals with a true expression. From there, the flattening machinery works as usual. This patch covers the simplest variant of correlated queries - those without aggregate functions in the target list. It passes the regression tests and adds some tests to demonstrate what it achieves. I'd like to get criticism of the approach. [1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469. -- Regards Andrey Lepikhov Postgres Professional From 3f4247b23175388f8c6ee43740fb641d97e39d0b Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Tue, 24 May 2022 15:59:02 +0500 Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary join query. With many restrictions, check each subquery and pull up expressions, references upper query block. Works for operators '=' and 'IN'. Machinery of converting ANY subquery to JOIN stays the same with minor changes in walker function. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469. 
--- src/backend/optimizer/plan/subselect.c | 307 +++- src/backend/optimizer/util/tlist.c | 2 +- src/backend/optimizer/util/var.c| 8 + src/backend/utils/misc/guc.c| 10 + src/include/optimizer/optimizer.h | 1 + src/include/optimizer/tlist.h | 1 + src/test/regress/expected/prepare.out | 18 ++ src/test/regress/expected/subselect.out | 369 src/test/regress/sql/prepare.sql| 7 + src/test/regress/sql/subselect.sql | 140 + 10 files changed, 855 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 92e3338584..0e7ddc4a4e 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -32,6 +32,7 @@ #include "optimizer/planner.h" #include "optimizer/prep.h" #include "optimizer/subselect.h" +#include "optimizer/tlist.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/builtins.h" @@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context } inline_cte_walker_context; +bool optimize_correlated_subqueries = true; + static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, List *plan_params, SubLinkType subLinkType, int subLinkId, @@ -1229,6 +1232,277 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) return expression_tree_walker(node, inline_cte_walker, context); } +static bool +contain_placeholders(Node *node, inline_cte_walker_context *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Query)) + return query_tree_walker((Query *) node, contain_placeholders, context, 0); + else if (IsA(node, PlaceHolderVar)) + { + return true; + } + + return expression_tree_walker(node, contain_placeholders, context); +} + +/* + * To be pullable all clauses of flattening correlated subquery should be + * anded and mergejoinable (XXX: really necessary?) 
+ */ +static bool +quals_is_pullable(Node *quals) +{ + if (!contain_vars_of_level(quals, 1)) + return true; + + if (quals && IsA(quals, OpExpr)) + { + OpExpr *expr = (OpExpr *) quals; + Node *leftarg; + + /* Contains only one expression */ + leftarg = linitial(expr->args); + if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */ + return false; + + if (contain_placeholders(quals, NULL)) + return false; + + return true; + } + else if (is_andclause(quals)) + { + ListCell *l; + + foreach(l, ((BoolExpr *) quals)->args) + { + Node *andarg = (Node *) lfirst(l); + + if (!IsA(andarg, OpExpr)) +return false; + if (!quals_is_pullable(andarg)) +return false; + } + + return true; + } + + return false; +} + +typedef struct +{ + Query *subquery; + int newvarno; + List *pulling_quals; + bool varlevel_up; +} correlated_t; + +static Node * +pull_subquery_clauses_mutator(Node *node, correlated_t *ctx) +{ + if (node == NULL) + return NULL; + + if (IsA(node, OpExpr) && !ctx->varlevel_up) + { + if (!contain_vars_of_level(node, 1)) + return node; + + /* + * The expression contains link
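For illustration, the equivalence this pull-up relies on can be checked on a toy example. The sketch below uses Python's sqlite3 purely as a convenient SQL engine; the tables and data are hypothetical, not taken from the patch or its tests. The correlated qual (s.x = t.x) referencing the upper query is the kind of OpExpr the patch pulls out of the sublink so the remaining subquery can be flattened into a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, x INTEGER)")
cur.execute("CREATE TABLE s (id INTEGER, x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
cur.executemany("INSERT INTO s VALUES (?, ?)", [(1, 10), (2, 99)])

# Correlated ANY sublink: the inner qual s.x = t.x references the upper query.
correlated = cur.execute(
    "SELECT t.id FROM t"
    " WHERE t.id IN (SELECT s.id FROM s WHERE s.x = t.x) ORDER BY t.id"
).fetchall()

# After the pull-up, the correlated clause becomes an ordinary join qual and
# the sublink flattens into a semijoin (DISTINCT stands in for the semijoin
# semantics of ANY, guarding against duplicate inner matches).
flattened = cur.execute(
    "SELECT DISTINCT t.id FROM t"
    " JOIN s ON s.id = t.id AND s.x = t.x ORDER BY t.id"
).fetchall()

assert correlated == flattened == [(1,)]
```

Both forms return the same rows; the benefit of the second form is that the optimizer can then consider all ordinary join strategies instead of executing the subplan per outer row.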
Re: Removing unneeded self joins
On 8/29/22 04:39, Zhihong Yu wrote: On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <mailto:z...@yugabyte.com>> wrote: Hi, For v36-0001-Remove-self-joins.patch : bq removes inner join of plane table to itself plane table -> plain table For relation_has_unique_index_ext(), it seems when extra_clauses is NULL, there is no need to compute `exprs`. Cheers Done For remove_self_joins_recurse(): + if (bms_num_members(relids) > join_collapse_limit) + break; The above line just comes out of the switch statement. This check should be done again between foreach and switch. Otherwise the above check wouldn't achieve what you want. Cheers Thanks for highlighting the problem. I guess that using either join_collapse_limit or from_collapse_limit isn't practical here. What we are really afraid of here is many pointless search cycles for self-joins, and that may call for a higher limit than the GUCs above. So I introduced a GUC called "self_join_search_limit" (so far undocumented), which is an explicit limit on the size of the set of plain relations in the FROM list in which we search for self-joins. -- Regards Andrey Lepikhov Postgres Professional From 6283d6e21214e34d3c1a6351fa9f6ac1aeb75ce8 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 26 Aug 2022 15:17:53 +0300 Subject: [PATCH] Remove self-joins. A Self Join Removal (SJR) feature removes inner join of plain table to itself in a query plan if can be proved that the join can be replaced with a scan. The proof based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. 3. 
If uniqueness of outer relation deduces from baserestrictinfo clause like 'x=const' on unique field(s), check that inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 38 + src/backend/optimizer/plan/analyzejoins.c | 1046 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc.c | 22 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |7 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 774 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 340 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2278 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a5cd4e44c7..2cfb62f97f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5297,6 +5297,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + +Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. 
+ + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 045ff2e487..c41e7256be 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3495,6 +3495,21 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, List *exprlist, List *oprlist) +{ + return relation_has_unique_index_ext(root, rel, restrictlist, + exprlist, oprlist, NULL); +} + +/* + * relation_has_unique_index_ext + * if extra_clauses isn't NULL, return baserestrictinfo clauses which were + * used to derive uniqueness. + */ +bool +relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel, + List *restrictlist, + List *exprlist, List *oprlist, + List **extra_clauses) { ListCell *ic; @@ -3550,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + List *exprs = NIL;
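The semantic equivalence the SJR proof rests on can be sketched on a toy example (again using Python's sqlite3 as a stand-in SQL engine; the table and data are hypothetical). When the join column is unique, each outer row matches exactly one inner row, namely itself, so the inner side's quals can be folded into a single scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# "id" is unique, mirroring the unique-index proof that
# innerrel_is_unique() provides for the inner relation.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, 'a'), (2, 'b'), (3, 'c')])

# Self-join on the unique key: every outer row joins only to itself.
joined = cur.execute(
    "SELECT a.id, b.payload FROM t a JOIN t b ON a.id = b.id"
    " WHERE b.id > 1 ORDER BY a.id"
).fetchall()

# So the join can be replaced by a single scan with the merged quals,
# which is what the self-join removal does to the plan.
scan = cur.execute(
    "SELECT id, payload FROM t WHERE id > 1 ORDER BY id"
).fetchall()

assert joined == scan == [(2, 'b'), (3, 'c')]
```

The patch's extra conditions (matching 'x=const' restrictions, equal RowMark strength) exist precisely to ensure this identity also holds when uniqueness is derived from restriction clauses rather than the join quals alone.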
Re: Removing unneeded self joins
On 30/6/2022 17:11, Andrey Lepikhov wrote: On 19/5/2022 16:47, Ronan Dunklau wrote: I'll take a look at that one. New version of the patch, rebased on current master: 1. pgindent has been run over the patch. 2. The number of changed files is reduced. 3. Some documentation and comments are added. New version rebased on the new master; minor changes and tests added. -- regards, Andrey Lepikhov Postgres Professional From 8d864515da68728ddee10d455f8bdb64d34277aa Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 26 Aug 2022 15:17:53 +0300 Subject: [PATCH] Remove self-joins. A Self Join Removal (SJR) feature removes inner join of plane table to itself in a query plan if can be proved that the join can be replaced with a scan. The proof based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. 3. If uniqueness of outer relation deduces from baserestrictinfo clause like 'x=const' on unique field(s), check that inner has the same clause with the same constant(s). 4. Compare RowMarks of inner and outer relations. They must have the same strength. Some regression tests changed due to self-join removal logic. 
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 39 + src/backend/optimizer/plan/analyzejoins.c | 1045 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc.c | 10 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |6 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 735 +++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 329 +++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2215 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a5cd4e44c7..2cfb62f97f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5297,6 +5297,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + +Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. + + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 045ff2e487..a9f8f89312 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3495,8 +3495,24 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, List *exprlist, List *oprlist) +{ + return relation_has_unique_index_ext(root, rel, restrictlist, + exprlist, oprlist, NULL); +} + +/* + * relation_has_unique_index_ext + * if extra_clauses isn't NULL, return baserestrictinfo clauses which were + * used to derive uniqueness. 
+ */ +bool +relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel, + List *restrictlist, + List *exprlist, List *oprlist, + List **extra_clauses) { ListCell *ic; + List *exprs; Assert(list_length(exprlist) == list_length(oprlist)); @@ -3551,6 +3567,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + exprs = NIL; + /* * If the index is not unique, or not immediately enforced, or if it's * a partial index that doesn't match the query, it's useless here. @@ -3598,6 +3616,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, if (match_index_to_ope
Re: Fast COPY FROM based on batch insert
On 22/8/2022 11:44, Etsuro Fujita wrote: I think the latter is more consistent with the existing error context information when in CopyMultiInsertBufferFlush(). Actually, I thought this too, and I think this would be useful when the COPY FROM command is executed on a foreign table. My concern, however, is the case when the command is executed on a partitioned table containing foreign partitions; in that case the input data would not always be sorted in the partition order, so the range for an error-occurring foreign partition might contain many lines with rows from other partitions, which I think makes the range information less useful. Maybe I'm too worried about that, though. I see your point. Indeed, perhaps such info doesn't really need to be included in core, at least for now. -- regards, Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 8/9/22 16:44, Etsuro Fujita wrote: -1 foo 1 bar \. ERROR: new row for relation "t1" violates check constraint "t1_f1positive" DETAIL: Failing row contains (-1, foo). CONTEXT: remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2), ($3, $4) COPY ft1, line 3 In single-insert mode the error context information is correct, but in batch-insert mode it isn’t (i.e., the line number isn’t correct). The error occurs on the remote side, so I'm not sure if there is a simple fix. What I came up with is to just suppress error context information other than the relation name, like the attached. What do you think about that? I've spent a lot of effort on this problem too. Your solution has a rationale and looks fine. I only think we should add a bit of info to the error report to make it clear why no specific line is pointed to here. For example: 'COPY %s (buffered)' or 'COPY FOREIGN TABLE %s'; or, if we saved a MultiInsertBuffer pointer instead of the relname_only field, we could add min/max line numbers to the report: 'COPY %s, line between %llu and %llu' -- Regards Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 7/22/22 13:14, Etsuro Fujita wrote: On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov wrote: Analyzing multi-level heterogeneous partitioned configurations I realized, that single write into a partition with a trigger will flush buffers for all other partitions of the parent table even if the parent haven't any triggers. It relates to the code: else if (insertMethod == CIM_MULTI_CONDITIONAL && !CopyMultiInsertInfoIsEmpty()) { /* * Flush pending inserts if this partition can't use * batching, so rows are visible to triggers etc. */ CopyMultiInsertInfoFlush(, resultRelInfo); } Why such cascade flush is really necessary, especially for BEFORE and INSTEAD OF triggers? BEFORE triggers on the chosen partition might query the parent table, not just the partition, so I think we need to do this so that such triggers can see all the rows that have been inserted into the parent table until then. If you'll excuse me, I will add one more argument. It wasn't clear to me, so I ran an experiment: the result of a SELECT in an INSERT trigger function shows only the data that existed in the parent table before the start of the COPY. So we have no tools to access rows newly inserted into a neighboring partition, and we don't need to flush the tuple buffers immediately. Where am I wrong? -- Regards Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 7/22/22 13:14, Etsuro Fujita wrote: On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov Why such cascade flush is really necessary, especially for BEFORE and INSTEAD OF triggers? BEFORE triggers on the chosen partition might query the parent table, not just the partition, so I think we need to do this so that such triggers can see all the rows that have been inserted into the parent table until then. Thanks for explaining your point of view. So, maybe we should switch the status of this patch to 'Ready for Committer'? -- Regards Andrey Lepikhov Postgres Professional
Re: Fast COPY FROM based on batch insert
On 7/20/22 13:10, Etsuro Fujita wrote: On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov wrote: On 18/7/2022 13:22, Etsuro Fujita wrote: I rewrote the decision logic to something much simpler and much less invasive, which reduces the patch size significantly. Attached is an updated patch. What do you think about that? maybe you forgot this code: /* * If a partition's root parent isn't allowed to use it, neither is the * partition. */ if (rootResultRelInfo->ri_usesMultiInsert) leaf_part_rri->ri_usesMultiInsert = ExecMultiInsertAllowed(leaf_part_rri); I think the patch accounts for that. Consider this bit to determine whether to use batching for the partition chosen by ExecFindPartition(): Agreed. While analyzing multi-level heterogeneous partitioned configurations, I realized that a single write into a partition with a trigger will flush the buffers for all other partitions of the parent table, even if the parent doesn't have any triggers. It relates to this code: else if (insertMethod == CIM_MULTI_CONDITIONAL && !CopyMultiInsertInfoIsEmpty()) { /* * Flush pending inserts if this partition can't use * batching, so rows are visible to triggers etc. */ CopyMultiInsertInfoFlush(, resultRelInfo); } Why is such a cascade flush really necessary, especially for BEFORE and INSTEAD OF triggers? An AFTER trigger should see all rows of the table, but if no such trigger exists on the parent, I don't think we are obliged to guarantee the ordering of a COPY into two different tables. -- Regards Andrey Lepikhov Postgres Professional
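The flush behavior being debated above can be modeled in a few lines. The following is a simplified, hypothetical sketch of the multi-insert buffering, not PostgreSQL code: rows for batchable partitions accumulate in per-partition buffers, and routing a row to a partition that cannot batch (e.g. one with a BEFORE ROW trigger) first flushes all pending buffers so the trigger can observe every row inserted into the parent so far:

```python
# Simplified model of CIM_MULTI_CONDITIONAL-style buffering (illustrative only).
visible = []   # rows already "inserted", i.e. visible to triggers
buffers = {}   # partition name -> buffered rows

def flush_all():
    """Cascade flush: empty every partition's buffer into visible state."""
    for part, rows in buffers.items():
        visible.extend((part, r) for r in rows)
        rows.clear()

def copy_row(part, row, batchable):
    if batchable:
        # Partition can use batching: just buffer the row.
        buffers.setdefault(part, []).append(row)
    else:
        # Partition can't batch (e.g. it has a trigger): flush everything
        # first so pending rows become visible, then insert this row directly.
        flush_all()
        visible.append((part, row))

copy_row("p1", 1, batchable=True)
copy_row("p1", 2, batchable=True)
copy_row("p2", 3, batchable=False)   # forces the cascade flush of p1's buffer

assert visible == [("p1", 1), ("p1", 2), ("p2", 3)]
```

The question raised in the thread is whether the `flush_all()` step is truly required for BEFORE/INSTEAD OF triggers, or whether only the target partition's buffer needs flushing.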
Re: [PoC] Reducing planning time when tables have many partitions
On 7/5/22 13:57, Yuya Watari wrote: On Mon, Jul 4, 2022 at 6:28 AM Tom Lane wrote: Perhaps the bms_is_subset class could be handled in a similar way, ie do a one-time pass to make a List of all EquivalenceMembers that use a RelOptInfo. Thank you for giving your idea. I will try to polish up my algorithm based on your suggestion. This work is of significant interest for highly partitioned configurations. Are you still working on this patch? Given the current state of the thread, changing the status to 'Waiting on Author' may be better until the next version. Feel free to reverse the status if you need more feedback. -- Regards Andrey Lepikhov Postgres Professional
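The one-time-pass idea quoted above amounts to replacing a repeated linear scan of all EquivalenceMembers with an index keyed by the relations they reference. The sketch below is purely illustrative Python under assumed names (`members`, `relids`); it is not PostgreSQL's actual data structures:

```python
from collections import defaultdict

# Hypothetical stand-ins for EquivalenceMembers and the relids they reference.
members = [
    {"id": 0, "relids": {1}},
    {"id": 1, "relids": {2}},
    {"id": 2, "relids": {1, 3}},
]

# One-time pass: build relid -> list of member ids referencing that relation.
by_relid = defaultdict(list)
for em in members:
    for relid in em["relids"]:
        by_relid[relid].append(em["id"])

# A later lookup for relation 1 touches only the relevant members instead of
# scanning the whole member list for every relation, which is what hurts in
# highly partitioned configurations.
assert by_relid[1] == [0, 2]
assert by_relid[2] == [1]
```

Building the index costs one pass over all members, after which each per-relation lookup is proportional to the number of matching members rather than the total member count.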
Re: Fast COPY FROM based on batch insert
On 18/7/2022 13:22, Etsuro Fujita wrote: On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov wrote: On 3/22/22 06:54, Etsuro Fujita wrote: * To allow foreign multi insert, the patch made an invasive change to the existing logic to determine whether to use multi insert for the target relation, adding a new member ri_usesMultiInsert to the ResultRelInfo struct, as well as introducing a new function ExecMultiInsertAllowed(). But I’m not sure we really need such a change. Isn’t it reasonable to *adjust* the existing logic to allow foreign multi insert when possible? Of course, such approach would look much better, if we implemented it. I'll ponder how to do it. I rewrote the decision logic to something much simpler and much less invasive, which reduces the patch size significantly. Attached is an updated patch. What do you think about that? While working on the patch, I fixed a few issues as well: + if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL) + resultRelInfo->ri_BatchSize = + resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); When determining the batch size, I think we should check if the ExecForeignBatchInsert callback routine is also defined, like other places such as execPartition.c. For consistency I fixed this by copying-and-pasting the code from that file. +* Also, the COPY command requires a non-zero input list of attributes. +* Therefore, the length of the attribute list is checked here. +*/ + if (!cstate->volatile_defexprs && + list_length(cstate->attnumlist) > 0 && + !contain_volatile_functions(cstate->whereClause)) + target_resultRelInfo->ri_usesMultiInsert = + ExecMultiInsertAllowed(target_resultRelInfo); I think “list_length(cstate->attnumlist) > 0” in the if-test would break COPY FROM; it currently supports multi-inserting into *plain* tables even in the case where they have no columns, but this would disable the multi-insertion support in that case. 
postgres_fdw would not be able to batch into zero-column foreign tables due to the INSERT syntax limitation (i.e., the syntax does not allow inserting multiple empty rows into a zero-column table in a single INSERT statement). Which is the reason why this was added to the if-test? But I think some other FDWs might be able to, so I think we should let the FDW decide whether to allow batching even in that case, when called from GetForeignModifyBatchSize. So I removed the attnumlist test from the patch, and modified postgresGetForeignModifyBatchSize as such. I might miss something, though. Thanks a lot! But maybe you forgot this code: /* * If a partition's root parent isn't allowed to use it, neither is the * partition. */ if (rootResultRelInfo->ri_usesMultiInsert) leaf_part_rri->ri_usesMultiInsert = ExecMultiInsertAllowed(leaf_part_rri); Also, maybe we should describe in the documentation that if the value of batch_size is more than 1, the ExecForeignBatchInsert routine has a chance to be called? -- regards, Andrey Lepikhov Postgres Professional
Re: Postgres picks suboptimal index after building of an extended statistics
On 7/8/22 03:07, Tom Lane wrote: Andrey Lepikhov writes: On 12/8/21 04:26, Tomas Vondra wrote: I wonder if we should teach clauselist_selectivity about UNIQUE indexes, and improve the cardinality estimates directly, not just costing for index scans. I tried to implement this in different ways. But it causes additional overhead and code complexity - analyzing a list of indexes and match clauses of each index with input clauses in each selectivity estimation. I don't like that way and propose a new patch in attachment. I looked at this briefly. I do not think that messing with btcostestimate/genericcostestimate is the right response at all. The problem can be demonstrated with no index whatever, as in the attached shortened version of the original example. I get I partly agree with you. Yes, I see the problem too. But we also have the problem I described above: the optimizer doesn't choose the path with the minimal selectivity from a set of selectivities that all imply a cardinality of less than 1 (see badestimate2.sql). The new patch (see attachment) fixes this problem. -- Regards Andrey Lepikhov Postgres Professional From 463c085852be921199b7a6d0987378d55145e41c Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 11 Jul 2022 12:25:43 +0500 Subject: [PATCH] Use Index path with the best selectivity estimation --- src/backend/optimizer/util/pathnode.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 483c4f4137..a8d7f22343 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -207,6 +207,20 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor) /* ... but path2 fuzzily worse on startup, so path1 wins */ return COSTS_BETTER1; } + if (IsA(path1, IndexPath) && IsA(path2, IndexPath)) + { + /* + * Couldn't differ value of the paths. Last chance - if these paths + * are index paths - use the path with the lower selectivity value. 
+ */ + if (((IndexPath *) path1)->indexselectivity < + ((IndexPath *) path2)->indexselectivity) + return COSTS_BETTER1; + + if (((IndexPath *) path1)->indexselectivity > + ((IndexPath *) path2)->indexselectivity) + return COSTS_BETTER2; + } /* fuzzily the same on both costs */ return COSTS_EQUAL; -- 2.34.1 badestimate2.sql Description: application/sql
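The tie-breaking rule the attached patch adds to compare_path_costs_fuzzily can be summarized as: when two index paths are fuzzily equal on cost, prefer the one whose quals were estimated as more selective. The toy model below is an illustrative Python rendering under assumed names (dicts with `total_cost`/`indexselectivity` keys), not the C implementation:

```python
BETTER1, BETTER2, EQUAL = 1, 2, 0

def compare_paths_fuzzily(p1, p2, fuzz=1.01):
    """Fuzzy cost comparison with a selectivity tie-break for index paths."""
    if p1["total_cost"] > p2["total_cost"] * fuzz:
        return BETTER2            # p1 fuzzily worse on total cost
    if p2["total_cost"] > p1["total_cost"] * fuzz:
        return BETTER1            # p2 fuzzily worse on total cost
    # Costs are fuzzily the same: last chance, compare index selectivities
    # (smaller selectivity means the path's quals filter more rows).
    s1, s2 = p1.get("indexselectivity"), p2.get("indexselectivity")
    if s1 is not None and s2 is not None and s1 != s2:
        return BETTER1 if s1 < s2 else BETTER2
    return EQUAL

# Two index paths with nearly identical costs but very different estimated
# selectivities: the tie-break picks the more selective one.
a = {"total_cost": 100.0, "indexselectivity": 1e-7}
b = {"total_cost": 100.5, "indexselectivity": 1e-4}
assert compare_paths_fuzzily(a, b) == BETTER1
```

This captures why the patch helps in the badestimate case: when both candidate scans are costed as if they return "less than one row", the cost numbers alone cannot distinguish them, but the raw selectivity still can.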
Re: Fast COPY FROM based on batch insert
On 11/7/2022 04:12, Ian Barwick wrote: On 09/07/2022 00:09, Andrey Lepikhov wrote: On 8/7/2022 05:12, Ian Barwick wrote: ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6 CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6) COPY foo, line 88160 Thanks, I got it. A MultiInsertBuffer is created on the first non-zero flush of tuples into a partition and isn't deleted from the buffer list until the end of COPY, so on a subsequent flush with an empty buffer we hit the error. Your fix is correct, but I want to propose a slightly different change (see attachment). LGTM. A new version (with the aforementioned changes) is attached. -- regards, Andrey Lepikhov Postgres Professional From 976560f2ad406adba1aaf58a188b44302855ee12 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 4 Jun 2021 13:21:43 +0500 Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign table. 
--- .../postgres_fdw/expected/postgres_fdw.out| 45 +++- contrib/postgres_fdw/sql/postgres_fdw.sql | 47 src/backend/commands/copyfrom.c | 211 -- src/backend/executor/execMain.c | 45 src/backend/executor/execPartition.c | 8 + src/include/commands/copyfrom_internal.h | 10 - src/include/executor/executor.h | 1 + src/include/nodes/execnodes.h | 5 +- 8 files changed, 238 insertions(+), 134 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44457f930c..aced9a6428 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8435,6 +8435,7 @@ drop table loct2; -- === -- test COPY FROM -- === +alter server loopback options (add batch_size '2'); create table loc2 (f1 int, f2 text); alter table loc2 set (autovacuum_enabled = 'false'); create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2'); @@ -8457,7 +8458,7 @@ copy rem2 from stdin; -- ERROR ERROR: new row for relation "loc2" violates check constraint "loc2_f1positive" DETAIL: Failing row contains (-1, xyzzy). 
CONTEXT: remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2) -COPY rem2, line 1: "-1 xyzzy" +COPY rem2, line 2 select * from rem2; f1 | f2 +- @@ -8468,6 +8469,19 @@ select * from rem2; alter foreign table rem2 drop constraint rem2_f1positive; alter table loc2 drop constraint loc2_f1positive; delete from rem2; +create table foo (a int) partition by list (a); +create table foo1 (like foo); +create foreign table ffoo1 partition of foo for values in (1) + server loopback options (table_name 'foo1'); +create table foo2 (like foo); +create foreign table ffoo2 partition of foo for values in (2) + server loopback options (table_name 'foo2'); +create function print_new_row() returns trigger language plpgsql as $$ + begin raise notice '%', new; return new; end; $$; +create trigger ffoo1_br_trig before insert on ffoo1 + for each row execute function print_new_row(); +copy foo from stdin; +NOTICE: (1) -- Test local triggers create trigger trig_stmt_before before insert on rem2 for each statement execute procedure trigger_func(); @@ -8576,6 +8590,34 @@ drop trigger rem2_trig_row_before on rem2; drop trigger rem2_trig_row_after on rem2; drop trigger loc2_trig_row_before_insert on loc2; delete from rem2; +alter table loc2 drop column f1; +alter table loc2 drop column f2; +copy rem2 from stdin; +ERROR: column "f1" of relation "loc2" does not exist +CONTEXT: remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4) +COPY rem2, line 3 +alter table loc2 add column f1 int; +alter table loc2 add column f2 int; +select * from rem2; + f1 | f2 ++ +(0 rows) + +-- dropped columns locally and on the foreign server +alter table rem2 drop column f1; +alter table rem2 drop column f2; +copy rem2 from stdin; +select * from rem2; +-- +(2 rows) + +alter table loc2 drop column f1; +alter table loc2 drop column f2; +copy rem2 from stdin; +select * from rem2; +-- +(4 rows) + -- test COPY FROM with foreign table created in the same transaction create table loc3 (f1 
int, f2 text); begin; @@ -8592,6 +8634,7 @@ select * from rem3; drop foreign table rem3; drop table loc3; +alter server loopback options (drop batch_size); -- === -- test for TRUNCATE -- === diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/
Re: Fast COPY FROM based on batch insert
On 8/7/2022 05:12, Ian Barwick wrote: ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6 CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6) COPY foo, line 88160 Thanks, I got it. A MultiInsertBuffer is created on the first non-zero flush of tuples into the partition and isn't deleted from the buffers list until the end of COPY. On a subsequent flush with an empty buffer we then hit the error. Your fix is correct, but I want to propose a slightly different change (see attachment). -- regards, Andrey Lepikhov Postgres Professionaldiff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 245a260982..203289f7f2 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -329,7 +329,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo, resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL); /* Flush into foreign table or partition */ - do { + while (sent < nused) + { int size = (resultRelInfo->ri_BatchSize < nused - sent) ? resultRelInfo->ri_BatchSize : (nused - sent); int inserted = size; @@ -340,7 +341,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo, NULL, &inserted); sent += size; - } while (sent < nused); + } } else {
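The essence of the change — replacing the unconditional do/while with a while loop so that an empty buffer never triggers a call into the FDW — can be sketched in Python (a simulation with invented names, not the actual C code):

```python
def flush_batches(tuples, batch_size, send):
    """Flush `tuples` in chunks of at most `batch_size` via `send`.

    The `while sent < nused` guard (instead of do/while) means that an
    empty buffer (nused == 0) results in zero calls -- avoiding the
    "bind message supplies 0 parameters" error seen with postgres_fdw.
    """
    nused = len(tuples)
    sent = 0
    while sent < nused:
        size = min(batch_size, nused - sent)
        send(tuples[sent:sent + size])
        sent += size
    return sent
```

With a do/while shape, `send` would be invoked once even for an empty buffer, which is exactly the reported failure mode.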
Re: Fast COPY FROM based on batch insert
On 7/7/2022 06:14, Ian Barwick wrote: 2022年3月24日(木) 15:44 Andrey V. Lepikhov : > > On 3/22/22 06:54, Etsuro Fujita wrote: > > On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov > > wrote: > >> We still have slow 'COPY FROM' operation for foreign tables in current > >> master. > >> Now we have a foreign batch insert operation And I tried to rewrite the > >> patch [1] with this machinery. > > > > The patch has been rewritten to something essentially different, but > > no one reviewed it. (Tsunakawa-san gave some comments without looking > > at it, though.) So the right status of the patch is “Needs review”, > > rather than “Ready for Committer”? Anyway, here are a few review > > comments from me: > > > > * I don’t think this assumption is correct: > > > > @@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo, > > (resultRelInfo->ri_TrigDesc->trig_insert_after_row || > > resultRelInfo->ri_TrigDesc->trig_insert_new_table)) > > { > > + /* > > + * AFTER ROW triggers aren't allowed with the foreign bulk insert > > + * method. > > + */ > > + Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != > > RELKIND_FOREIGN_TABLE); > > + > > > > In postgres_fdw we disable foreign batch insert when the target table > > has AFTER ROW triggers, but the core allows it even in that case. No? > Agree > > > * To allow foreign multi insert, the patch made an invasive change to > > the existing logic to determine whether to use multi insert for the > > target relation, adding a new member ri_usesMultiInsert to the > > ResultRelInfo struct, as well as introducing a new function > > ExecMultiInsertAllowed(). But I’m not sure we really need such a > > change. Isn’t it reasonable to *adjust* the existing logic to allow > > foreign multi insert when possible? > Of course, such approach would look much better, if we implemented it. > I'll ponder how to do it. > > > I didn’t finish my review, but I’ll mark this as “Waiting on Author”. > I rebased the patch onto current master. 
Now it works correctly. I'll > mark it as "Waiting for review". I took a look at this patch as it would be a useful optimization to have. It applies cleanly to current HEAD, but as-is, with a large data set, it reproducibly fails like this (using postgres_fdw): postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv); ERROR: bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6 CONTEXT: remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6) COPY foo, line 17281589 This occurs because not all multi-insert buffers being flushed actually contain tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case, e.g: /* Flush into foreign table or partition */ do { int size = (resultRelInfo->ri_BatchSize < nused - sent) ? resultRelInfo->ri_BatchSize : (nused - sent); if (size) { int inserted = size; resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate, resultRelInfo, &slots[sent], NULL, &inserted); sent += size; } } while (sent < nused); There might be a case for arguing that the respective FDW should check that it has actually received some tuples to insert, but IMHO it's much preferable to catch this as early as possible and avoid a superfluous call. FWIW, with the above fix in place, with a simple local test the patch produces a consistent speed-up of about 8 times compared to the existing functionality. Thank you for your attention to the patch. I have a couple of questions: 1. I'm having trouble reproducing the case you reported. Can you give more details on the reproduction? 2. Have you tried the previous version, based on the bulk COPY machinery rather than bulk INSERT? Which approach looks better and has better performance in your opinion? -- regards, Andrey Lepikhov Postgres Professional
Re: Removing unneeded self joins
On 19/5/2022 16:47, Ronan Dunklau wrote: I'll take a look at that one. New version of the patch, rebased on current master: 1. pgindent has been run over the patch. 2. The number of changed files is reduced. 3. Some documentation and comments are added. -- regards, Andrey Lepikhov Postgres ProfessionalFrom 9ce71f1d0ffefa9d77edfa30fc189bc0425ebbbe Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Thu, 15 Jul 2021 15:26:13 +0300 Subject: [PATCH] Remove self-joins. The Self Join Removal (SJR) feature removes an inner join of a plain table to itself in a query plan if it can be proved that the join can be replaced with a scan. The proof is based on the innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then an outer row matches only the same row from the inner relation. 3. If the uniqueness of the outer relation is deduced from a baserestrictinfo clause like 'x=const' on unique field(s), check that the inner has the same clause with the same constant(s). 4. Compare the RowMarks of the inner and outer relations. They must have the same strength. Some regression tests changed due to the self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- doc/src/sgml/config.sgml | 16 + src/backend/optimizer/path/indxpath.c | 39 + src/backend/optimizer/plan/analyzejoins.c | 1045 - src/backend/optimizer/plan/planmain.c |5 + src/backend/utils/misc/guc.c | 10 + src/include/optimizer/paths.h |3 + src/include/optimizer/planmain.h |6 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 686 ++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 305 ++ src/tools/pgindent/typedefs.list |2 + 13 files changed, 2142 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 48478b1024..2226117c62 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5287,6 +5287,22 @@ ANY num_sync ( + enable_self_join_removal (boolean) + + enable_self_join_removal configuration parameter + + + + +Enables or disables the query planner's optimization which analyses +query tree and replaces self joins with semantically equivalent single +scans. Take into consideration only plain tables. +The default is on. + + + + enable_seqscan (boolean) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 0ef70ad7f1..2eb05c79ce 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3498,8 +3498,24 @@ bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, List *exprlist, List *oprlist) +{ + return relation_has_unique_index_ext(root, rel, restrictlist, + exprlist, oprlist, NULL); +} + +/* + * relation_has_unique_index_ext + * if extra_clauses isn't NULL, return baserestrictinfo clauses which were + * used to derive uniqueness. 
+ */ +bool +relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel, + List *restrictlist, + List *exprlist, List *oprlist, + List **extra_clauses) { ListCell *ic; + List *exprs; Assert(list_length(exprlist) == list_length(oprlist)); @@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + exprs = NIL; + /* * If the index is not unique, or not immediately enforced, or if it's * a partial index that doesn't match the query, it's useless here. @@ -3601,6 +3619,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, if (match_index_to_operand(rexpr, c, ind)) { matched = true;
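The four conditions from the commit message can be illustrated with a toy Python check (the data structures and helper names here are invented for illustration; the real implementation works on PlannerInfo/RelOptInfo in analyzejoins.c):

```python
def can_remove_self_join(outer, inner, join_quals, is_unique):
    """Toy check mirroring the commit-message steps: same base relation,
    mergejoinable quals a.x = b.x proving uniqueness, matching constant
    restrictions on both sides, and equal row-mark strength."""
    # Step 0: both sides must scan the same physical table.
    if outer["relid"] != inner["relid"]:
        return False
    # Steps 1-2: quals of the form a.x = b.x on the same column must
    # prove that at most one inner row matches each outer row.
    cols = [q["outer_col"] for q in join_quals
            if q["outer_col"] == q["inner_col"]]
    if not is_unique(outer["relid"], cols):
        return False
    # Step 3: 'x = const' restrictions must match on both sides.
    if outer["restrictions"] != inner["restrictions"]:
        return False
    # Step 4: row marks must have the same strength.
    return outer.get("rowmark") == inner.get("rowmark")
```

This is only the detection half; the removal itself rewrites the join into a single scan with the merged restrictions.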
Re: Postgres do not allow to create many tables with more than 63-symbols prefix
On 6/27/22 06:38, Masahiko Sawada wrote: On Fri, Jun 24, 2022 at 2:12 PM Andrey Lepikhov wrote: On 6/23/22 07:03, Masahiko Sawada wrote: > On Sat, Jun 4, 2022 at 4:03 AM Andrey Lepikhov > wrote: >> It is very corner case, of course. But solution is easy and short. So, >> why not to fix? - See the patch in attachment. > > While this seems to be a good improvement, I think it's not a bug. > Probably we cannot backpatch it as it will end up having type names > defined by different naming rules. I'd suggest discussing it on > -hackers. Done. Thanks for updating the patch. Please register this item in the next CF if you haven't yet. Done [1]. > Regarding the patch, I think we can merge makeUniqueTypeName() to > makeArrayTypeName() as there is no caller of makeUniqueTypeName() who > pass tryOriginal = true. I partially agree with you. But I have one reason to leave makeUniqueTypeName() separate: it may be used in other code with auto-generated types. For example, I think the DefineRelation routine should choose a composite type instead of using the same name as the table. Okay. I have one comment on the v2 patch: + for(;;) { - dest[i - 1] = '_'; - strlcpy(dest + i, typeName, NAMEDATALEN - i); - if (namelen + i >= NAMEDATALEN) - truncate_identifier(dest, NAMEDATALEN, false); - if (!SearchSysCacheExists2(TYPENAMENSP, - CStringGetDatum(dest), + CStringGetDatum(type_name), ObjectIdGetDatum(typeNamespace))) - return pstrdup(dest); + return type_name; + + /* Previous attempt was failed. Prepare a new one. */ + pfree(type_name); + snprintf(suffix, sizeof(suffix), "%d", ++pass); + type_name = makeObjectName("", typeName, suffix); } return NULL; I think it's better to break from the loop instead of returning from there. That way, we won't need "return NULL". Agree. Done. [1] https://commitfest.postgresql.org/38/3712/ -- Regards Andrey Lepikhov Postgres ProfessionalFrom 73b80676fff98e9edadab2576cf22778c6b5168b Mon Sep 17 00:00:00 2001 From: "Andrey V.
Lepikhov" Date: Fri, 3 Jun 2022 21:40:01 +0300 Subject: [PATCH] Allow postgresql to generate more relations with the same 63 symbols long prefix. Rewrite the makeUniqueTypeName routine - generator of unique name is based on the same idea as the ChooseRelationName() routine. Authors: Dmitry Koval, Andrey Lepikhov --- src/backend/catalog/pg_type.c | 37 --- src/test/regress/expected/alter_table.out | 6 ++-- 2 files changed, 23 insertions(+), 20 deletions(-) diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index 03788cb975..405553013e 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -26,6 +26,7 @@ #include "catalog/pg_namespace.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" +#include "commands/defrem.h" #include "commands/typecmds.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -936,17 +937,17 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace) * makeUniqueTypeName * Generate a unique name for a prospective new type * - * Given a typeName, return a new palloc'ed name by prepending underscores - * until a non-conflicting name results. + * Given a typeName, return a new palloc'ed name by prepending underscore + * and (if needed) adding a suffix to the end of the type name. * * If tryOriginal, first try with zero underscores. */ static char * makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal) { - int i; - int namelen; - char dest[NAMEDATALEN]; + int pass = 0; + char suffix[NAMEDATALEN]; + char *type_name; Assert(strlen(typeName) <= NAMEDATALEN - 1); @@ -956,23 +957,25 @@ makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal) ObjectIdGetDatum(typeNamespace))) return pstrdup(typeName); + /* Prepare initial object name. Just for compatibility. */ + type_name = makeObjectName("", typeName, NULL); + /* - * The idea is to prepend underscores as needed until we make a name that - * doesn't collide with anything ... 
+ * The idea of unique name generation is similar to ChooseRelationName() + * implementation logic. */ - namelen = strlen(typeName); - for (i = 1; i < NAMEDATALEN - 1; i++) + for(;;) { - dest[i - 1] = '_'; - strlcpy(dest + i, typeName, NAMEDATALEN - i); - if (namelen + i >= NAMEDATALEN) - truncate_identifier(dest, NAMEDATALEN, false); - if (!SearchSysCacheExists2(TYPENAMENSP, - CStringGetDatum(dest), + CStringGetDatum(type_name), ObjectIdGetDatum(typeName
Re: Implement hook for self-join simplification
On 24/6/2022 23:43, Leif Harald Karlsen wrote: Thank you for the quick answer, and for the pointer to the patch! This looks like just the thing I need! On a more general note: What would, in general, be the best way to implement such optimizations? Is there a good way to do this as an extension, or is a patch the preferred way? In my experience, it depends on your needs. For example, the self-join-removal feature, or my current project - flattening of nested subqueries - is much better implemented as a patch, because you can do it as early as possible and can generalize parts of the core code, thus reducing the size of your code a lot. But if you want your code to work with many PG versions, possibly already running in production, or you are just doing research without an immediate practical result, your choice is an extension. -- regards, Andrey Lepikhov Postgres Professional
Re: Implement hook for self-join simplification
On 24/6/2022 18:58, Leif Harald Karlsen wrote: I have made a small deductive database on top of PostgreSQL for educational/research purposes. In this setting, due to certain VIEW-constructions, queries often end up being self-joins on primary keys, e.g.: SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id); where t(id) is a primary key. This query is equivalent to the much more efficient: SELECT id, val FROM t AS t1; However, PostgreSQL currently does not seem to implement this simplification. Therefore, I have looked into writing an extension that performs this, but I am struggling a bit with finding out when this simplification should be done, i.e. which hook I should implement. That is true, but you can use a proposed patch that adds this functionality [1]. I tried to reproduce your case: CREATE TABLE t(id int PRIMARY KEY, val text); explain verbose SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id); With this patch you will get the plan: Seq Scan on public.t t2 Output: t2.id, t2.val Filter: (t2.id IS NOT NULL) The approach implemented in this patch looks better because it removes self-joins at an earlier stage than path generation. Feel free to use it in your research. [1] https://www.postgresql.org/message-id/a1d6290c-44e0-0dfc-3fca-66a68b310... -- regards, Andrey Lepikhov Postgres Professional
Postgres do not allow to create many tables with more than 63-symbols prefix
Moved from the pgsql-bugs mailing list [1]. On 6/23/22 07:03, Masahiko Sawada wrote: > Hi, > > On Sat, Jun 4, 2022 at 4:03 AM Andrey Lepikhov > wrote: >> >> According to subj you can try to create many tables (induced by the case >> of partitioned table) with long prefix - see 6727v.sql for reproduction. >> But now it's impossible because of logic of the makeUniqueTypeName() >> routine. >> You get the error: >> ERROR: could not form array type name for type ... >> >> It is very corner case, of course. But solution is easy and short. So, >> why not to fix? - See the patch in attachment. > > While this seems to be a good improvement, I think it's not a bug. > Probably we cannot backpatch it as it will end up having type names > defined by different naming rules. I'd suggest discussing it on > -hackers. Done. > Regarding the patch, I think we can merge makeUniqueTypeName() to > makeArrayTypeName() as there is no caller of makeUniqueTypeName() who > pass tryOriginal = true. I partially agree with you. But I have one reason to leave makeUniqueTypeName() separated: It may be used in other codes with auto generated types. For example, I think, the DefineRelation routine should choose composite type instead of using the same name as the table. > Also looking at other ChooseXXXName() > functions, we don't care about integer overflow. Is it better to make > it consistent with them? Done. [1] https://www.postgresql.org/message-id/flat/121e286f-3796-c9d7-9eab-6fb8e0b9c701%40postgrespro.ru -- Regards Andrey Lepikhov Postgres ProfessionalFrom ae8552b8f9474a6fae893bce501c534cabcfaa13 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 3 Jun 2022 21:40:01 +0300 Subject: [PATCH] Allow postgresql to generate more relations with the same 63 symbols long prefix. Rewrite the makeUniqueTypeName routine - generator of unique name is based on the same idea as the ChooseRelationName() routine. 
Authors: Dmitry Koval, Andrey Lepikhov --- src/backend/catalog/pg_type.c | 35 --- src/test/regress/expected/alter_table.out | 6 ++-- 2 files changed, 22 insertions(+), 19 deletions(-) diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index 03788cb975..bae6998360 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -26,6 +26,7 @@ #include "catalog/pg_namespace.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" +#include "commands/defrem.h" #include "commands/typecmds.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -936,17 +937,17 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace) * makeUniqueTypeName * Generate a unique name for a prospective new type * - * Given a typeName, return a new palloc'ed name by prepending underscores - * until a non-conflicting name results. + * Given a typeName, return a new palloc'ed name by prepending underscore + * and (if needed) adding a suffix to the end of the type name. * * If tryOriginal, first try with zero underscores. */ static char * makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal) { - int i; - int namelen; - char dest[NAMEDATALEN]; + int pass = 0; + char suffix[NAMEDATALEN]; + char *type_name; Assert(strlen(typeName) <= NAMEDATALEN - 1); @@ -956,22 +957,24 @@ makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal) ObjectIdGetDatum(typeNamespace))) return pstrdup(typeName); + /* Prepare initial object name. Just for compatibility. */ + type_name = makeObjectName("", typeName, NULL); + /* - * The idea is to prepend underscores as needed until we make a name that - * doesn't collide with anything ... + * The idea of unique name generation is similar to ChooseRelationName() + * implementation logic. 
*/ - namelen = strlen(typeName); - for (i = 1; i < NAMEDATALEN - 1; i++) + for(;;) { - dest[i - 1] = '_'; - strlcpy(dest + i, typeName, NAMEDATALEN - i); - if (namelen + i >= NAMEDATALEN) - truncate_identifier(dest, NAMEDATALEN, false); - if (!SearchSysCacheExists2(TYPENAMENSP, - CStringGetDatum(dest), + CStringGetDatum(type_name), ObjectIdGetDatum(typeNamespace))) - return pstrdup(dest); + return type_name; + + /* Previous attempt was failed. Prepare a new one. */ + pfree(type_name); + snprintf(suffix, sizeof(suffix), "%d", ++pass); + type_name = makeObjectName("", typeName, suffix); } return NULL; diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 5ede56d9b5..e634dbd86d 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -197
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
On 15/6/2022 21:45, Imseih (AWS), Sami wrote: Adding a plan_id to pg_stat_activity allows users to determine if a plan for a particular statement has changed and if the new plan is performing better or worse for a particular statement. There are several ways the plan_id in pg_stat_activity In general, your idea is quite useful. But, as discussed earlier [1], extensions could implement many useful things if they could add some custom data into a plan. Maybe implement your feature with a private list of nodes in the plan structure instead of a single-purpose plan_id field? [1] https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru -- regards, Andrey Lepikhov Postgres Professional
Re: Compare variables of composite type with slightly different column types
On 26/5/2022 14:25, Andrey Lepikhov wrote: I guess here the compatible_oper() routine can be used to find an appropriate operator, or something like that can be invented. I looked into 2cd7084 and a4424c5, but didn't find any rationale. In accordance with this idea I prepared some code, for demo purposes only. -- regards, Andrey Lepikhov Postgres ProfessionalFrom 71cf1a9148b20921aa60d7ae2062620c515557f8 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Sat, 28 May 2022 22:01:01 +0300 Subject: [PATCH] Search for a compatible operator if the column types of the compared rows differ. --- src/backend/utils/adt/rowtypes.c | 36 +++--- src/test/regress/expected/rowtypes.out | 11 +--- src/test/regress/sql/rowtypes.sql | 2 +- 3 files changed, 42 insertions(+), 7 deletions(-) diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c index db843a0fbf..432b44ec8c 100644 --- a/src/backend/utils/adt/rowtypes.c +++ b/src/backend/utils/adt/rowtypes.c @@ -18,14 +18,17 @@ #include "access/detoast.h" #include "access/htup_details.h" +#include "catalog/pg_collation.h" #include "catalog/pg_type.h" #include "common/hashfn.h" #include "funcapi.h" #include "libpq/pqformat.h" #include "miscadmin.h" +#include "parser/parse_oper.h" #include "utils/builtins.h" #include "utils/datum.h" #include "utils/lsyscache.h" +#include "utils/syscache.h" #include "utils/typcache.h" @@ -1071,6 +1074,8 @@ record_eq(PG_FUNCTION_ARGS) int i1; int i2; int j; + Oid op_func = InvalidOid; + FmgrInfo op_func_finfo; check_stack_depth();/* recurses for record-type columns */ @@ -1173,12 +1178,27 @@ record_eq(PG_FUNCTION_ARGS) * Have two matching columns, they must be same type */ if (att1->atttypid != att2->atttypid) - ereport(ERROR, + { + Operator op; + + op = compatible_oper(NULL, list_make1(makeString("=")), + att1->atttypid, att2->atttypid, false, -1); + + if (!op) + ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("cannot compare dissimilar column types %s and %s at record column %d",
format_type_be(att1->atttypid), format_type_be(att2->atttypid), j + 1))); + else + { + op_func = oprfuncid(op); + fmgr_info(op_func, &op_func_finfo); + ReleaseSysCache(op); + } + + } /* * If they're not same collation, we don't complain here, but the @@ -1217,8 +1237,18 @@ record_eq(PG_FUNCTION_ARGS) } /* Compare the pair of elements */ - InitFunctionCallInfoData(*locfcinfo, &typentry->eq_opr_finfo, 2, - collation, NULL, NULL); + if (!OidIsValid(op_func)) + /* Compare the pair of elements */ + InitFunctionCallInfoData(*locfcinfo, &typentry->eq_opr_finfo, 2, + collation, NULL, NULL); + else + { + /* XXX: Some collation matching logic needed. */ + collation = (collation != InvalidOid) ? collation : DEFAULT_COLLATION_OID; + InitFunctionCallInfoData(*locfcinfo, &op_func_finfo, 2, + collation, NULL, NULL); + } + locfcinfo->args[0].value = values1[i1]; locfcinfo->args[0].isnull = false; locfcinfo->args[1].value = values2[i2]; diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out index a4cc2d8c12..8a3c4e9454 100644 --- a/src/test/regress/expected/rowtypes.out +++ b/src/test/regress/expected/rowtypes.out @@ -423,8 +423,12 @@ select a,b from test_table where (a,b) > ('a','a') order by a,b; reset enable_sort; -- Check row compar
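The behavior the demo patch aims for in record_eq() — falling back to a compatible cross-type '=' operator instead of raising an error — can be modeled with a toy Python function (the types and lookup helpers here are invented for illustration, not the actual catalog machinery):

```python
def record_eq(rec1, rec2, eq_ops, find_compatible_eq):
    """Toy model of the patched record_eq(): columns of identical types
    use the cached equality function; for differing types we look up a
    compatible cross-type '=' operator instead of erroring out.

    `eq_ops` maps a type name to its equality function;
    `find_compatible_eq` maps a (type1, type2) pair to one, or None.
    Records are lists of (type_name, value) pairs."""
    if len(rec1) != len(rec2):
        raise ValueError("cannot compare records of different lengths")
    for (t1, v1), (t2, v2) in zip(rec1, rec2):
        if t1 == t2:
            eq = eq_ops[t1]
        else:
            eq = find_compatible_eq(t1, t2)
            if eq is None:
                raise TypeError(
                    f"cannot compare dissimilar column types {t1} and {t2}")
        if not eq(v1, v2):
            return False
    return True
```

This mirrors the query from the original message: ROW(1::integer, 'robert'::text) = ROW(1::bigint, 'robert'::name) succeeds once an integer/bigint comparison operator is found.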
Compare variables of composite type with slightly different column types
Hi, While researching a join selectivity improvement, I got stuck on this code in rowtypes.c: /* * Have two matching columns, they must be same type */ if (att1->atttypid != att2->atttypid) ereport(ERROR, ... Why, for example, isn't the following trivial query allowed: SELECT * FROM (SELECT ROW(1::integer, 'robert'::text)) AS s1, (SELECT ROW(1::bigint, 'robert'::name)) AS s2 WHERE s1 = s2; I guess here the compatible_oper() routine can be used to find an appropriate operator, or something like that can be invented. I looked into 2cd7084 and a4424c5, but didn't find any rationale. -- Regards Andrey Lepikhov Postgres Professional
Re: Removing unneeded self joins
On 5/17/22 19:14, Ronan Dunklau wrote: On Friday, May 13, 2022 07:07:47 CEST, Andrey Lepikhov wrote: New version of the feature. Here a minor bug with RowMarks is fixed. A degenerate case is fixed, where the uniqueness of the inner relation is deduced not from join quals, but from baserestrictinfo clauses 'x=const', where x is a unique field. The code dedicated to solving the second issue is controversial, so I attached delta.txt for quick observation. Maybe we should return to the previous version of the code, where we didn't split the restriction list into join quals and base quals? Hello, I tried to find problematic cases which would make the planning time grow unacceptably, and couldn't devise one. The worst case scenario I could think of was as follows: - a query with many different self joins - an abundance of unique indexes on combinations of this table's columns to consider - additional predicates in the where clause on columns. Looking into the patch, I can imagine that the most difficult case is when the set of relations with the same OID is huge, but only a small part of them (or none) can be removed. Also, removing a clause from a restrictinfo list or from an equivalence class adds non-linear complexity. So, you can dig in this direction ). The base table I used for this was a table with 40 integers. 39 unique indexes were defined on every combination of (c1, cX) with cX being columns c2 to c40. I turned geqo off, set from_collapse_limit and join_collapse_limit to unreasonably high values (30), and tried to run queries of the form: SELECT * FROM test_table t1 JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX ... JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX. So no self join can be eliminated in that case. I think you should compare t1.cX with tX.cX to eliminate the self-join. Cross-unique-index proofs aren't supported now. The performance was very similar with or without the GUC enabled.
I tested the same thing without the patch, since the test for uniqueness has been slightly altered and incurs a new allocation, but it doesn't seem to change. One interesting side effect of this patch is that removing any unneeded self join cuts down the planning time very significantly, as we lower the number of combinations to consider. Even more: by removing a join, we improve the cardinality estimation. As for the code: - Comments on relation_has_unique_index_ext and relation_has_unique_index_for should be rewritten, as relation_has_unique_index_for is now just a special case of relation_has_unique_index_ext. By the way, the comment would probably read better as: "but if extra_clauses isn't NULL". - The whole thing about "extra_clauses", i.e., baserestrictinfos which were used to determine uniqueness, is not very clear. Most functions where the new argument has been added have not seen an update in their comments, and the name itself doesn't really convey the intended meaning: perhaps required_non_join_clauses? The way this works should be explained a bit more thoroughly, for example in remove_self_joins_one_group the purpose of uclauses should be explained. The fact that degenerate_case returns true when we don't have any additional base restrict info is also confusing, as is the degenerate_case name. Agreed, but after this case, thoughts wander in my head: should we take a step back to the pre-[1] approach? It looks like we would have quite similar changes, but without a special function for 'degenerate case' detection and restrictlist splitting. I'll update if I think of more interesting things to add. Thank you for your efforts! See attached the next version, which fixes mistakes detected by z...@yugabyte.com. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com -- Regards Andrey Lepikhov Postgres ProfessionalFrom 1774b4bcfe337b151b76c7f357dc6748755bf1d9 Mon Sep 17 00:00:00 2001 From: "Andrey V.
Lepikhov" Date: Thu, 15 Jul 2021 15:26:13 +0300 Subject: [PATCH] Remove self-joins. The Self Join Removal (SJR) feature removes an inner join of a plain table to itself from a query plan if it can be proved that the join can be replaced with a scan. The proof is based on the innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then an outer row matches only the same row from the inner relation. 3. If the uniqueness of the outer relation is deduced from a baserestrictinfo clause like 'x=const' on unique field(s), check that the inner relation has the same clause with the same constant(s). 4. Compare RowMarks of the inner and outer relations. They must
Re: Removing unneeded self joins
On 1/3/2022 03:03, Greg Stark wrote: On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau wrote: Well in some cases they can't, when the query is not emitting redundant predicates by itself but they are added by something else like a view or a RLS policy. Maybe it would be worth it to allow spending a bit more time planning for those cases? Yeah, I'm generally in favour of doing more work in the optimizer to save query authors work writing queries. My question is whether it handles cases like: select b.x,c.y from t join t2 as b on (b.id = t.id) join t2 as c on (c.id = t.id) That is, if you join against the same table twice on the same qual. Does the EC mechanism turn this into a qual on b.id = c.id and then turn this into a self-join that can be removed? Yes, the self-join removal machinery uses the EC mechanism as usual to get all join clauses, so this case works (see the demo in the attachment). Also, in the new version of the patch I fixed one stupid bug in the check of a self-join candidate expression's operator: we can remove only expressions like F(arg1) = G(arg2). -- regards, Andrey Lepikhov Postgres Professional
From 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Thu, 15 Jul 2021 15:26:13 +0300 Subject: [PATCH] Remove self-joins. Remove inner joins of a relation to itself if it can be proved that the join can be replaced with a scan. We can prove the uniqueness using the existing innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars, an inner row must be (physically) the same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals which look like a.x = b.x 2. Check innerrel_is_unique() for the qual list from (1). If it returns true, then an outer row matches only the same row from the inner relation. 
So we have proved that the join is a self-join and can be replaced by a scan. Some regression tests changed due to the self-join removal logic. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- src/backend/optimizer/plan/analyzejoins.c | 888 +- src/backend/optimizer/plan/planmain.c | 5 + src/backend/optimizer/util/joininfo.c | 3 + src/backend/optimizer/util/relnode.c | 26 +- src/backend/utils/misc/guc.c | 10 + src/include/optimizer/pathnode.h | 4 + src/include/optimizer/planmain.h | 2 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 426 +++ src/test/regress/expected/sysviews.out| 3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 197 + 12 files changed, 1583 insertions(+), 29 deletions(-) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 337f470d58..c5ac8e2bd4 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -22,6 +22,7 @@ */ #include "postgres.h" +#include "catalog/pg_class.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" @@ -32,10 +33,12 @@ #include "optimizer/tlist.h" #include "utils/lsyscache.h" +bool enable_self_join_removal; + /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); static void remove_rel_from_query(PlannerInfo *root, int relid, - Relids joinrelids); + Relids joinrelids, int subst_relid); static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved); static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel); static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, @@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root, RelOptInfo *innerrel, JoinType jointype, List *restrictlist); +static void change_rinfo(RestrictInfo* rinfo, Index from, Index to); +static 
Bitmapset* change_relid(Relids relids, Index oldId, Index newId); +static void change_varno(Expr *expr, Index oldRelid, Index newRelid); /* @@ -86,7 +92,7 @@ restart: remove_rel_from_query(root, innerrelid, bms_union(sjinfo->min_lefthand, - sjinfo->min_righthand)); +
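The four-step identification procedure from the commit message above can be modelled very compactly. Below is a toy Python sketch of the decision (the data structures and names are invented for illustration; this is not the patch's C code, which works on RelOptInfo and RestrictInfo nodes):

```python
class Rel:
    """A toy stand-in for RelOptInfo: a unique key plus 'x = const' filters."""
    def __init__(self, unique_cols, filters=()):
        self.unique_cols = set(unique_cols)   # columns covered by a unique index
        self.filters = set(filters)           # baserestrictinfo, e.g. {"x=1"}

def can_remove_self_join(outer, inner, join_quals):
    # 1. Collect mergejoinable quals of the form outer.x = inner.x; since both
    #    sides are the same table, only same-column pairs qualify here.
    cols = {l for l, r in join_quals if l == r}
    # 2. innerrel_is_unique(): the collected quals must cover a unique key, so
    #    each outer row matches at most the physically identical inner row.
    if not inner.unique_cols <= cols:
        return False
    # 3. If uniqueness leans on 'x = const' baserestrictinfo clauses, the inner
    #    side must carry exactly the same clauses with the same constants.
    if outer.filters != inner.filters:
        return False
    # 4. RowMarks of inner and outer must also match (omitted in this toy model).
    return True
```

With `t = Rel(unique_cols=["id"])`, a join qual on `id` passes the test, while one on a non-unique column such as `payload` does not.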
Re: Merging statistics from children instead of re-sampling everything
On 21/1/2022 01:25, Tomas Vondra wrote: But I don't have a very good idea what to do about statistics that we can't really merge. For some types of statistics it's rather tricky to reasonably merge the results - ndistinct is a simple example, although we could work around that by building and merging hyperloglog counters. I think, as a first step in this direction, we can reduce the number of pulled tuples. We don't really need to pull all tuples from a remote server: to construct a reservoir, we can pull only a tuple sample. The reservoir method needs only a few arguments to return a sample, just as when you read tuples locally. Also, to fetch such partial samples asynchronously, we can get the size of each partition in a preliminary step of the analysis. In my opinion, even this solution can reduce the severity of the problem drastically. -- regards, Andrey Lepikhov Postgres Professional
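The reservoir idea sketched above is essentially Vitter's Algorithm R: each partition can stream its tuples through a fixed-size reservoir locally and ship only the sample (plus its row count, fetched in the preliminary step, for weighting) to the parent. A minimal illustration, not the actual acquire_sample_rows() code:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: a uniform sample of k rows from a stream of unknown length."""
    sample = []
    for i, row in enumerate(stream):
        if i < k:
            sample.append(row)           # fill the reservoir first
        else:
            j = rng.randint(0, i)        # slot replaced with probability k/(i+1)
            if j < k:
                sample[j] = row
    return sample
```

A parent could then merge per-partition samples by drawing from each in proportion to the partition sizes, instead of pulling every remote tuple.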
Re: Multiple Query IDs for a rewritten parse tree
On 29/1/2022 12:51, Julien Rouhaud wrote: 4. We should reserve position of default in-core generator From the discussion above I was under the impression that the core generator should be distinguished by a predefined kind. I don't really like this approach. IIUC, this patch as-is is meant to break pg_stat_statements extensibility. If kind == 0 is reserved for in-core queryid then you can't use pg_stat_statement with a different queryid generator anymore. Yes, it is one more problem. Maybe, if we want to make it extensible, we could think about hooks in pg_stat_statements too? The patch also reserves kind == -1 for pg_stat_statements internal purpose, and I'm not really sure why that's needed. My idea is that tags with positive numbers are reserved for generation results, which are performance-critical. As I saw during the implementation, pg_stat_statements makes additional changes to the queryId (no matter which ones). Because our purpose is to avoid interference in this place, I invented negative values, where extensions can store their queryIds, based on any generator or not. Maybe it is redundant; the main idea here was to highlight the issue. I'm also unsure of how extensions are supposed to cooperate in general, as I feel that queryids should be implemented for some "intent" (like monitoring, planning optimization...). That being said I'm not sure if e.g. AQO heuristics are too specific for its need or if it could be shared with other extensions that might be dealing with similar concerns. I think it depends on the specific purpose of an extension. -- regards, Andrey Lepikhov Postgres Professional
Re: POC: GROUP BY optimization
On 22/1/2022 01:34, Tomas Vondra wrote: The other thing we could do is reduce the coefficient gradually - so it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so on. But it seems somewhat arbitrary (I certainly don't have some sound theoretical justification ...). I think there is no reason to increase complexity without any theory behind it. A simple solution lets us notice problems much faster, if they arise. ... I've skipped the path_save removal in planner.c, because that seems incorrect - if there are multiple pathkeys, we must start with the original path, not the modified one we built in the last iteration. Or am I missing something? You are right, I misunderstood the idea of the path_save variable. -- regards, Andrey Lepikhov Postgres Professional
Re: POC: GROUP BY optimization
On 7/22/21 3:58 AM, Tomas Vondra wrote: 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to estimate_num_groups_incremental. In principle yes, if we use "group size" from the previous step, then the returned value is the number of new groups after adding the "new" pathkey. But even if we ignore the issues with amplification mentioned in (3), there's an issue with non-linear behavior in estimate_num_groups, because at some point it's calculating D(N,n,p) = n * (1 - ((N-p)/N)^(N/n)) where N - total rows, p - sample size, n - number of distinct values. And if we have (N1,n1) and (N2,n2) then the ratio of calculated estimates (which is pretty much what calculating group size does) is D(N2,n2,p2) / D(N1,n1,p1), which will differ depending on p1 and p2. And if we're tweaking the tuplesPerPrevGroup all the time, that's really annoying, as it may make the groups smaller or larger, which is unpredictable and annoying, and I wonder if it might go against the idea of penalizing tuplesPerPrevGroup to some extent. We could simply use the input "tuples" value here, and then divide the current and previous estimate to calculate the number of new groups. tuplesPerPrevGroup is only an upper bound for the estimation. I think we should use the result of the previous estimation as a limit for the next one during incremental estimation. Maybe we should simply limit the tuplesPerPrevGroup value to ensure monotonic non-growth of this value? See the patch in the attachment, on top of the previous fixes. -- regards, Andrey Lepikhov Postgres Professional
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 70af9c91d5..4e26cd275d 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2025,8 +2025,10 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys, /* * Real-world distribution isn't uniform but now we don't have a way to * determine that, so, add multiplier to get closer to worst case. 
+* Guard this value against irrational decisions. */ - tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups); + tuplesPerPrevGroup = Min(tuplesPerPrevGroup, +ceil(1.5 * tuplesPerPrevGroup / nGroups)); /* * We could skip all following columns for cost estimation, because we
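The non-linearity Tomas points out, and the monotonic-non-growth guard in the hunk above, are easy to play with numerically. A sketch (my own transcription of the formula and of the patch's Min() guard, not code from costsize.c):

```python
import math

def distinct_in_sample(N, n, p):
    """D(N, n, p) = n * (1 - ((N - p) / N) ** (N / n)): the expected number of
    the n distinct values that show up in a sample of p rows out of N total."""
    return n * (1.0 - ((N - p) / N) ** (N / n))

def clamp_group_size(tuples_per_prev_group, n_groups):
    """The patched estimate: apply the 1.5x worst-case multiplier, but never
    let the per-group tuple count grow from one pathkey to the next."""
    return min(tuples_per_prev_group,
               math.ceil(1.5 * tuples_per_prev_group / n_groups))
```

Because D is non-linear in p, the ratio D(N2,n2,p2)/D(N1,n1,p1) shifts as the sample sizes change, which is why feeding a constantly tweaked tuplesPerPrevGroup back into the estimator makes the group sizes jumpy.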
Re: Multiple Query IDs for a rewritten parse tree
On 10/1/2022 15:39, Julien Rouhaud wrote: On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote: On 10/1/2022 13:56, Julien Rouhaud wrote: Yes. The same input query string doesn't prove that a frozen query plan can be used, because rewrite rules could have changed. So we use only a query tree. Yes, I'm fully aware of that. I wasn't asking about using the input query string but the need for generating a dedicated normalized output query string that would be potentially different from the one generated by pg_stat_statements (or similar). Thanks, now I got it. I don't remember a single situation where we would need to normalize a query string. But then, if generating 2 queryids is better for performance, does the extension really need an additional jumble state and/or normalized query string? An additional JumbleState isn't necessary, but it would be better for performance to collect pointers to all constant nodes during the process of hash generation. I agree, but it's even better for performance if this is collected only once. I still don't know if this extension (or any extension) would require something different from a common jumble state that would serve for all purposes or if we can work with a single jumble state and only handle multiple queryids. I think JumbleState is highly monitoring-specific, maybe even pg_stat_statements-specific (perhaps you can point out other examples). I have no idea how it could be used in an arbitrary extension. But it is not so difficult to imagine an implementation (as part of Tom's approach described earlier) of such storage for each generation method. -- regards, Andrey Lepikhov Postgres Professional
Re: Multiple Query IDs for a rewritten parse tree
On 10/1/2022 13:56, Julien Rouhaud wrote: On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote: I think pg_stat_statements can live with a queryId generator like the sr_plan extension's. But it replaces all constants with a $XXX parameter in the query string. In our extension, the user defines which plan is optimal and which constants can be used as parameters in the plan. I don't know the details of that extension, but I think that it should work as long as you have the constants information in the jumble state, whatever the resulting normalized query string is, right? Yes. The same input query string doesn't prove that a frozen query plan can be used, because rewrite rules could have changed. So we use only a query tree. Here we must have a custom jumbling implementation. The queryId in this extension determines two things: 1. How many input queries will be compared with the query tree template of the frozen statement. 2. As a result, the performance overhead of unsuccessful comparisons. One drawback I see here: creating or dropping my extension changes the behavior of pg_stat_statements, which leads to distortion of the DB load profile. Also, we have no guarantee that another extension will work correctly (or in an optimal way) with such a queryId. But then, if generating 2 queryids is better for performance, does the extension really need an additional jumble state and/or normalized query string? An additional JumbleState isn't necessary, but it would be better for performance to collect pointers to all constant nodes during the process of hash generation. -- regards, Andrey Lepikhov Postgres Professional
Multiple Query IDs for a rewritten parse tree
On 5/1/2022 10:13, Tom Lane wrote: > I feel like we need to get away from the idea that there is just > one query hash, and somehow let different extensions attach > differently-calculated hashes to a query. I don't have any immediate > ideas about how to do that in a reasonably inexpensive way. Now the queryId field represents a query class (depending on the jumbling implementation). It is used by extensions as a simple way to track a query from the point of parse tree creation to the end of its life, across all the hook calls the extension uses (remember about possible plan caching). I know at least two different cases of using queryId: 1) monitoring purposes - pg_stat_statements watches how often queries of a class emerge in the database and collects statistics on each class; 2) adaptive purposes - some extensions influence a planner decision during the optimization stage and want to learn from the performance shift at the end of the execution stage. Different purposes may require different jumbling implementations, but users may want to use such extensions at the same time. So we should allow attaching many different query IDs to a query (maybe it is better to call it a 'query label'?). Thinking about it for a while, I came up with three different ways to implement it: 1. queryId becomes a trivial 64-bit counter. Each extension can then distinguish each query from any other, track it across all hooks, use its own jumbling code, and store the queryId internally. The one big problem I see here is increasing overhead in the case of many consumers of the queryId feature. 2. Instead of a simple queryId we can store a list of pairs (queryId, funcOid). An extension registers a callback for queryId generation, and the core forms the list of queryIds right after query tree rewriting. The funcOid is needed to distinguish jumbling implementations. Here we should invent an additional node type for an element of the list. 3. Instead of queryId we could add a multi-purpose 'private' list to the Query struct. 
Any extension can add additional object(s) to this list (with a registered object type, of course). As an example, I can imagine a kind of convention for queryIds in such a case - store a String node with value: ' - '. This way we should implement a registered callback mechanism too. I think the third way is the cheapest, most flexible, and simplest to implement. Any thoughts, comments, criticism? -- regards, Andrey Lepikhov Postgres Professional
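Option 2 above (a per-query list of (queryId, funcOid) pairs filled by registered callbacks) can be sketched as follows. This is a toy Python model with invented names; in the backend it would be a node list in the Query struct plus a callback registry in core:

```python
generators = {}   # funcOid -> callback(query) -> 64-bit id

def register_queryid_generator(func_oid, callback):
    """An extension registers its jumbling implementation under its funcOid."""
    generators[func_oid] = callback

def assign_query_ids(query):
    """Run once, right after query-tree rewriting: build the (queryId, funcOid)
    pair list so every consumer can find its own id later, in any hook."""
    query["query_ids"] = [(cb(query), oid) for oid, cb in sorted(generators.items())]

def lookup_query_id(query, func_oid):
    """An extension retrieves only the id produced by its own generator."""
    for qid, oid in query["query_ids"]:
        if oid == func_oid:
            return qid
    return None
```

A monitoring extension and an adaptive-optimization extension could then coexist, each paying only for its own generator and reading only its own id.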
Re: Add index scan progress to pg_stat_progress_vacuum
On 21/12/2021 00:05, Peter Geoghegan wrote: * Some index AMs don't work like nbtree and GiST in that they cannot do their scan sequentially -- they have to do something like a logical/keyspace order scan instead, which is *totally* different to heapam (not just a bit different). There is no telling how many times each page will be accessed in these other index AMs, and in what order, even under optimal conditions. We should arguably not even try to provide any granular progress information here, since it'll probably be too messy. Maybe we could add callbacks to the AM interface for sending, receiving, and representing progress? Then each AM would define its own set of parameters to send to the stats collector and show to users. -- regards, Andrey Lepikhov Postgres Professional
Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno
On 22/12/2021 20:42, Tom Lane wrote: Andrey Lepikhov writes: On 5/2/2020 01:24, Tom Lane wrote: I've not written any actual code, but am close to being ready to. This thread gives us hope of getting started on solving some of the basic planner problems. But there has been no activity for a long time, as far as I can see. Have you tried to implement this idea? Is it still relevant? It's been on the back burner for awhile :-(. I've not forgotten about it though. I would like to try to develop this feature. The idea is clear to me, but the definition of the NullableVars structure is not obvious. Could you prepare a sketch of this struct, or do you already have some draft code? -- regards, Andrey Lepikhov Postgres Professional
Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno
On 5/2/2020 01:24, Tom Lane wrote: I've not written any actual code, but am close to being ready to. This thread gives us hope of getting started on solving some of the basic planner problems. But there has been no activity for a long time, as far as I can see. Have you tried to implement this idea? Is it still relevant? -- regards, Andrey Lepikhov Postgres Professional
Re: Make query ID more portable
On 14/10/21 10:40, Julien Rouhaud wrote: On Thu, Oct 14, 2021 at 12:37 PM Andrey Lepikhov wrote: On 12/10/21 18:45, Bruce Momjian wrote: On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote: Andrey Lepikhov writes: I think that there are just too many arbitrary decisions that could be made on what exactly should be a query identifier to have a single in-core implementation. Yes, and I use such a custom solution too. But the core jumbling code implements a good idea and can be generalized for reuse. The patch from the previous letter and the breakdown of JumbleState can allow coders to implement their code on top of the queryjumble.c module with smaller changes. If you do sharding, you already have to properly configure each node, so configuring your custom query id extension shouldn't be a big problem. My project is about adaptive query optimization techniques. It is not obvious how to match the post-parse and execution phases (without a field in the Query struct) because of nested queries. Also, if we use the queryId in an extension, we interfere with pg_stat_statements. -- regards, Andrey Lepikhov Postgres Professional
Re: Make query ID more portable
On 12/10/21 18:40, Tom Lane wrote: Andrey Lepikhov writes: But core jumbling code is good, fast and much easier to support. A bigger issue is that query ID stability isn't something we are going to promise on a large scale --- for example, what if a new release adds some new fields to struct Query? So I'm not sure that "query IDs should survive dump/reload" is a useful goal to consider. It's certainly not something that could be reached by anything even remotely like the existing code. Thank you for the explanation. I think the problem with queryId is that it encapsulates two different meanings: 1. It allows an extension to match a query between the post-parse and execution stages. In this sense, the queryId should be as unique as possible for each query. 2. For pg_stat_statements purposes (and my project too) it represents a query class and should be stable against permutations of range table entries, clauses, etc. For example, "SELECT * FROM a,b;" and "SELECT * FROM b,a;" should have the same queryId. This issue may be solved in an extension with the following approach: 1. Force a queryId value as unique as the extension wants in a post parse hook. 2. Generalize the JumbleQuery routine code to generate a kind of query class signature. The attached patch is a first sketch of such a change. -- regards, Andrey Lepikhov Postgres Professional
From ef4033935b25c90fd8c5b6f10a0646a7ede385e3 Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Wed, 13 Oct 2021 10:22:53 +0500 Subject: [PATCH] Add callback for jumbling of an OID value. The query jumbling machinery depends on the OIDs of objects. That is fine for a specific instance, because it allows surviving an object rename. But it is bad for the cross-instance case. Even if you have the same code version and the same schema, you can't guarantee stability of OIDs. 
So, an extension can't rely on this technique This patch changes an interface of the JumbleQuery routine to allow an extension to use it for generation of an query signature and use specific algorithm for oids jumbling. --- src/backend/commands/explain.c | 6 +-- src/backend/parser/analyze.c | 12 ++--- src/backend/tcop/postgres.c | 6 +-- src/backend/utils/misc/queryjumble.c | 81 ++-- src/include/utils/queryjumble.h | 14 - 5 files changed, 67 insertions(+), 52 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 10644dfac4..b2f10e828e 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -166,7 +166,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, { ExplainState *es = NewExplainState(); TupOutputState *tstate; - JumbleState *jstate = NULL; + JumbleState jstate; Query *query; List *rewritten; ListCell *lc; @@ -246,10 +246,10 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, query = castNode(Query, stmt->query); if (IsQueryIdEnabled()) - jstate = JumbleQuery(query, pstate->p_sourcetext); + query->queryId = JumbleQuery(query, pstate->p_sourcetext, NULL, ); if (post_parse_analyze_hook) - (*post_parse_analyze_hook) (pstate, query, jstate); + (*post_parse_analyze_hook) (pstate, query, ); /* * Parse analysis was done already, but we still have to run the rule diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 146ee8dd1e..f9c3991a94 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -113,7 +113,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText, { ParseState *pstate = make_parsestate(NULL); Query *query; - JumbleState *jstate = NULL; + JumbleState jstate; Assert(sourceText != NULL); /* required as of 8.4 */ @@ -127,10 +127,10 @@ parse_analyze(RawStmt *parseTree, const char *sourceText, query = transformTopLevelStmt(pstate, parseTree); if (IsQueryIdEnabled()) - jstate = JumbleQuery(query, sourceText); + query->queryId = 
JumbleQuery(query, sourceText, NULL, ); if (post_parse_analyze_hook) - (*post_parse_analyze_hook) (pstate, query, jstate); + (*post_parse_analyze_hook) (pstate, query, ); free_parsestate(pstate); @@ -152,7 +152,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText, { ParseState *pstate = make_parsestate(NULL); Query *query; - JumbleState *jstate = NULL; + JumbleState jstate; Assert(sourceText != NULL); /* required as of 8.4 */ @@ -166,10 +166,10 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText, check_variable_parameters(pstate, query); if (IsQueryIdEnabled()) - jstate = JumbleQuery(query, sourceText); +
Re: Make query ID more portable
On 12/10/21 18:45, Bruce Momjian wrote: On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote: Andrey Lepikhov writes: But core jumbling code is good, fast and much easier to support. Also, the current code handles renames of schemas and objects, but this would not. Yes, it is a good option if an extension works only in the context of one node. But my efforts are directed at cross-instance usage of monitoring data. As an example, it may be useful for sharding. Also, I guess it is natural for a user to expect that if he renames an object, he also has to change queries and reset the monitoring data related to this object. -- regards, Andrey Lepikhov Postgres Professional
Re: Make query ID more portable
On 12/10/21 13:35, Julien Rouhaud wrote: Hi, On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov wrote: See the patch in the attachment as a POC. The main idea here is to break JumbleState down into a 'clocations' part, which is what a post parse hook can really be interested in, and 'context data', which is needed to build a query or subquery signature (hash) and, I guess, isn't really needed by any extensions. There have been quite a lot of threads about that in the past, and almost every time people wanted to change how the hash was computed. So it seems to me that extensions would actually be quite interested in that. This is even more the case now that an extension can be used to replace the queryid calculation only and keep the rest of the extension relying on it as is. Yes, I know. I have been using such a self-made queryID for four years, and I will use it further. But the core jumbling code is good, fast, and much easier to support. The purpose of this work is to extend jumbling for more flexible use, to avoid meaningless copying of this code into an extension. I don't think it adds much complexity or overhead. I think the biggest change in your patch is: case RTE_RELATION: - APP_JUMB(rte->relid); - JumbleExpr(jstate, (Node *) rte->tablesample); + { + char *relname = regclassout_ext(rte->relid, true); + + APP_JUMB_STRING(relname); + JumbleExpr(jstate, (Node *) rte->tablesample, ctx); APP_JUMB(rte->inh); break; Have you done any benchmark on OLTP workload? Adding catalog access there is likely to add significant overhead. Yes, I should do benchmarking. But I guess the main goal of the query ID is monitoring, which can be switched off if necessary. This part was made for a demo. It can be replaced by a hook, for example. Also, why only using the fully qualified relation name for stable hashes? At least operators and functions should also be treated the same way. If you do that you will probably have way too much overhead to be usable in a busy production environment. 
Why not use the new possibility of a 3rd-party extension for the queryid calculation that exactly suits your need? I fully agree with these arguments. This code is a POC. The main part here is breaking down JumbleState, using a local context for subqueries, and sorting of range table entry hashes. I think we can call one routine (APP_JUMB_OBJECT(), as an example) for all OIDs in this code. It would allow an extension to intercept this call and replace the OID with an arbitrary value. -- regards, Andrey Lepikhov Postgres Professional
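The core of the argument above — OIDs are instance-local, while qualified names survive dump/reload — can be reduced to a few lines. A toy illustration (plain SHA-256 over the qualified name; the real patch routes this through regclassout_ext() and APP_JUMB_STRING, and jumbles many more fields):

```python
import hashlib

def jumble_rte_by_name(schema, relname):
    """Hash the qualified relation name instead of the pg_class OID, so two
    instances with the same schema agree on the value even when their OIDs
    differ (e.g. after a dump/reload or on different shards)."""
    digest = hashlib.sha256(f"{schema}.{relname}".encode()).digest()
    return int.from_bytes(digest[:8], "big")   # fold to a 64-bit query-id part
```

The trade-off Julien raises stays visible even here: producing the name requires a catalog lookup for every RTE on every jumble, which the plain OID hash avoids, and the hash no longer survives object renames.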
Re: Increase value of OUTER_VAR
On 14/9/21 16:37, Aleksander Alekseev wrote: Hi Andrey, only performance issues That's interesting. Any chance you could share the hardware description, the configuration file, and steps to reproduce with us? I didn't measure execution time exactly, because it is a join of two empty tables. As I saw, this join used most of the 48GB of RAM and was being planned all day on a typical 6-core AMD machine. I guess this is caused by sequential traversal of the partition list in some places in the optimizer. If it makes practical sense, I could investigate the reasons for such poor performance. -- regards, Andrey Lepikhov Postgres Professional
Re: Asymmetric partition-wise JOIN
On 14/9/21 11:37, Andrey V. Lepikhov wrote: Thank you for this good catch! The problem was in the adjust_child_relids_multilevel routine. The tmp_result variable sometimes points to the original required_outer. This patch adds new ways in which the optimizer can generate plans. One possible way is that the optimizer reparameterizes an inner path by a plain relation from the outer side (maybe as a result of a join of the plain relation and a partitioned relation). In this case we have to compare tmp_result with the original pointer to detect whether it was changed or not. The patch in the attachment fixes this problem. An additional regression test is added. I thought about it more and realized that it isn't necessary to recurse in the adjust_child_relids_multilevel() routine if required_outer contains only normal_relids. Also, the regression tests were improved a bit. -- regards, Andrey Lepikhov Postgres Professional
From ee627d07282629a85785a63341cf875bfb0decb2 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Fri, 2 Apr 2021 11:02:20 +0500 Subject: [PATCH] Asymmetric partitionwise join. Teach the optimizer to consider a partitionwise join of a non-partitioned table with each partition of a partitioned table. Disallow the asymmetric machinery for joining two partitioned (or appended) relations because it could cause huge consumption of CPU and memory during reparameterization of a NestLoop path. Change the logic of the multilevel child relids adjustment, because this feature allows the optimizer to plan in a new way. 
--- src/backend/optimizer/path/joinpath.c| 9 + src/backend/optimizer/path/joinrels.c| 187 src/backend/optimizer/plan/setrefs.c | 17 +- src/backend/optimizer/util/appendinfo.c | 44 +- src/backend/optimizer/util/pathnode.c| 9 +- src/backend/optimizer/util/relnode.c | 19 +- src/include/optimizer/paths.h| 7 +- src/test/regress/expected/partition_join.out | 425 +++ src/test/regress/sql/partition_join.sql | 180 9 files changed, 867 insertions(+), 30 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 6407ede12a..32618ebbd5 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo *root, if (set_join_pathlist_hook) set_join_pathlist_hook(root, joinrel, outerrel, innerrel, jointype, ); + + /* +* 7. If outer relation is delivered from partition-tables, consider +* distributing inner relation to every partition-leaf prior to +* append these leafs. +*/ + try_asymmetric_partitionwise_join(root, joinrel, + outerrel, innerrel, + jointype, ); } /* diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 8b69870cf4..9453258f83 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -16,6 +16,7 @@ #include "miscadmin.h" #include "optimizer/appendinfo.h" +#include "optimizer/cost.h" #include "optimizer/joininfo.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" @@ -1552,6 +1553,192 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, } } +/* + * Build RelOptInfo on JOIN of each partition of the outer relation and the inner + * relation. Return List of such RelOptInfo's. Return NIL, if at least one of + * these JOINs is impossible to build. 
+ */ +static List * +extract_asymmetric_partitionwise_subjoin(PlannerInfo *root, + RelOptInfo *joinrel, + AppendPath *append_path, + RelOptInfo *inner_rel, + JoinType jointype, + JoinPathExtraData *extra) +{ + List*result = NIL; + ListCell*lc; + + foreach (lc, append_path->subpaths) + { + Path*child_path = lfirst(lc); + RelOptInfo *child_rel = child_path->parent; + Relids child_joinrelids; + Relids parent_relids; + RelOptInfo *child_joinrel; + SpecialJoinInfo *child_sjinfo; + List*child_restrictlist; + + child_joinrelids = bms_union(child_rel-&g
Re: Postgres picks suboptimal index after building of an extended statistics
On 12/8/21 04:26, Tomas Vondra wrote: On 8/11/21 2:48 AM, Peter Geoghegan wrote: On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov I agree the current behavior is unfortunate, but I'm not convinced the proposed patch is fixing the right place - doesn't this mean the index costing won't match the row estimates displayed by EXPLAIN? I rewrote the patch. It is now simpler and shorter, and may be more convenient. It also includes a regression test to detect the problem in the future. I wonder if we should teach clauselist_selectivity about UNIQUE indexes, and improve the cardinality estimates directly, not just the costing for index scans. I tried to implement this in different ways, but it causes additional overhead and code complexity: analyzing the list of indexes and matching the clauses of each index against the input clauses during every selectivity estimation. I don't like that approach and propose a new patch in the attachment.

--
regards, Andrey Lepikhov
Postgres Professional

From 41ce6007cd552afd1a73983f0b9c9cac0e125d58 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov"
Date: Mon, 30 Aug 2021 11:21:57 +0500
Subject: [PATCH] Estimating number of fetched rows in a btree index

While estimating the number of fetched rows in a btree index, save the
selectivity estimation in the costs structure. It will be used by the
genericcostestimate routine as a top bound for the estimation of total
tuples visited in the main table. This fixes a problem with unique
indexes: we know for sure that no more than one tuple can be fetched,
but clauselist_selectivity may give us a much less accurate estimation
for many possible reasons. A regression test is added as a demonstration
of the problem.
---
 src/backend/utils/adt/selfuncs.c        | 18 ++--
 src/test/regress/expected/stats_ext.out | 38 +
 src/test/regress/sql/stats_ext.sql      | 34 ++
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c2aeb4b947..dd1cadad61 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6074,6 +6074,14 @@ genericcostestimate(PlannerInfo *root,
 		 */
 		numIndexTuples = rint(numIndexTuples / num_sa_scans);
 	}
+	else if (costs->indexSelectivity > 0. &&
+			 indexSelectivity > costs->indexSelectivity)
+		/*
+		 * If the caller gives us an estimation of the amount of fetched index
+		 * tuples, it could give the selectivity estimation. In this case the
+		 * amount of returned tuples can't be more than the amount of fetched
+		 * tuples.
+		 */
+		indexSelectivity = costs->indexSelectivity;
 
 	/*
 	 * We can bound the number of tuples by the index size in any case. Also,
@@ -6258,6 +6266,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	bool		found_is_null_op;
 	double		num_sa_scans;
 	ListCell   *lc;
+	Selectivity btreeSelectivity;
 
 	/*
 	 * For a btree scan, only leading '=' quals plus inequality quals for the
@@ -6362,19 +6371,23 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	/*
 	 * If index is unique and we found an '=' clause for each column, we can
 	 * just assume numIndexTuples = 1 and skip the expensive
-	 * clauselist_selectivity calculations. However, a ScalarArrayOp or
+	 * clauselist_selectivity calculations.  However, a ScalarArrayOp or
 	 * NullTest invalidates that theory, even though it sets eqQualHere.
+	 * The value of btreeSelectivity is used as a top bound for selectivity
+	 * estimation of returned tuples in the genericcostestimate routine.
 	 */
 	if (index->unique &&
 		indexcol == index->nkeycolumns - 1 &&
 		eqQualHere &&
 		!found_saop &&
 		!found_is_null_op)
+	{
 		numIndexTuples = 1.0;
+		btreeSelectivity = 1. / index->rel->tuples;
+	}
 	else
 	{
 		List	   *selectivityQuals;
-		Selectivity btreeSelectivity;
 
 		/*
 		 * If the index is partial, AND the index predicate with the
@@ -6402,6 +6415,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	MemSet(&costs, 0, sizeof(costs));
 	costs.numIndexTuples = numIndexTuples;
+	costs.indexSelectivity = btreeSelectivity;
 
 	genericcostestimate(root, path, loop_count, &costs);
 
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7524e65142..b90463821f 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1602,3 +1602,41 @@
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to table tst
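To see the kind of misestimation the patch targets, here is a self-contained sketch (my own illustration, not the patch's regression test) with perfectly correlated key columns:

```sql
-- Two perfectly correlated columns: clauselist_selectivity multiplies the
-- per-column selectivities (1/10000 each) and so underestimates badly,
-- while the unique index guarantees at most one fetched tuple, and the
-- patch uses 1/reltuples as a top bound on the selectivity.
CREATE TABLE t (a integer, b integer, PRIMARY KEY (a, b));
INSERT INTO t SELECT x, x FROM generate_series(1, 10000) AS x;
ANALYZE t;

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 1;
```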
Representation of SubPlan testexpr in EXPLAIN
Hi, the EXPLAIN command doesn't show the SubPlan's testexpr, which sometimes makes a query plan hard to understand. Here is what I mean:

CREATE TABLE a (x integer, y integer);
EXPLAIN (COSTS OFF, VERBOSE)
SELECT x, y FROM a upper WHERE y IN (SELECT y FROM a WHERE upper.y = x);
EXPLAIN (COSTS OFF, VERBOSE)
SELECT x, y FROM a upper WHERE x+y IN (SELECT y FROM a WHERE upper.y = x);

These two queries have the same EXPLAIN representation:

 Seq Scan on public.a upper
   Output: upper.x, upper.y
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on public.a
           Output: a.y
           Filter: (upper.y = a.x)

It is a bit annoying when you don't have the original query or don't trust the competence of the user who sent you this EXPLAIN. The attached patch solves this problem. I'm not completely sure this option is really needed; the patch presents a proof of concept only.

--
regards, Andrey Lepikhov
Postgres Professional

From db440802d84a96a98d0fb30672232e81e10c4598 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov
Date: Tue, 24 Aug 2021 10:59:00 +0500
Subject: [PATCH] Improve EXPLAIN presentation

---
 src/backend/utils/adt/ruleutils.c             | 101 +++---
 src/test/regress/expected/insert_conflict.out |   2 +-
 src/test/regress/expected/join.out            |   2 +-
 src/test/regress/expected/rowsecurity.out     |   6 +-
 src/test/regress/expected/select_parallel.out |  12 +--
 src/test/regress/expected/subselect.out       |  40 +++
 src/test/regress/expected/updatable_views.out |  24 ++---
 7 files changed, 106 insertions(+), 81 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 4df8cc5abf..1c25769633 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -455,7 +455,10 @@ static void get_const_expr(Const *constval, deparse_context *context,
 						   int showtype);
 static void get_const_collation(Const *constval, deparse_context *context);
 static void simple_quote_literal(StringInfo buf, const char *val);
+static void get_subplan_expr(SubPlan *subplan, deparse_context *context);
 static void get_sublink_expr(SubLink *sublink, deparse_context *context);
+static bool get_subselect_expr(SubLinkType subLinkType, Node *testexpr,
+							   deparse_context *context);
 static void get_tablefunc(TableFunc *tf, deparse_context *context,
 						  bool showimplicit);
 static void get_from_clause(Query *query, const char *prefix,
@@ -8536,20 +8539,7 @@ get_rule_expr(Node *node, deparse_context *context,
 			break;
 
 		case T_SubPlan:
-			{
-				SubPlan    *subplan = (SubPlan *) node;
-
-				/*
-				 * We cannot see an already-planned subplan in rule deparsing,
-				 * only while EXPLAINing a query plan.  We don't try to
-				 * reconstruct the original SQL, just reference the subplan
-				 * that appears elsewhere in EXPLAIN's result.
-				 */
-				if (subplan->useHashTable)
-					appendStringInfo(buf, "(hashed %s)", subplan->plan_name);
-				else
-					appendStringInfo(buf, "(%s)", subplan->plan_name);
-			}
+			get_subplan_expr((SubPlan *) node, context);
 			break;
 
 		case T_AlternativeSubPlan:
@@ -10358,6 +10348,31 @@ simple_quote_literal(StringInfo buf, const char *val)
 }
 
+static void
+get_subplan_expr(SubPlan *subplan, deparse_context *context)
+{
+	StringInfo	buf = context->buf;
+
+	if (subplan->testexpr)
+		appendStringInfoChar(buf, '(');
+
+	(void) get_subselect_expr(subplan->subLinkType, subplan->testexpr, context);
+
+	/*
+	 * We cannot see an already-planned subplan in rule deparsing, only while
+	 * EXPLAINing a query plan.  We don't try to reconstruct the original SQL,
+	 * just reference the subplan that appears elsewhere in EXPLAIN's result.
+	 */
+	if (subplan->useHashTable)
+		appendStringInfo(buf, "(hashed %s)", subplan->plan_name);
+	else
+		appendStringInfo(buf, "(%s)", subplan->plan_name);
+
+	if (subplan->testexpr)
+		appendStringInfoChar(buf, ')');
+}
+
 /* ----------
  * get_sublink_expr		- Parse back a sublink
  * ----------
  */
@@ -10367,14 +10382,35 @@ get_sublink_expr(SubLink *sublink, deparse_context *context)
 {
 	StringInfo	buf = context->buf;
 	Query	   *query =