Re: RFC: Logging plan of the running query

2023-09-27 Thread Andrey Lepikhov

On 28/9/2023 09:04, torikoshia wrote:

On 2023-09-25 18:49, Andrey Lepikhov wrote:

On 25/9/2023 14:21, torikoshia wrote:

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
every CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and 
then ran the following query, which did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


I didn't find a problem either. I just feel uncomfortable if, at the
moment of interruption, we hold the descriptor of a query other than the
one that is currently executing and holding resources.


I think that "descriptor" here refers to ActiveQueryDesc, in which case
it is updated at the beginning of ExecutorRun(), so I wonder whether the
situation you're worried about can occur at all.
As you can see, my example contains only DDL and no queries with plans. In 
this case Postgres doesn't call ExecutorRun() simply because it doesn't 
have a plan. But locks will still be obtained.
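For what it's worth, here is how this case can be poked at from another
session (a sketch, assuming the pg_log_query_plan() function proposed by
the patch under discussion):

```
=# -- in a second session, while SELECT ddl() sleeps in pg_sleep():
=# SELECT pg_log_query_plan(pid) FROM pg_stat_activity
   WHERE query = 'SELECT ddl();';
-- The ALTER TABLE inside ddl() never reaches ExecutorRun(), so no query
-- descriptor is registered for it, yet the backend already holds the
-- locks taken by ALTER TABLE.
```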


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-09-25 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

Regarding this issue, I see two options:
1. Go through the grouping column list and find the most reliable column. 
If we have a unique column, or a column whose statistics show a number of 
distinct values significantly higher than the ndistinct of the other 
grouping columns, we can place this column first in the grouping. That 
should guarantee the reliability of such a decision, shouldn't it?
2. If we have extended statistics on distinct values and these statistics 
cover some set of leading columns in the grouping list, we can optimize 
those positions. It also looks reliable (see the sketch below).
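For illustration, option 2 maps onto the existing extended-statistics
machinery; a small sketch (table and column names are made up):

```
=# CREATE TABLE sales (region int, product int, day date);
=# INSERT INTO sales
     SELECT gs % 10, gs % 1000, now()::date FROM generate_series(1, 100000) gs;
=# CREATE STATISTICS sales_nd (ndistinct) ON region, product FROM sales;
=# ANALYZE sales;
=# SELECT statistics_name, n_distinct FROM pg_stats_ext
    WHERE statistics_name = 'sales_nd';
-- The combined ndistinct of (region, product) is now known reliably and
-- could justify reordering GROUP BY region, product, day.
```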


Any thoughts?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: RFC: Logging plan of the running query

2023-09-25 Thread Andrey Lepikhov

On 25/9/2023 14:21, torikoshia wrote:

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
every CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
ran the following query, which did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


I didn't find a problem either. I just feel uncomfortable if, at the 
moment of interruption, we hold the descriptor of a query other than the 
one that is currently executing and holding resources.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GUC option for skipping shared buffers in core dumps

2023-09-25 Thread Andrey Lepikhov

Hi,

The current approach isn't good enough because we want to use this on 
Windows/macOS and other systems. So, I've tried to develop another 
strategy - detaching the shmem and DSM blocks before executing the abort() 
routine.

As far as I can see, it helps and reduces the size of the core file.
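To be concrete, with the patch applied the new developer GUC could be
exercised like this (a sketch; the name and values are from the diff below,
and it is PGC_POSTMASTER, so it must be set at server start):

```
-- postgresql.conf (set at server start):
--   core_dump_no_shared_buffers = 'all'   # none | shmem | dsm | all
=# SHOW core_dump_no_shared_buffers;
```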

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..4d7bf2c0e4 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7f..34ac874ad0 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -197,7 +197,7 @@ main(int argc, char *argv[])
else
PostmasterMain(argc, argv);
/* the functions above should not return */
-   abort();
+   pg_abort();
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 54e9bfb8c4..fc32a6bb1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[])
 */
ExitPostmaster(status != STATUS_OK);
 
-   abort();/* not reached */
+   pg_abort(); /* not reached */
 }
 
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e250b0567e..3123b388ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -385,7 +385,7 @@ WalSndShutdown(void)
whereToSendOutput = DestNone;
 
proc_exit(0);
-   abort();    /* keep the compiler quiet */
+   pg_abort(); /* keep the compiler quiet */
 }
 
 /*
diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c
index 719dd7b309..f422d42440 100644
--- a/src/backend/utils/error/assert.c
+++ b/src/backend/utils/error/assert.c
@@ -19,6 +19,32 @@
 #include 
 #endif
 
+#include 
+#include 
+
+int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL;
+
+/*
+ * Remember, at the same time someone can still work with shared memory,
+ * write it to disk and so on.
+ */
+void
+pg_abort(void)
+{
+   if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL)
+   {
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM)
+   dsm_detach_all();
+
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM)
+   PGSharedMemoryDetach();
+   }
+
+   abort();
+}
+
 /*
  * ExceptionalCondition - Handles the failure of an Assert()
  *
@@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName,
sleep(100);
 #endif
 
-   abort();
+   pg_abort();
 }
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e1f3e8521..f6c863ca68 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char *funcname)
 * children...
 */
fflush(NULL);
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..95e205e8d1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -427,6 +427,14 @@ static const struct config_enum_entry debug_logical_replication_streaming_option
{NULL, 0, false}
 };
 
+static const struct config_enum_entry core_dump_no_shared_buffers_options[] = {
+   {"none", COREDUMP_INCLUDE_ALL, false},
+   {"shmem", COREDUMP_EXCLUDE_SHMEM, false},
+   {"dsm", COREDUMP_EXCLUDE_DSM, false},
+   {"all", COREDUMP_EXCLUDE_ALL, false},
+   {NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 2),
 "array length mismatch");
 
@@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
 
+   {
+   {"core_dump_no_shared_buffers", PGC_POSTMASTER, 
DEVELOPER_OPTIONS,
+   NULL,
+   NULL,
+   GUC_NOT_IN_SAMPLE
+   },
+   &core_dump_no_shared_buffers,
+   COREDUMP_INCLUDE_ALL, cor

Re: Postgres picks suboptimal index after building of an extended statistics

2023-09-24 Thread Andrey Lepikhov

On 12/8/2021 06:26, Tomas Vondra wrote:

On 8/11/21 2:48 AM, Peter Geoghegan wrote:

On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov
 wrote:

Ivan Frolkov reported a problem with choosing a non-optimal index during
query optimization. This problem appeared after building extended
statistics.


Any thoughts on this, Tomas?



Thanks for reminding me, I missed / forgot about this thread.

I agree the current behavior is unfortunate, but I'm not convinced the 
proposed patch is fixing the right place - doesn't this mean the index 
costing won't match the row estimates displayed by EXPLAIN?


I wonder if we should teach clauselist_selectivity about UNIQUE indexes, 
and improve the cardinality estimates directly, not just costing for 
index scans.


Also, is it correct that the patch calculates num_sa_scans only when 
(numIndexTuples >= 0.0)?
I can't stop thinking about this issue. It is bizarre that Postgres 
chooses a non-unique index when a unique index gives us proof of a minimal scan.
I don't see a reason to teach the clauselist_selectivity() routine to 
estimate UNIQUE indexes. We would add some cycles, and it would work with 
btree indexes only.
Maybe we should change compare_path_costs_fuzzily() and add some heuristic, 
for example:
"If the selectivity of both paths gives us no more than 1 row, prefer to use 
a unique index or the index with the lowest selectivity."
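To make this concrete, here is a hedged sketch of the problematic shape
(not Ivan's original reproducer):

```
=# CREATE TABLE t (a int, b int, c int);
=# INSERT INTO t SELECT gs, gs, gs % 100 FROM generate_series(1, 100000) gs;
=# CREATE UNIQUE INDEX t_ab_uq ON t (a, b);
=# CREATE INDEX t_abc ON t (a, b, c);
=# CREATE STATISTICS t_nd (ndistinct) ON a, b FROM t;
=# ANALYZE t;
=# EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1;
-- After building the extended statistics, the planner may pick t_abc here,
-- although t_ab_uq already proves that at most one row can match. The
-- heuristic above would make compare_path_costs_fuzzily() prefer the
-- unique index whenever both paths promise no more than one row.
```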


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PATCH] Add extra statistics to explain for Nested Loop

2023-09-22 Thread Andrey Lepikhov

On 31/7/2022 10:49, Julien Rouhaud wrote:

On Sat, Jul 30, 2022 at 08:54:33PM +0800, Julien Rouhaud wrote:
Anyway, 1% is in my opinion still too much overhead for extensions that won't
get any extra information.
I have read the whole thread and still can't understand something. What 
valuable data can I find in these extra statistics if no parameterized 
node exists in the plan?
Also, thinking about min/max time in the explain, I guess it would be 
necessary only in rare cases. Usually, the execution time will correlate 
with the number of tuples scanned, won't it? So, maybe skip the time 
boundaries in the instrument structure?
In my experience, it is enough to know the total number of tuples bubbled 
up from a parameterized node to decide on further optimizations. Maybe 
simplify this feature down to a single total_rows field in the case of 
nloops > 1 and in the presence of parameters?
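To illustrate what gets lost today, a self-contained example (hash and
merge joins are disabled only to pin a parameterized nested loop):

```
=# CREATE TABLE o AS SELECT gs AS x FROM generate_series(1, 10) gs;
=# CREATE TABLE i AS SELECT gs % 10 AS x FROM generate_series(1, 100000) gs;
=# CREATE INDEX ON i (x);
=# ANALYZE o, i;
=# SET enable_hashjoin = off; SET enable_mergejoin = off;
=# EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM o JOIN i USING (x);
-- The inner index scan reports something like "actual rows=9000 loops=10":
-- the per-iteration counts are averaged away, while a single total_rows
-- field would carry the same decision-relevant signal.
```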
And one final thought: if someone wants a lot of additional statistics, why 
not provide them through an extension? All that is needed is a hook at the 
point of node explanation and some effort to make instrumentation 
extensible. But here, honestly, I don't have code or ideas so far.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: disfavoring unparameterized nested loops

2023-09-20 Thread Andrey Lepikhov

On 29/9/2022 21:32, Benjamin Coutu wrote:

I'd like to revamp this important discussion.

As is well described in this fairly recent paper here 
https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at Postgres) 
"estimation errors quickly grow as the number of joins increases, and that these 
errors are usually the reason for bad plans" - I think we can all get behind that 
statement.

While nested loop joins work great when cardinality estimates are correct, they 
are notoriously bad when the optimizer underestimates and they degrade very 
fast in such cases - the good vs. bad here is very asymmetric. On the other 
hand hash joins degrade much more gracefully - they are considered very robust 
against underestimation. The above-mentioned paper illustrates that all major 
DBMSs (including Postgres) tend to underestimate, and usually that 
underestimation increases drastically with the number of joins (see Figures 3+4 
of the paper).

Now, a simple approach to guarding against bad plans that arise from 
underestimation could be to use what I would call a 
nested-loop-conviction-multiplier based on the current depth of the join tree, 
e.g. for a base table that multiplier would obviously be 1, but would then grow 
(e.g.) quadratically. That conviction-multiplier would *NOT* be used to skew 
the cardinality estimates themselves, but rather be applied to the overall 
nested loop join cost at each particular stage of the plan when comparing it to 
other more robust join strategies like hash or sort-merge joins. That way when 
we can be sure to have a good estimate at the bottom of the join tree we treat 
all things equal, but favor nested loops less and less as we move up the join 
tree for the sake of robustness.
Also, we can expand the multiplier whenever we fall back to using the default 
cardinality constant as surely all bets are off at that point - we should 
definitely treat nested loop joins as out of favor in this instance and that 
could easily be incorporated by simply increasing the conviction-multiplier.

What are your thoughts on this simple idea - is it perhaps too simple?
In my practice, a parameterized nested loop reduces execution time, 
sometimes drastically. If your query touches a lot of tables but extracts 
only a tiny part of the data, and you have good index coverage, PNL works 
great.
Moreover, I have pondered extending parameterization through subqueries 
and groupings.
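A small example of the case I mean - highly selective, index-covered probes:

```
=# CREATE TABLE a AS SELECT gs AS id FROM generate_series(1, 1000000) gs;
=# CREATE TABLE b AS SELECT gs AS id, gs AS payload
     FROM generate_series(1, 1000000) gs;
=# CREATE INDEX ON b (id);
=# ANALYZE a; ANALYZE b;
=# EXPLAIN SELECT * FROM a JOIN b USING (id) WHERE a.id < 10;
-- Expect a nested loop with "Index Cond: (id = a.id)" on b: a handful of
-- index probes instead of hashing a million rows.
```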


What would you say about a different way: a hybrid join? MS SQL Server has 
such a feature [1], and, according to their description, it requires low 
overhead. It starts with a HashJoin and switches to a NestLoop if the inner 
input contains too few tuples. That solves the issue, doesn't it?


[1] 
https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-09-20 Thread Andrey Lepikhov

On 23/8/2023 12:37, Richard Guo wrote:

If we go with the "tablesample scans can't be reparameterized" approach
in the back branches, I'm a little concerned about what happens if we find
more cases in the future where we need to modify RTEs for reparameterization.
So I spent some time seeking and have managed to find one: there might
be lateral references in a scan path's restriction clauses, and
currently reparameterize_path_by_child fails to adjust them.
This may help you somehow: in [1], we designed a feature where the 
partitionwise join technique can be applied to a JOIN of partitioned and 
non-partitioned tables. Unfortunately, it has dropped out of community 
discussion, but we still support it for sharding usage - it is helpful 
for the implementation of 'global' tables in a distributed 
configuration. And there we ran into the same problem with lateral 
relids adjustment. So it can give you a more general view of the problem.


[1] Asymmetric partition-wise JOIN
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-09-19 Thread Andrey Lepikhov

On 25/8/2023 14:39, Yuya Watari wrote:

Hello,

On Wed, Aug 9, 2023 at 8:54 PM David Rowley  wrote:

I think the best way to move this forward is to explore not putting
partitioned table partitions in EMs and instead see if we can
translate to top-level parent before lookups.  This might just be too
complex to translate the Exprs all the time and it may add overhead
unless we can quickly determine somehow that we don't need to attempt
to translate the Expr when the given Expr is already from the
top-level parent. If that can't be made to work, then maybe that shows
the current patch has merit.


Based on your suggestion, I have experimented with not putting child
EquivalenceMembers in an EquivalenceClass. I have attached a new
patch, v20, to this email. The following is a summary of v20.
Working on self-join removal in the nearby thread [1], I ran into a 
problem that adds one more argument for working in this new direction, 
on top of the previous ones.
By indexing positions in the list of equivalence members, we make some 
optimizations, like join elimination, more complicated - they may need to 
remove some clauses and equivalence class members.
To change the lists of derived clauses or ec_members, we would have to go 
through all the index lists and fix them, which is a non-trivial operation.


[1] 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-09-18 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to
the reasons given in the revert commit.

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.
OK, some thoughts on this part of the task. As I see it, we don't have that 
many different operators: 26 with fixed width and 16 with variable width:


SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
  JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

A benchmarking procedure for fixed-length types can look something like 
the following:


CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
nrows integer,
exec_time float
)  AS $$
DECLARE
  data json;
  step integer;
BEGIN
  SET work_mem='1GB';

  FOR step IN 0..3 LOOP
SELECT pow(100, step)::integer INTO nrows;
DROP TABLE IF EXISTS test CASCADE;
EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
FROM generate_series(1,%s) AS gs;', typ, nrows);

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
  SELECT * FROM test ORDER BY (x) DESC INTO data;
SELECT data->0->'Execution Time' INTO exec_time;
RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Execution of SELECT * FROM pass_sort('integer'); shows quite linear growth 
of the execution time. So, using '2.0 * cpu_operator_cost' as the cost for 
the integer type (as a baseline), we can calculate costs for the other 
operators. Variable-width types, I think, will require a more complex 
technique to check the dependency on length.
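A hedged follow-up sketch - turning the measurements into per-type cost
ratios against the integer baseline (this relies on plpgsql set-returning
functions being materialized one after another):

```
=# SELECT t.nrows, t.exec_time / i.exec_time AS ratio_vs_integer
     FROM pass_sort('numeric') AS t
     JOIN pass_sort('integer') AS i USING (nrows);
-- A ratio that is stable across nrows suggests a constant procost
-- multiplier; a ratio growing with nrows hints at length-dependent
-- comparison costs.
```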


Does this way look worthwhile?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-09-12 Thread Andrey Lepikhov

Hi,
Here is the patch rebased on the current master. I also fixed some 
minor slips and one static analyzer warning.
This is just for adding it to the next commitfest and encouraging further 
work on this patch.


One extra difference in the newly added postgres_fdw tests is caused by this 
patch - see the changes in the query plan in the attachment.


--
regards,
Andrey Lepikhov
Postgres Professional
From 33953655c9ac3f9ec64b80c9f2a2ff38bd178745 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Sep 2023 11:20:03 +0700
Subject: [PATCH] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause using sort,
the cost may depend heavily on the order in which the keys are compared when
building the groups. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

In principle, we might generate grouping paths for all permutations of the keys
and leave the rest to the optimizer. But that might get very expensive, so we
try to pick only a couple interesting orderings based on both local and global
information.

When planning the grouping path, we explore statistics (number of distinct
values, cost of the comparison function) for the keys and reorder them
to minimize comparison costs. Intuitively, it may be better to perform more
expensive comparisons (for complex data types, etc.) last because maybe
the cheaper comparisons will be enough. Similarly, the higher the cardinality
of a key, the lower the probability we'll need to compare more keys. The patch
generates and costs various orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch generates orderings and picks those, minimizing the comparison cost
(for various path keys), and then adds orderings that might be useful for
operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps
the ordering specified in the query, assuming the user might have additional
insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  36 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 465 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1771 insertions(+), 382 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0f..63af7feabe 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2319,18 +2319,21 @@ SELECT t1."C 1" FROM "S 1"."T 1" t1, LATERAL (SELECT DISTINCT t2.c1, t3.c1 FROM
 -- join with pseudoconstant quals
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1 AND CURRENT_USER = SESSION_USER) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-  

Re: Removing unneeded self joins

2023-09-12 Thread Andrey Lepikhov

On 5/7/2023 21:28, Andrey Lepikhov wrote:

Hi,

During the significant code revision in v.41 I lost some replacement 
operations. Here is the fix, plus extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added 
code to replace relids in that place too.
One more thing: I found out that we didn't replace self-joins defined by 
baserestrictinfo clauses if no self-join clause existed for the join. 
This is now fixed, and a test has been added.

To understand changes readily, see the delta file in the attachment.
Here is the new patch in the attachment, rebased on the current master and 
with some minor gaffes fixed.
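The textbook case the patch targets looks like this (a sketch in the
spirit of the join.sql tests added below):

```
=# CREATE TABLE t (id int PRIMARY KEY, payload int);
=# EXPLAIN (COSTS OFF) SELECT a.payload FROM t a JOIN t b ON a.id = b.id;
-- With the patch and enable_self_join_removal = on, the join degenerates
-- into a single scan of t, because each matched inner row is provably the
-- same physical row as the outer one.
```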


--
regards,
Andrey Lepikhov
Postgres Professional
From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.

The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself from the query tree if it can be proved that the join is replaceable
with a scan without affecting the query result. The self-joined inner relation
is replaced with the outer one in the query, in equivalence classes, and in
planner info structures. The inner restrictlist also moves to the outer one,
with duplicated clauses removed. Thus, this optimization reduces the length of
the range table list (which especially makes sense for partitioned relations),
reduces the number of restriction clauses and hence of selectivity estimations,
and can potentially improve the planner's overall prediction for the query.

The SJE proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Add the baserestrictinfo of the inner table to the list above.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check the uniqueness proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must match
exactly.

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.

Tests covering this feature have been added to join.sql.
Some regression tests changed due to the self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1094 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  824 -
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/expected/updatable_views.out |   17 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  360 ++
 src/tools/pgindent/typedefs.list  |2 +
 14 files changed, 2375 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+the query tree and replaces self-joins with semantically equivalent single
+scans. Only plain tables are taken into consideration.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List *oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+                                        exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *re

Re: Report planning memory in EXPLAIN ANALYZE

2023-08-13 Thread Andrey Lepikhov

On 14/8/2023 06:53, David Rowley wrote:

On Thu, 10 Aug 2023 at 20:33, Ashutosh Bapat
 wrote:

My point is what's relevant here is how much net memory planner asked
for.


But that's not what your patch is reporting. All you're reporting is
the difference in memory that's *currently* palloc'd from before and
after the planner ran.  If we palloc'd 600 exabytes then pfree'd it
again, your metric won't change.

I'm struggling a bit to understand your goals here.  If your goal is
to make a series of changes that reduces the amount of memory that's
palloc'd at the end of planning, then your patch seems to suit that
goal, but per the quote above, it seems you care about how many bytes
are palloc'd during planning and your patch does not seem track that.

With your patch as it is, to improve the metric you're reporting we
could go off and do things like pfree Paths once createplan.c is done,
but really, why would we do that? Just to make the "Planning Memory"
metric looks better doesn't seem like a worthy goal.

Instead, if we reported the context's mem_allocated, then it would
give us the flexibility to make changes to the memory context code to
have the metric look better. It might also alert us to planner
inefficiencies and problems with new code that may cause a large spike
in the amount of memory that gets allocated.  Now, I'm not saying we
should add a patch that shows mem_allocated. I'm just questioning if
your proposed patch meets the goals you're trying to achieve.  I just
suggested that you might want to consider something else as a metric
for your memory usage reduction work.
Really, the current approach with the final value of consumed memory 
smooths out peaks of memory consumption. I recall examples like massive 
million-element arrays or reparameterization with many partitions, where 
the optimizer consumes a lot of additional memory during planning.
Ideally, to dive into planner issues, we should have something like the 
in-progress reporting of VACUUM, reporting memory consumption at 
each subquery and join level. But that looks like too much for typical queries.
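By the way, the used-vs-allocated distinction being debated here is
already observable per memory context (PostgreSQL 14+):

```
=# SELECT name, total_bytes, used_bytes, free_bytes
     FROM pg_backend_memory_contexts
    WHERE name = 'CacheMemoryContext';
-- total_bytes is what the context obtained from malloc (what mem_allocated
-- would report); used_bytes is what live palloc chunks currently occupy.
```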


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Report planning memory in EXPLAIN ANALYZE

2023-08-10 Thread Andrey Lepikhov

On 10/8/2023 15:33, Ashutosh Bapat wrote:

On Wed, Aug 9, 2023 at 8:56 PM David Rowley  wrote:


On Thu, 10 Aug 2023 at 03:12, Ashutosh Bapat
 wrote:
I guess it depends on the problem you're trying to solve. I had
thought you were trying to do some work to reduce the memory used by
the planner, so I imagined you wanted this so you could track your
progress and also to help ensure we don't make too many mistakes in
the future which makes memory consumption worse again.
Another way we might go about reducing planner memory is to make
changes to the allocators themselves.  For example, aset rounds up to
the next power of 2.  If we decided to do something like add more
freelists to double the number so we could add a mid-way point
freelist between each power of 2, then we might find it reduces the
planner memory by, say 12.5%.  If we just tracked what was consumed by
palloc() that would appear to save us nothing, but it might actually
save us several malloced blocks.



This won't just affect the planner but every subsystem of PostgreSQL,
unless we devise a new context type for the planner.

My point is what's relevant here is how much net memory planner asked
for. Internally how much memory PostgreSQL allocated seems relevant
for the memory context infra as such but not so much for planner
alone.

If you think that memory allocated is important, I will add both used
and (net) allocated memory to EXPLAIN output.
As a developer, I like having as much internal info in my EXPLAIN as 
possible. But this command is designed mainly for users, not core 
developers. The amount of memory allocated usually doesn't mean much to 
users. On the other hand, developers can watch it easily in different 
ways if needed. So, I vote for the 'memory used' metric only.


The patch looks good, passes the tests.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Andrey Lepikhov

On 7/8/2023 19:15, Ashutosh Bapat wrote:



On Mon, Aug 7, 2023 at 2:21 PM Andrey Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:


 >> Do you think that the memory measurement patch I have shared in
those threads is useful in itself? If so, I will start another
proposal to address it.
 >
 > For me, who is developing the planner in this thread, the memory
 > measurement patch is useful. However, most users do not care about
 > memory usage, so there is room for consideration. For example, making
 > the metrics optional in EXPLAIN ANALYZE outputs might be better.
 >
+1. Any memory-related info in the output of EXPLAIN ANALYZE makes
tests
more complex because of architecture dependency.


As far as the tests go, the same is the case with planning time and 
execution time. They change even without changing the architecture. But 
we have tests which mask the actual values. Something similar will be 
done to the planning memory.
It is a positive thing to be able to access some planner internals from the 
console, of course. My point concerns the structure of EXPLAIN output and 
is motivated by two reasons:
1. I use the EXPLAIN command daily to identify performance issues and 
the optimiser's weak points. In my experience, when you have 
an 'explain analyze' containing more than 100 lines, you try removing 
unnecessary information to improve observability. It would be better to 
be able to see an EXPLAIN with different levels of output detail. 
Flexibility here sometimes saves a lot of manual work.
2. When writing extensions that run an explain analyze in their regression 
tests, we must create masking functions just to keep the tests stable 
(see the sketch below). That additional work could be avoided with another 
option, like MEMUSAGE ON/OFF.


So, in my opinion, it would be better to introduce this new output data 
guarded by an additional option.
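For reference, the kind of masking helper I mean in point 2 - a sketch of
what extension tests are forced to write today:

```
=# CREATE FUNCTION explain_filter(query text) RETURNS SETOF text AS $$
DECLARE
    ln text;
BEGIN
    FOR ln IN EXECUTE 'EXPLAIN (ANALYZE, TIMING OFF) ' || query LOOP
        -- mask the run- and architecture-dependent numbers
        ln := regexp_replace(ln, 'Memory: \S+', 'Memory: xxx');
        ln := regexp_replace(ln, 'actual rows=\d+', 'actual rows=N');
        RETURN NEXT ln;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```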




I will propose it as a separate patch in the next commitfest and will 
seek opinions from other hackers.

Cool, good news.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Andrey Lepikhov

On 7/8/2023 15:19, Yuya Watari wrote:

Hello,

Thank you for your reply.

On Thu, Aug 3, 2023 at 10:29 PM Ashutosh Bapat
 wrote:

If you think that the verification is important to catch bugs, you may want to 
encapsulate it with an #ifdef .. #endif such that the block within is not 
compiled by default. See OPTIMIZER_DEBUG for example.


In my opinion, verifying the iteration results is only necessary to
avoid introducing bugs while developing this patch. The verification
is too excessive for regular development of PostgreSQL. I agree that
we should avoid a significant degradation in assert enabled builds, so
I will consider removing it.
I should admit, these checks have helped me while backpatching this 
feature to PostgreSQL v.13 (users crave query planning speedups a lot). Maybe 
that is a sign of a lack of tests but, in fact, they have already helped.


One more thing: I think you should add comments to 
add_child_rel_equivalences() and add_child_join_rel_equivalences()

explaining the replacement of:

if (bms_is_subset(cur_em->em_relids, top_parent_relids) &&
!bms_is_empty(cur_em->em_relids))
and
if (bms_overlap(cur_em->em_relids, top_parent_relids))

with different logic. What was changed? Some comments would help future 
developers understand this part of the code more easily.



Do you think that the memory measurement patch I have shared in those threads 
is useful in itself? If so, I will start another proposal to address it.


For me, who is developing the planner in this thread, the memory
measurement patch is useful. However, most users do not care about
memory usage, so there is room for consideration. For example, making
the metrics optional in EXPLAIN ANALYZE outputs might be better.

+1. Any memory-related info in the output of EXPLAIN ANALYZE makes tests 
more complex because of architecture dependency.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-02 Thread Andrey Lepikhov

On 2/8/2023 13:40, Yuya Watari wrote:

As seen from the above, verifying iteration results was the cause of
the performance degradation. I agree that we should avoid such
degradation because it negatively affects the development of
PostgreSQL. Removing the verification when committing this patch is
one possible option.
You introduced list_ptr_cmp as an extern function of List, but use it 
only under the USE_ASSERT_CHECKING ifdef.

Maybe hide it under USE_ASSERT_CHECKING as well, or remove all of it?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-07-27 Thread Andrey Lepikhov

On 5/7/2023 16:57, Yuya Watari wrote:

Hello,

On Fri, Mar 10, 2023 at 5:38 PM Yuya Watari  wrote:

Thank you for pointing it out. I have attached the rebased version to
this email.


Recent commits, such as a8c09daa8b [1], have caused conflicts and
compilation errors in these patches. I have attached the fixed version
to this email.

The v19-0004 adds an 'em_index' field representing the index within
root->eq_members of the EquivalenceMember. This field is needed to
delete EquivalenceMembers when iterating them using the ec_members
list instead of the ec_member_indexes.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a8c09daa8bb1d741bb8b3d31a12752448eb6fb7c

While investigating the quality of partition pruning at the stage of executor 
initialization using your set of patches, I found some dubious results with 
performance degradation. Look into the test case in the attachment.

There are three queries. Execution times:
1 - 8s; 2 - 30s; 3 - 131s (with your patch set).
1 - 5s; 2 - 10s; 3 - 33s (current master).

Maybe it is a false alarm, but on my laptop I see this degradation on 
every run.


--
regards,
Andrey Lepikhov
Postgres Professional


parts-problem.sql
Description: application/sql


Re: POC: GROUP BY optimization

2023-07-24 Thread Andrey Lepikhov

On 24/7/2023 16:56, Tomas Vondra wrote:

On 7/24/23 04:10, Andrey Lepikhov wrote:

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to
the reasons given in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

I read up on the history of this thread.
As I see it, all the problems mentioned above can be avoided by excluding
the new cost model entirely. We can sort GROUP BY columns according to the
'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main
supporting point there is that with this patch, people could no longer
optimize the query by themselves by arranging the order of the columns in
a more optimal way. But now we at least have the GUC to switch off the
behaviour introduced here. Also, some extensions, like the well-known
pg_hint_plan, can help with automation.


I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.
I see. My point here is that if the ndistinct of one column is much higher 
than the ndistinct of another, it is more probable that this correlation 
will hold in the grouping phase. Here we can get some regression, which 
can be outweighed by the possibility below.



So, how about committing of the undoubted part of the feature and
working on the cost model in a new thread?



But Tom's commit message says this:

 Worse, to arrive at estimates of the number of calls made to the
 lower-order-column comparison functions, the code needs to make
 estimates of the numbers of distinct values of multiple columns,
 which are necessarily even less trustworthy than per-column stats.

so I'm not sure this really counts as "undoubted".
Don't try to estimate multiple columns - just sort the columns according to 
their ndistinct values as a heuristic.
I think we should estimate the number of values of multiple columns only 
if we have extended statistics on those columns. And this can extend the 
applicability of extended statistics.


The suggestion above is just an attempt to gather the low-hanging fruit 
right now. If it is not applicable, we will take the long way.
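A quick manual experiment behind this heuristic (a sketch; hashagg is
disabled only to force sorted grouping):

```
=# CREATE TABLE g (lo int, hi int);
=# INSERT INTO g SELECT gs % 4, gs FROM generate_series(1, 1000000) gs;
=# ANALYZE g;
=# SET enable_hashagg = off;
=# EXPLAIN ANALYZE SELECT count(*) FROM g GROUP BY lo, hi;
=# EXPLAIN ANALYZE SELECT count(*) FROM g GROUP BY hi, lo;
-- With 'lo' first, almost every sort comparison must also inspect 'hi';
-- with the high-ndistinct 'hi' first, ties are rare and the sort is
-- measurably cheaper.
```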


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-07-23 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to
the reasons given in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

I read up on the history of this thread.
As I see it, all the problems mentioned above can be avoided by excluding 
the new cost model entirely. We can sort GROUP BY columns according to the 
'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main 
supporting point there is that with this patch, people could no longer 
optimize the query by themselves by arranging the order of the columns in 
a more optimal way. But now we at least have the GUC to switch off the 
behaviour introduced here. Also, some extensions, like the well-known 
pg_hint_plan, can help with automation.
So, how about committing the undoubted part of the feature and 
working on the cost model in a new thread?


[1] 
https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to
the reasons given in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

For me personally, the most challenging issue is:
3. The imbalance induced by the cost_sort() changes. It may increase or 
decrease the contribution of sorting to the total cost compared to other 
factors and change the choice of sorted paths.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-07-20 Thread Andrey Lepikhov

On 3/10/2022 21:56, Tom Lane wrote:
> Revert "Optimize order of GROUP BY keys".
>
> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
> several follow-on fixes.
> ...
> Since we're hard up against the release deadline for v15, let's
> revert these changes for now.  We can always try again later.

It may be time to restart the project. As a first step, I rebased the 
patch on the current master. It wasn't trivial because of some recent 
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to 
the reasons given in the revert commit.


--
regards,
Andrey Lepikhov
Postgres Professional
From 913d55ee887dccfeba360f5f44ed347dd1ba9044 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 14 Jul 2023 10:29:36 +0700
Subject: [PATCH] When evaluating a query with a multi-column GROUP BY clause
 using sort, the cost may be heavily dependent on the order in which the keys
 are compared when building the groups. Grouping does not imply any ordering,
 so we're allowed to compare the keys in arbitrary order, and a Hash Agg
 leverages this. But for Group Agg, we simply compared keys in the order as
 specified in the query. This commit explores alternative ordering of the
 keys, trying to find a cheaper one.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In principle, we might generate grouping paths for all permutations of
the keys, and leave the rest to the optimizer. But that might get very
expensive, so we try to pick only a couple interesting orderings based
on both local and global information.

When planning the grouping path, we explore statistics (number of
distinct values, cost of the comparison function) for the keys and
reorder them to minimize comparison costs. Intuitively, it may be better
to perform more expensive comparisons (for complex data types etc.)
last, because maybe the cheaper comparisons will be enough. Similarly,
the higher the cardinality of a key, the lower the probability we’ll
need to compare more keys. The patch generates and costs various
orderings, picking the cheapest ones.

The ordering of group keys may interact with other parts of the query,
some of which may not be known while planning the grouping. E.g. there
may be an explicit ORDER BY clause, or some other ordering-dependent
operation, higher up in the query, and using the same ordering may allow
using either incremental sort or even eliminate the sort entirely.

The patch generates orderings and picks those minimizing the comparison
cost (for various pathkeys), and then adds orderings that might be
useful for operations higher up in the plan (ORDER BY, etc.). Finally,
it always keeps the ordering specified in the query, on the assumption
the user might have additional insights.

This introduces a new GUC enable_group_by_reordering, so that the
optimization may be disabled if needed.
---
 .../postgres_fdw/expected/postgres_fdw.out|  15 +-
 src/backend/optimizer/path/costsize.c | 363 ++-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 602 ++
 src/backend/optimizer/plan/planner.c  | 467 --
 src/backend/optimizer/util/pathnode.c |   4 +-
 src/backend/utils/adt/selfuncs.c  |  38 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |   7 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 src/test/regress/expected/merge.out   |  15 +-
 .../regress/expected/partition_aggregate.out  |  44 +-
 src/test/regress/expected/partition_join.out  |  75 +--
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 23 files changed, 1760 insertions(+), 374 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 852b5b4707..3f3c181ac8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9593,13 +9593,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and

Re: POC, WIP: OR-clause support for indexes

2023-07-11 Thread Andrey Lepikhov

On 10/7/2023 15:38, Alena Rybakina wrote:

I agreed with the changes. Thank you for your work.

I updated the patch and added you to the authors.

I specified Ranier Vilela as a reviewer.
This patch looks much better than before. But it definitely needs some 
test coverage. As a first simple approximation, here you can see the 
result of the regression tests with the transformation limit set to 0. 
See in the attachment some test changes induced by these diffs.


Also, I see some impact of the transformation on other queries:
create_view.out:
(NOT x > z) > (x <= z)
inherit.out:
(((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
to
(((a)::text = ANY ('{NULL,cd}'::text[])) OR ((a)::text = 'ab'::text))

The transformations mentioned above are correct, of course. But they can 
be a sign of possibly unstable behavior.


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 0ddcb880ef..3d9b385c1f 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -43,6 +43,7 @@
 
 /* GUC parameters */
 bool   Transform_null_equals = false;
+int         or_transform_limit = 500;
 
 
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -104,8 +105,6 @@ typedef struct OrClauseGroupEntry
Expr   *expr;
 } OrClauseGroupEntry;
 
-static int const_transform_or_limit = 500;
-
 static Node *
 transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 {
@@ -115,7 +114,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 
/* If this is not an 'OR' expression, skip the transformation */
if (expr_orig->boolop != OR_EXPR ||
-   list_length(expr_orig->args) < const_transform_or_limit)
+   list_length(expr_orig->args) < or_transform_limit)
return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
 
foreach(lc, expr_orig->args)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c14456060c..c7ac73ebe0 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2046,6 +2046,16 @@ struct config_int ConfigureNamesInt[] =
100, 1, MAX_STATISTICS_TARGET,
NULL, NULL, NULL
},
+   {
+   {"or_transform_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+   gettext_noop("Transform a sequence of OR clauses to an 
IN expression."),
+   gettext_noop("The planner will replace clauses like 
'x=c1 OR x=c2 .."
+"to the clause 'x IN 
(c1,c2,...)'")
+   },
+   _transform_limit,
+   500, 0, INT_MAX,
+   NULL, NULL, NULL
+   },
{
{"from_collapse_limit", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the FROM-list size beyond which 
subqueries "
diff --git a/src/include/parser/parse_expr.h b/src/include/parser/parse_expr.h
index 7d38ca75f7..891e6a462b 100644
--- a/src/include/parser/parse_expr.h
+++ b/src/include/parser/parse_expr.h
@@ -17,6 +17,7 @@
 
 /* GUC parameters */
 extern PGDLLIMPORT bool Transform_null_equals;
+extern PGDLLIMPORT int or_transform_limit;
 
 extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind);
 
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index acfd9d1f4f..60e053d217 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1883,6 +1883,46 @@ SELECT count(*) FROM tenk1
 10
 (1 row)
 
+SET or_transform_limit = 0;
+EXPLAIN (COSTS OFF)
+SELECT * FROM tenk1
+  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
+  QUERY PLAN  
+--
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[])))
+(2 rows)
+
+SELECT * FROM tenk1
+  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
+ unique1 | unique2 | two | four | ten | twenty | hundred | thousand | 
twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 
+-+-+-+--+-++-+--+-+---+--+-+--+--+--+-
+  42 |5530 |   0 |2 |   2 |  2 |  42 |   42 |  
42 |42 |   42 |  84 |   85 | QB   | SEIAAA   | xx
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1
+  WHERE hundred = 42

Re: Generating code for query jumbling through gen_node_support.pl

2023-07-11 Thread Andrey Lepikhov

On 11/7/2023 12:35, Michael Paquier wrote:

On Tue, Jul 11, 2023 at 12:29:29PM +0700, Andrey Lepikhov wrote:

I vote for only one method based on a query tree structure.


Noted


BTW, did you think about different algorithms of queryId generation?


Not really, except if you are referring to the possibility of being
able to handle differently different portions of the nodes depending
on a context given by the callers willing to do a query jumbling
computation.  (For example, choose to *not* silence the Const nodes,
etc.)

Yes, I have two requests for different queryId algorithms:
1. One with suppressed Const nodes.
2. One with Oids replaced by full object names - to give a chance to see 
the same queryId on different instances for the same query (see below).
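To show why the second request matters (PostgreSQL 14+; the table name is
made up):

```
=# CREATE TABLE jt (x int);
=# SET compute_query_id = on;
=# EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM jt;
-- The "Query Identifier" printed here is jumbled from jt's Oid, which will
-- differ on another cluster even for identical SQL; replacing Oids with
-- qualified names would make the identifier portable across instances.
```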


It is quite trivial to implement, but not easy to support.



Auto-generated queryId code can open a way for extensions to easily
support custom queryIds.


Extensions can control that at some extent, already.
--
Michael


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Generating code for query jumbling through gen_node_support.pl

2023-07-10 Thread Andrey Lepikhov

On 11/7/2023 05:35, Michael Paquier wrote:

On Mon, Jan 30, 2023 at 11:48:45AM +0100, Peter Eisentraut wrote:

On 27.01.23 03:59, Michael Paquier wrote:

At the end, that would be unnoticeable for the average user, I guess,
but here are the numbers I get on my laptop :)


Personally, I think we do not want the two jumble methods in parallel.

Maybe there are other opinions.


(Thanks Jonathan for the poke.)

Now that we are in mid-beta for 16, it would be a good time to
conclude on this open item:
"Reconsider a utility_query_id GUC to control if query jumbling of
utilities can go through the past string-only mode and the new mode?"

In Postgres ~15, utility commands used a hash of the query string to
compute their query ID.  The current query jumbling code uses a Query
instead, like any other queries.  I have registered this open item as
a self-reminder, mostly in case there would be an argument to have a
GUC where users could switch from one mode to another.  See here as
well for some computation times for each method (table is in ns, with
millions of iterations):
https://www.postgresql.org/message-id/y9eeyindb1acp...@paquier.xyz

I still don't think that we need both methods based on these numbers,
but there may be more opinions about that?  Are people OK if this open
item is discarded?

I vote for only one method based on a query tree structure.
BTW, did you think about different algorithms of queryId generation? 
Auto-generated queryId code can open the way for extensions to support 
custom queryIds easily.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-07-09 Thread Andrey Lepikhov

On 7/7/2023 15:20, Alena Rybakina wrote:


because we will provide similar manipulation in this:

foreach(l, gentry->consts)
{
       Node       *rexpr = (Node *) lfirst(l);

       rexpr = coerce_to_common_type(pstate, rexpr,
                                             scalar_type,
                                             "IN");
      aexprs = lappend(aexprs, rexpr);
}

I'm not sure that it should be replaced.
Attached are a few more corrections to the patch.
The most important change: or_list now contains an already transformed 
expression subtree, so I think we don't need to free it at all.


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 961ca3e482..f0fd63f05c 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -112,9 +112,6 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
List   *or_list = NIL;
List   *groups_list = NIL;
ListCell   *lc;
-   Node   *result;
-   BoolExpr   *expr;
-   boolchange_apply = false;
 
/* If this is not an 'OR' expression, skip the transformation */
if (expr_orig->boolop != OR_EXPR ||
@@ -131,16 +128,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr 
*expr_orig)
OrClauseGroupEntry *gentry;
boolconst_is_left = true;
 
-   /*
-* The first step: checking that the expression consists only 
of equality.
-* We can only do this here, while arg is still row data type, 
namely A_Expr.
-* After applying transformExprRecurce, we already know that it 
will be OpExr type,
-* but checking the expression for equality is already becoming 
impossible for us.
-* Sometimes we have the chance to devide expression into the 
groups on
-* equality and inequality. This is possible if all list items 
are not at the
-* same level of a single BoolExpr expression, otherwise all of 
them cannot be converted.
-*/
-
+   /* At first, transform the arg and evaluate constant 
expressions. */
orqual = transformExprRecurse(pstate, (Node *) arg);
orqual = coerce_to_boolean(pstate, orqual, "OR");
orqual = eval_const_expressions(NULL, orqual);
@@ -151,7 +139,13 @@ transformBoolExprOr(ParseState *pstate, BoolExpr 
*expr_orig)
continue;
}
 
-   /* Detect constant side of the clause */
+   /*
+* Detect the constant side of the clause. Recall non-constant
+* expression can be made not only with Vars, but also with 
Params,
+* which is not bonded with any relation. Thus, we detect the 
const
+* side - if another side is constant too, the orqual couldn't 
be
+* an OpExpr.
+*/
if (IsA(get_leftop(orqual), Const))
const_is_left = true;
else if (IsA(get_rightop(orqual), Const))
@@ -175,26 +169,14 @@ transformBoolExprOr(ParseState *pstate, BoolExpr 
*expr_orig)
}
 
/*
-* The next step: transform all the inputs, and detect whether
-* any contain Vars.
-*/
-   if (!contain_vars_of_level(orqual, 0))
-   {
-   /* Again, it's not the expr we can transform */
-   or_list = lappend(or_list, orqual);
-   continue;
-   }
-
-   ;
-
-   /*
-   * At this point we definitely have a transformable 
clause.
-   * Classify it and add into specific group of clauses, 
or create new
-   * group.
-   * TODO: to manage complexity in the case of many 
different clauses
-   * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent 
something
-   * like a hash table (htab key ???).
-   */
+   * At this point we definitely have a transformable clause.
+   * Classify it and add into specific group of clauses, or create 
new
+   * group.
+   * TODO: to manage complexity in the case of many different 
clauses
+   * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something
+   * like a hash table. But also we believe, that the case of many
+   * different variable sides is very rare.
+   */
foreach(lc_groups, groups_list)
{
OrClauseGroupEntry *v = (OrClauseGroupEntry *) 
lfirst(lc_groups);
@@ -220,20 +202

Re: POC, WIP: OR-clause support for indexes

2023-07-06 Thread Andrey Lepikhov

On 6/7/2023 03:06, Alena Rybakina wrote:
The test was performed on the same benchmark database generated by 2 
billion values.


I corrected this constant in the patch.
In an attempt to resolve some issues mentioned in my previous letter, I 
used op_mergejoinable() to detect the mergejoinability of a clause.
The constant side of the expression is detected by calling 
eval_const_expressions() and then checking each side for a Const node.

See the 'diff to diff' in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 26648b0876..67d6f85010 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -111,38 +111,27 @@ transformBoolExprOr(ParseState *pstate, BoolExpr 
*expr_orig)
 {
List   *or_list = NIL;
List   *groups_list = NIL;
-   ListCell   *lc_eargs;
+   ListCell   *lc;
Node   *result;
BoolExpr   *expr;
boolchange_apply = false;
 
-   /* If this is not expression "Or", then will do it the old way. */
+   /* If this is not an 'OR' expression, skip the transformation */
if (expr_orig->boolop != OR_EXPR ||
list_length(expr_orig->args) < const_transform_or_limit)
-   return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+   return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
 
expr = (BoolExpr *)copyObject(expr_orig);
 
-   /*
-   * NOTE:
-   * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, 
contains
-   * a list of sub-restrictinfo args, and rinfo->clause - which is 
the
-   * same expression, made from bare clauses. To not break 
selectivity
-   * caches and other optimizations, use both:
-   * - use rinfos from orclause if no transformation needed
-   * - use  bare quals from rinfo->clause in the case of 
transformation,
-   * to create new RestrictInfo: in this case we have no options 
to avoid
-   * selectivity estimation procedure.
-   */
-   foreach(lc_eargs, expr->args)
+   foreach(lc, expr->args)
{
-   A_Expr *arg = (A_Expr *) lfirst(lc_eargs);
-   Node   *bare_orarg;
+   Node   *arg = lfirst(lc);
+   Node   *orqual;
Node   *const_expr;
-   Node   *non_const_expr;
+   Node   *nconst_expr;
ListCell   *lc_groups;
OrClauseGroupEntry *gentry;
-   boolallow_transformation;
+   boolconst_is_left = true;
 
/*
 * The first step: checking that the expression consists only 
of equality.
@@ -154,33 +143,51 @@ transformBoolExprOr(ParseState *pstate, BoolExpr 
*expr_orig)
 * same level of a single BoolExpr expression, otherwise all of 
them cannot be converted.
 */
 
-   if (!arg)
-   break;
+   orqual = transformExprRecurse(pstate, (Node *) arg);
+   orqual = coerce_to_boolean(pstate, orqual, "OR");
+   orqual = eval_const_expressions(NULL, orqual);
 
-   allow_transformation = (arg->type == T_A_Expr && (arg)->kind == 
AEXPR_OP &&
-   
list_length((arg)->name) >=1 && strcmp(strVal(linitial((arg)->name)), "=") == 0
-  );
+   if (!IsA(orqual, OpExpr))
+   {
+   or_list = lappend(or_list, orqual);
+   continue;
+   }
 
+   /* Detect constant side of the clause */
+   if (IsA(get_leftop(orqual), Const))
+   const_is_left = true;
+   else if (IsA(get_rightop(orqual), Const))
+   const_is_left = false;
+   else
+   {
+   or_list = lappend(or_list, orqual);
+   continue;
+   }
 
-   bare_orarg = transformExprRecurse(pstate, (Node *)arg);
-   bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR");
+   /* Get pointers to constant and expression sides of the qual */
+   nconst_expr = (const_is_left) ? get_rightop(orqual) :
+   
get_leftop(orqual);
+   const_expr = (const_is_left) ?  get_leftop(orqual

Re: POC, WIP: OR-clause support for indexes

2023-07-06 Thread Andrey Lepikhov

On 6/7/2023 03:06, Alena Rybakina wrote:

I corrected this constant in the patch.

The patch doesn't apply cleanly: it contains some trailing spaces.

Also, a quick glance into the code shows some weak points:
1. transformBoolExprOr should have input type BoolExpr.
2. You can avoid the switch statement at the beginning of the function, 
because you only need one option.
3. Stale comments: RestrictInfos definitely do not exist at this point.
4. I don't know whether you really need to copy the expr or not, but it 
is better to do it as late as possible.
5. You assume that leftop is non-constant and rightop is constant. Why?
6. I have doubts about the equivalence operator. Someone can invent a 
custom '=' operator with semantics different from the usual one (see the 
sketch below). Maybe it is better to check mergejoinability?
7. I don't know how to confidently identify constant expressions at this 
level. So, I guess, you can only merge expressions like "F(X)=Const" 
here, not 'F(X)=ConstExpression'.


See delta.diff with the mentioned changes in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index c9193d826f..26648b0876 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -107,41 +107,22 @@ typedef struct OrClauseGroupEntry
 static int const_transform_or_limit = 500;
 
 static Node *
-transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 {
List   *or_list = NIL;
List   *groups_list = NIL;
ListCell   *lc_eargs;
Node   *result;
-   BoolExpr   *expr = (BoolExpr *)copyObject(expr_orig);
-   const char *opname;
+   BoolExpr   *expr;
boolchange_apply = false;
-   boolor_statement = false;
-
-   Assert(IsA(expr, BoolExpr));
 
/* If this is not expression "Or", then will do it the old way. */
-   switch (expr->boolop)
-   {
-   case AND_EXPR:
-   opname = "AND";
-   break;
-   case OR_EXPR:
-   opname = "OR";
-   or_statement = true;
-   break;
-   case NOT_EXPR:
-   opname = "NOT";
-   break;
-   default:
-   elog(ERROR, "unrecognized boolop: %d", (int) 
expr->boolop);
-   opname = NULL;  /* keep compiler quiet */
-   break;
-   }
-
-   if (!or_statement || list_length(expr->args) < const_transform_or_limit)
+   if (expr_orig->boolop != OR_EXPR ||
+   list_length(expr_orig->args) < const_transform_or_limit)
return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
 
+   expr = (BoolExpr *)copyObject(expr_orig);
+
/*
* NOTE:
* It is an OR-clause. So, rinfo->orclause is a BoolExpr node, 
contains
@@ -176,15 +157,13 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
if (!arg)
break;
 
-   allow_transformation = (
-   or_statement &&
-   arg->type == T_A_Expr && (arg)->kind == 
AEXPR_OP &&
+   allow_transformation = (arg->type == T_A_Expr && (arg)->kind == 
AEXPR_OP &&

list_length((arg)->name) >=1 && strcmp(strVal(linitial((arg)->name)), "=") == 0
   );
 
 
bare_orarg = transformExprRecurse(pstate, (Node *)arg);
-   bare_orarg = coerce_to_boolean(pstate, bare_orarg, opname);
+   bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR");
 
/*
 * The next step: transform all the inputs, and detect whether 
any contain
@@ -357,14 +336,10 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
}
}
 
-   if (!change_apply)
-   {
-   or_statement = false;
-   }
list_free_deep(groups_list);
}
 
-   if (or_statement)
+   if (change_apply)
{
/* One more trick: assemble correct clause */
expr = list_length(or_list) > 1 ? makeBoolExpr(OR_EXPR, 
list_copy(or_list), -1) : linitial(or_list);
@@ -376,9 +351,8 @@ transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
 * transformBoolExpr function
 */
else
-   {
result = transformB

Re: Removing unneeded self joins

2023-07-05 Thread Andrey Lepikhov

Hi,

During the significant code revision in v41 I lost some replacement 
operations. Here is the fix, plus extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added 
code to replace relids in that place too.
One more thing: I found out that we didn't remove self joins constrained 
only by baserestrictinfo clauses when no self-join clause existed for the 
join. Now it is fixed, and a test has been added.

To understand the changes readily, see the delta file in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional
From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Jul 2023 16:07:50 +0700
Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to
 check it. Add one more cleaning procedure on JoinDomain relids which was
 introduced recently with commit 3bef56e. Fix the corner case when we haven't
 removed SJ if the selfjoinquals list was empty.

---
 src/backend/optimizer/plan/analyzejoins.c | 15 ++-
 src/test/regress/expected/join.out| 26 ---
 src/test/regress/expected/updatable_views.out | 17 +---
 src/test/regress/sql/join.sql |  9 +++
 4 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index a93e4ce05c..15234b7a3b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, 
ojrelid, subst);
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, 
ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, 
ojrelid, subst);
+
+   replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
 
/*
@@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
/* ph_needed might or might not become empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, 
subst);
+   phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, 
relid, subst);
+   phinfo->ph_var->phrels = 
replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to 
adjust */
}
@@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
}
 
em->em_relids = replace_relid(em->em_relids, from, to);
+   em->em_jdomain->jd_relids = 
replace_relid(em->em_jdomain->jd_relids, from, to);
 
/* We only process inner joins */
replace_varno((Node *) em->em_expr, from, to);
@@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
 */
restrictlist = generate_join_implied_equalities(root, 
joinrelids,

inner->relids,
-   
outer, 0);
+   
outer, NULL);
 
/*
 * Process restrictlist to seperate the self join quals 
out of
@@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
split_selfjoin_quals(root, restrictlist, ,
 
, inner->relid, outer->relid);
 
+   /*
+* To enable SJE for the only degenerate case without 
any self join
+* clauses at all, add baserestrictinfo to this list.
+* Degenerate case works only if both sides have the 
same clause. So
+* doesn't matter which side to add.
+*/
+   selfjoinquals = list_concat(selfjoinquals, 
outer->baserestrictinfo);
+
/*
 * Determine if the inner table can duplicate outer 
rows.  We must
 * bypass the unique rel cache here since we're 
possibly using a
diff --git a/src/test/regress/expected/join.out 
b/src/test/regress/expected/join.out
index b1f43f6ff8..027c356bcc 100644
--- a/src/test/regress/expected/joi

Re: Removing unneeded self joins

2023-06-30 Thread Andrey Lepikhov

On 4/4/2023 02:30, Gregory Stark (as CFM) wrote:

On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek  wrote:


Hi All,

I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.


Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.

I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.

Hi,

Version 41 is a heavy rework of the feature:

1. In previous versions, I tried to reuse remove_rel_from_query() for 
both left-join and self-join removal. But now I realize that these are 
somewhat different procedures which handle different operations. In this 
patch, only the common stages of the PlannerInfo fixing process are 
united in one function.
2. Transferring clauses from the removed table to the kept one is more 
transparent now and contains comments.
3. The equivalence-class update procedure was changed following David's 
commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the 
last v40 version, and it looks pretty similar to the update_eclass 
routine in this patch.

It passes the regression tests, but some questions are still open:
1. Should we look for duplicated or redundant clauses (likewise for 
eclasses) during the clause transfer procedure? On one side, we reduce 
the length of restriction lists, which can impact planning or execution 
time, and we improve the accuracy of cardinality estimation. On the 
other side, it is one more place that can make planning time much longer 
in specific cases. It would have been better to avoid calling the 
equal() function here, but it's the only way to detect duplicated 
inequality expressions.
2. Could we reuse ChangeVarNodes instead of sje_walker(), and merge 
remove_rel_from_restrictinfo with replace_varno?
3. Also, I am still not finished with split_selfjoin_quals: some 
improvements could be made.


--
regards,
Andrey Lepikhov
Postgres Professional
From 4a342b9789f5be209318c13fb7ec336fcbd2aee5 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself
in a query tree if it can be proved that the join can be replaced with a scan.
The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then an outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner side has the
same clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1077 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  798 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  351 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2319 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..68215e1093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5419,6 +5419,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses the
+query tree and replaces self joins with semantically equivalent single
+scans. Only plain tables are taken into consideration.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 0065c8992b..57bdc6811f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-06-26 Thread Andrey Lepikhov

On 24/6/2023 17:23, Tomas Vondra wrote:

I really hope what I just wrote makes at least a little bit of sense.

Throw in one more example:

SELECT i AS id INTO l FROM generate_series(1,10) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;

Here you can see the same kind of underestimation:
Hash Left Join  (... rows=500 width=14) (... rows=9 ...)

So the eqjoinsel_unmatch_left() function should be modified for the case 
where nd1

--
regards,
Andrey Lepikhov
Postgres Professional





Re: eqjoinsel_semi still sucks ...

2023-06-23 Thread Andrey Lepikhov

On 2/5/2012 20:34, Tom Lane wrote:

On reflection I think that the idea of clamping ndistinct beforehand is
just wrong, and what we ought to do instead is apply a multiplier to the
selectivity estimate afterwards.  In the case of a base rel we could
just multiply by the selectivity of its baserestrictinfo list.  For join
rels it's a bit harder to guess how much a given input relation might
have been decimated, but if the join's estimated size is smaller than
the output size of the base rel the correlation var came from, we could
multiply by that ratio (on top of whatever correction came from the base
rel's restriction clauses).
I got stuck on some cases where (due to a tree of filters) the planner 
underestimates the JOIN just because ndistinct feeds a huge number into 
the selectivity estimation formula, even though the estimates for both 
input relations are correct and properly limited.
I've tried to understand the logic through commits 0d3b231eebf, 
97930cf578e and 7f3eba30c9d, but it is still not clear to me.
So, why is the idea of clamping ndistinct terrible in general? Could you 
explain your reasons a bit more?


--
regards,
Andrey Lepikhov
Postgres Professional





MergeJoin beats HashJoin in the case of multiple hash clauses

2023-06-15 Thread Andrey Lepikhov

Hi, all.

Some of my clients use JOINs with three or four clauses. Quite 
frequently, I see complaints about an unreasonable switch of the JOIN 
algorithm to Merge Join instead of Hash Join. Quick research has shown 
one weak place: the estimation of the average bucket size in 
final_cost_hashjoin (see q2.sql in the attachment), which uses a very 
conservative strategy.
Unlike the estimation of groups, here we take the bucket size implied by 
a single hash clause (the most selective one) instead of multiplying the 
per-clause estimates (or attempting multivariate analysis).
That works fine in the single-clause case. But if we have many clauses, 
each with a high ndistinct value, we will overestimate the average size 
of a bucket and, as a result, prefer Merge Join. As the example in the 
attachment shows, this leads to a worse plan than possible, sometimes 
drastically worse.
I assume this is done out of fear of functional dependencies between 
hash clause components. But in my opinion, we should go the same way 
here as in the estimation of groups.
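
For reference, a condensed reproduction in the spirit of the attached 
q2.sql (table and column names are hypothetical):

```
-- Each column alone has only 100 distinct values, but the three-column
-- combination is unique. The current estimate looks at a single clause
-- only, so it predicts large buckets and may switch to Merge Join.
CREATE TABLE a AS
    SELECT i % 100 AS x, (i / 100) % 100 AS y, (i / 10000) % 100 AS z
    FROM generate_series(0, 999999) AS i;
CREATE TABLE b AS SELECT x, y, z FROM a;
ANALYZE a, b;
EXPLAIN SELECT * FROM a JOIN b USING (x, y, z);
```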

The attached patch shows a sketch of the solution.

--
regards,
Andrey Lepikhov
Postgres Professional

q2.sql
Description: application/sql
diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index ef475d95a1..26f26d6a40 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4033,11 +4033,12 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
thismcvfreq = restrictinfo->left_mcvfreq;
}
 
-   if (innerbucketsize > thisbucketsize)
-   innerbucketsize = thisbucketsize;
-   if (innermcvfreq > thismcvfreq)
-   innermcvfreq = thismcvfreq;
+   innerbucketsize *= thisbucketsize;
+   innermcvfreq *= thismcvfreq;
}
+
+   if (innerbucketsize < 1.0 / virtualbuckets)
+   innerbucketsize = 1.0 / virtualbuckets;
}
 
/*


Re: Removing unneeded self joins

2023-05-25 Thread Andrey Lepikhov

On 3/6/23 10:30, Michał Kłeczek wrote:
> Hi All,
>
> I just wanted to ask about the status and plans for this patch.
> I can see it being stuck at “Waiting for Author” status in several
> commit tests.
>
> I think this patch would be really beneficial for us as we heavily use
> views to structure out code.
> Each view is responsible for providing some calculated values and 
they > are joined in a query to retrieve the full set of information.

>
> Not sure how the process works and how I could help (I am absolutely
> not capable of helping with coding I am afraid - but could sponsor a
> (small :) ) bounty to speed things up).

Yes, I am still working on this feature. Because of the significant 
changes in the optimizer code that Tom & Richard have been making in 
recent months, I didn't touch it for a while. But now this work can be 
continued.

The current patch is rebased on current master. Because of the recently 
introduced nullable_rels logic, ojrelids are now spread widely across 
planner bitmapsets, so the join-elimination logic was changed.

But now I'm less happy with the code. It seems we need to refactor it:
1. According to reports from some performance engineers, the feature can 
cause an overhead of ~0.5% on trivial queries without any joins at all. 
We should examine the patch and find a way to return quickly and cheaply 
if the statement contains no join or, perhaps stronger, no self join.
2. During join elimination we replace clauses like 'x=x' with 'x IS NOT 
NULL' (see the sketch after this list). It is a weak point because we 
change the clause semantics (from mergejoinable to non-mergejoinable, in 
this example) and could forget to consistently change some RestrictInfo 
fields.
3. In the previous versions we changed the remove_rel_from_query routine, 
trying to use it for both 'useless outer join' and 'self join' 
elimination. Now, because of the 'ojrelid' field, it looks too 
complicated. Do we need to split this routine again?
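
For reference, a minimal example of the transformation this feature 
performs (a hypothetical table with a unique key):

```
CREATE TABLE t (id int PRIMARY KEY, payload int);

-- A removable self join ...
SELECT t1.* FROM t t1 JOIN t t2 ON t1.id = t2.id WHERE t2.payload > 0;

-- ... is replaced with a single scan; the degenerate join clause
-- t1.id = t2.id becomes the 'x IS NOT NULL' qual from point 2:
SELECT t1.* FROM t t1 WHERE t1.payload > 0 AND t1.id IS NOT NULL;
```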


--
Regards
Andrey Lepikhov
Postgres Professional
From cb4340577dab0e8cf5531e9934f5734fda178490 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Elimination (SJE) feature removes an inner join of a plain table to itself
in a query tree if it can be proved that the join can be replaced with a scan.
The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then an outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner side has the
same clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1093 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2313 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5da74b3c40..b47d11759e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5437,6 +5437,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 1436dbc2f2..1b1aa9083c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 	

Re: [POC] Allow an extension to add data into Query and PlannedStmt nodes

2023-03-30 Thread Andrey Lepikhov

On 30/3/2023 12:57, Julien Rouhaud wrote:

Extensions may need to pass some query-related data through all stages of
query planning and execution. As a trivial example, pg_stat_statements
uses the queryid at the end of execution to save some statistics. One more
reason: extensions currently conflict over the queryid value and the logic
of changing it. With this patch, that can be managed.


I just had a quick look at the patch, and IIUC it doesn't really help on that
side, as there's still a single official "queryid" that's computed, stored
everywhere and later used by pg_stat_statements (which does then store in
additionally to that official queryid).

Thank you for the attention!
This patch doesn't try to solve the problem of queryId uniqueness. In 
this patch we change pg_stat_statements so that it no longer sets the 
queryId field to 0 according to its internal logic. And every other 
extension should do the same: use queryId as you like, but don't erase 
it - erase your private copy in the ext_field instead.



Ruthless pgbench benchmark shows that we got some overhead:
1.6% - in default mode
4% - in prepared mode
~0.1% in extended mode.


That's a quite significant overhead.  But the only reason to accept such a
change is to actually use it to store additional data, so it would be
interesting to see what the overhead is like once you store at least 2
different values there.
Yeah, but as I said earlier, it can be reduced to a much smaller value 
with a simple optimization. Here I intentionally avoid it in order to 
discuss the core concept.



- Do we need to invent a registration procedure to do away with the names
of entries and use some compact integer IDs?


Note that the patch as proposed doesn't have any defense for two extensions
trying to register something with the same name, or update a stored value, as
AddExtensionDataToNode() simply prepend the new value to the list.  You can
actually update the value by just storing the new value, but it will add a
significant overhead to every other extension that want to read another value.
Thanks a lot! The patch in the attachment implements such an idea: an 
extension can allocate some entries and use these private IDs to add 
data. I hope an extension would prefer to use only one entry for all its 
data to keep the overhead manageable, but still.


The API is also quite limited as each stored value has a single identifier.
What if your extension needs to store multiple things?  Since it's all based on
Node you can't really store some custom struct, so you probably have to end up
with things like "my_extension.my_val1", "my_extension.my_val2" which isn't
great.
The main idea here: if an extension holds a custom struct and wants to 
pass it along all planning and execution stages, it should use an 
extensible node with custom read/write/copy routines.


--
regards,
Andrey Lepikhov
Postgres Professional
From ab101322330684e9839e46c26f70ad5462e40dac Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 30 Mar 2023 21:49:32 +0500
Subject: [PATCH 2/2] Add ExtendedData entry registration routines to use
 entryId instead of symbolic name.

---
 .../pg_stat_statements/pg_stat_statements.c   |  7 +-
 src/backend/commands/extension.c  | 69 +++
 src/include/commands/extension.h  |  6 +-
 src/include/nodes/parsenodes.h|  2 +-
 4 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index fc47661c86..5b21163365 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -109,11 +109,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 									!IsA(n, DeallocateStmt))
 
 #define GET_QUERYID(node) \
-	(Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements")
+	(Bigint *) GetExtensionData(node->ext_field, extendedEntryID)
 
 #define INSERT_QUERYID(node, queryid) \
 	castNode(Bigint, AddExtensionDataToNode((Node *) node, \
-			"pg_stat_statements", \
+			extendedEntryID, \
 			(Node *) makeBigint((int64) queryid), \
 			true))
 /*
@@ -298,6 +298,7 @@ static bool pgss_track_utility = true;	/* whether to track utility commands */
 static bool pgss_track_planning = false;	/* whether to track planning
 											 * duration */
 static bool pgss_save = true;	/* whether to save stats across shutdown */
+static int

[POC] Allow an extension to add data into Query and PlannedStmt nodes

2023-03-29 Thread Andrey Lepikhov

Hi,

Previously, we have read in this mailing list some controversial opinions 
on queryid generation and the jumbling technique. Here we don't intend to 
solve those problems, but to help an extension at least not conflict with 
others over the queryId value.


Extensions may need to pass some query-related data through all stages 
of query planning and execution. As a trivial example, pg_stat_statements 
uses the queryid at the end of execution to save some statistics. One more 
reason: extensions currently conflict over the queryid value and the logic 
of changing it. With this patch, that can be managed.


This patch introduces the 'ExtensionData' structure, which allows 
managing a list of entries with a couple of interface functions, 
AddExtensionDataToNode() and GetExtensionData(). Keep in mind the 
possible future hiding of this structure from the public interface.
An extension should invent a symbolic key to identify its data. It may 
invent as many additional keys as it wants, but the best option here is 
no more than one entry per extension.
Usage of this machinery is demonstrated by the pg_stat_statements 
example: here we introduced the Bigint node just to natively store the 
queryId value. (A condensed sketch of the interface is shown below.)
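
A condensed sketch of the interface as used in this first version of the 
patch (not a complete extension; "my_ext" is a hypothetical entry name):

```
#include "postgres.h"
#include "commands/extension.h"
#include "nodes/parsenodes.h"
#include "nodes/value.h"

/* Store a private 64-bit value in the parse tree... */
static void
my_ext_remember(Query *query, int64 value)
{
	AddExtensionDataToNode((Node *) query, "my_ext",
						   (Node *) makeBigint(value), true);
}

/* ...and read it back at any later planning or execution stage. */
static int64
my_ext_recall(Query *query)
{
	Bigint	   *entry = (Bigint *) GetExtensionData(query->ext_field, "my_ext");

	return entry ? entry->val : 0;
}
```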


Ruthless pgbench benchmark shows that we got some overhead:
1.6% - in default mode
4% - in prepared mode
~0.1% in extended mode.

An optimization that avoids copying the queryId, by storing it directly 
in the node's pointer field, keeps this overhead within 0.5% for all 
these modes but increases complexity. So here we demonstrate the 
non-optimized variant.


Some questions still cause doubts:
- QueryRewrite: should we copy extension fields from the parent parse 
tree to the rewritten ones?
- Do we need to invent a registration procedure to do away with the 
names of entries and use some compact integer IDs?
- Do we need to optimize this structure to avoid a copy for simple data 
types, for example by inventing something like A_Const?


All in all, in our opinion, this issue will tend to grow with the 
increasing number of extensions that use planner and executor hooks for 
their own purposes. So, any thoughts will be useful.


--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 944ce61d7ff934727240d90ee7620bfb69ad3a5a Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Wed, 22 Mar 2023 16:59:30 +0500
Subject: [PATCH] Add on more field into Query and PlannedStmt structures to
 allow an extension to pass some data across all the execution stages of
 specific instance of the query.

---
 .../pg_stat_statements/pg_stat_statements.c   |  57 --
 src/backend/commands/extension.c  | 103 ++
 src/backend/executor/execParallel.c   |   1 +
 src/backend/nodes/value.c |  12 ++
 src/backend/optimizer/plan/planner.c  |   1 +
 src/backend/rewrite/rewriteHandler.c  |   6 +
 src/backend/tcop/postgres.c   |   2 +
 src/include/commands/extension.h  |   4 +
 src/include/nodes/parsenodes.h|  23 
 src/include/nodes/plannodes.h |   2 +
 src/include/nodes/value.h |  10 ++
 11 files changed, 209 insertions(+), 12 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 5285c3f7fa..fc47661c86 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -49,6 +49,7 @@
 
 #include "access/parallel.h"
 #include "catalog/pg_authid.h"
+#include "commands/extension.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
@@ -107,6 +108,14 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 	!IsA(n, PrepareStmt) && \
 	!IsA(n, DeallocateStmt))
 
+#define GET_QUERYID(node) \
+	(Bigint *) GetExtensionData(node->ext_field, "pg_stat_statements")
+
+#define INSERT_QUERYID(node, queryid) \
+	castNode(Bigint, AddExtensionDataToNode((Node *) node, \
+			"pg_stat_statements", \
+			(Node *) makeBigint((int64) queryid), \
+			true))
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -821,6 +830,13 @@ error:
 static void
 pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
+	Bigint *queryId;
+
+	if ((queryId = GET_QUERYID(query)) == NULL)
+		queryId = INSERT_QUERYID(query, query->queryId);
+
+	Assert(queryId);
+
 	if (prev_post_parse_analyze_hook)
 		prev_post_parse_analyze_hook(pstate, query, jstate);
 
@@ -837,7 +853,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 	{
 		if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
 		{
-			query->queryId = UINT64CONST(0);
+			queryId->val = UINT64CONST(0);
 			return;
 		}
 	}
@@ -851,7 +867,7

Re: [PoC] Reducing planning time when tables have many partitions

2023-02-14 Thread Andrey Lepikhov

On 2/6/23 06:47, Yuya Watari wrote:

Of course, I'm not sure if my approach in v16-0003 is ideal, but it
may help solve your concern above. Since simple_rel_array[0] is no
longer necessary with my patch, I removed the setup_append_rel_entry()
function in v16-0004. However, to work the patch, I needed to change
some assertions in v16-0005. For more details, please see the commit
message of v16-0005. After these works, the attached patches passed
all regression tests in my environment.

Instead of my approach, imitating the following change to
get_eclass_indexes_for_relids() is also a possible solution. Ignoring
NULL RelOptInfos enables us to avoid the segfault, but we have to
adjust EquivalenceMemberIterator to match the result, and I'm not sure
if this idea is correct.
As I see it, you moved the indexes from RelOptInfo to PlannerInfo. Maybe 
it would be better to move them into RangeTblEntry instead?

--
Regards
Andrey Lepikhov
Postgres Professional





Re: [PATCH] random_normal function

2023-01-18 Thread Andrey Lepikhov

On 1/19/23 11:01, Tom Lane wrote:

Andrey Lepikhov  writes:

On 1/9/23 23:52, Tom Lane wrote:

BTW, if this does bring the probability of failure down to the
one-in-a-billion range, I think we could also nuke the whole
"ignore:" business, simplifying pg_regress and allowing the
random test to be run in parallel with others.



We have used the pg_sleep() function to interrupt a query at a certain
execution phase. But on some platforms, especially in containers, query
execution time varies so widely that the pg_sleep() timeout, required to
get rid of the dependency on query execution time, has become
unacceptable. So, the "ignore" option was the best choice.


But does such a test have any actual value?  If your test infrastructure
ignores the result, what makes you think you'd notice if the test did
indeed detect a problem?
Yes, it is good for catching SEGFAULTs and assertion failures, which may 
be frequent given the complexity of timeout logic.




I think "ignore:" was a kluge we put in twenty-plus years ago when our
testing standards were a lot lower, and it's way past time we got
rid of it.
OK, I will try to invent an alternative way of deep (and stable) testing 
of timeouts. Thank you for the answer.


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [PATCH] random_normal function

2023-01-18 Thread Andrey Lepikhov

On 1/9/23 23:52, Tom Lane wrote:

BTW, if this does bring the probability of failure down to the
one-in-a-billion range, I think we could also nuke the whole
"ignore:" business, simplifying pg_regress and allowing the
random test to be run in parallel with others.
With the 'ignore' option we have been able to cover with tests some 
time-dependent features, such as "statement_timeout", 
"idle_in_transaction_session_timeout", the usage of user timeouts in 
extensions, and so on.


We have used the pg_sleep() function to interrupt a query at a certain 
execution phase. But on some platforms, especially in containers, query 
execution time varies so widely that the pg_sleep() timeout, required to 
get rid of the dependency on query execution time, has become 
unacceptable. So, the "ignore" option was the best choice.


For now, do we only have the "isolation tests" option for creating 
stable execution-time-dependent tests? Or am I not aware of some other 
test machinery?


--
Regards
Andrey Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2022-12-27 Thread Andrey Lepikhov

On 12/26/15 23:04, Teodor Sigaev wrote:
I'd like to present OR-clause support for indexes. Although OR-clauses 
could be supported by bitmapOR index scan it isn't very effective and 
such scan lost any order existing in index. We (with Alexander Korotkov) 
presented results on Vienna's conference this year. In short, it 
provides performance improvement:


EXPLAIN ANALYZE
SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
...
The problems on the way which I see for now:
1 Calculating cost. Right now it's just a simple transformation of costs 
computed for BitmapOr path. I'd like to hope that's possible and so 
index's estimation function could be non-touched. So, they could believe 
that all clauses are implicitly-ANDed
2 I'd like to add such support to btree but it seems that it should be a 
separated patch. Btree search algorithm doesn't use any kind of stack of 
pages and algorithm to walk over btree doesn't clear for me for now.
3 I could miss some places which still assumes  implicitly-ANDed list of 
clauses although regression tests passes fine.
I support such a cunning approach. But the specific case you 
demonstrated above could be optimized independently at an earlier 
stage. If we convert:

(F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2)
to
F(A) IN (ConstStableExpr_1, ConstStableExpr_2)

we can see a significant execution speedup. For example, using demo.sql 
to estimate the maximum positive effect, we see about a 40% execution 
and 100% planning speedup.
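
The effect can be approximated by rewriting the query from the quoted 
example by hand (a sketch using the tst table above):

```
-- Planned as a BitmapOr over three index scans:
EXPLAIN ANALYZE
SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;

-- The transformed form, planned as a single index scan with a
-- ScalarArrayOpExpr condition:
EXPLAIN ANALYZE
SELECT count(*) FROM tst WHERE id IN (5, 500, 5000);
```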


To avoid the unnecessary overhead induced by the optimization, such a 
transformation could be made at the planning stage (when we have 
cardinality estimates and have pruned partitions) but before the 
creation of relation scan paths. That way we can avoid planning overhead 
and a non-optimal BitmapOr in the case of many ORs, possibly aggravated 
by many indexes on the relation.
For example, such an operation could be executed in create_index_paths() 
before processing rel->indexlist.


--
Regards
Andrey Lepikhov
Postgres Professional


demo.sql
Description: application/sql


Re: Optimization issue of branching UNION ALL

2022-12-22 Thread Andrey Lepikhov

On 22/12/2022 06:50, Tom Lane wrote:

2. Iterative passes along the append_rel_list to replace vars in the
translated_vars field. I can't grasp the real necessity of walking the
whole append_rel_list while flattening a UNION ALL leaf subquery. No one
can reference this leaf, can they?


After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted.  So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.
Thanks, I wrote the letter because of some doubts too. The only weak 
point I could imagine is if someday the SQL standard changes.

Your code looks better than the previous attempt.

--
regards,
Andrey Lepikhov
Postgres Professional





Optimization issue of branching UNION ALL

2022-12-20 Thread Andrey Lepikhov

Hi,

A client report on a corner case has shown up a possible minor 
non-optimality in the procedure that transforms a simple UNION ALL 
statement tree.
The complaint is about an auto-generated query with 1E4 simple UNION 
ALLs (see t.sh, which generates a demo script; a condensed sketch 
follows). The reason: in REL_11_STABLE it is planned and executed in a 
second, but REL_12_STABLE and beyond make matters worse: planning such a 
query needs tons of gigabytes of RAM.
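
A hypothetical condensed version of t.sh:

```
#!/bin/sh
# Build a query of 10000 trivial UNION ALL branches and feed it to psql.
{
    printf 'EXPLAIN (COSTS OFF)\nSELECT 1 AS x\n'
    i=2
    while [ "$i" -le 10000 ]; do
        printf 'UNION ALL SELECT %d\n' "$i"
        i=$((i + 1))
    done
    printf ';\n'
} | psql
```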


A superficial study revealed possibly unnecessary operations that could 
be avoided:
1. Walking across the query by calling substitute_phv_relids() even if 
lastPHId shows that no PHV is present.
2. Iterative passes along the append_rel_list to replace vars in the
translated_vars field. I can't grasp the real necessity of walking the
whole append_rel_list while flattening a UNION ALL leaf subquery. No one
can reference this leaf, can they?


In the attachment you can see a sketch that reduces the number of 
planner cycles/copies.


--
Regards
Andrey Lepikhov
Postgres Professional

t.sh
Description: application/shellscript
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 08a73fb9d86..3739e3fe7ba 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -131,7 +131,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root,
 			Node *node, int varno);
 static void substitute_phv_relids(Node *node,
   int varno, Relids subrelids);
-static void fix_append_rel_relids(List *append_rel_list, int varno,
+static void fix_append_rel_relids(PlannerInfo *root, int varno,
   Relids subrelids);
 static Node *find_jointree_node_for_rel(Node *jtnode, int relid);
 
@@ -1156,8 +1156,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 		Relids		subrelids;
 
 		subrelids = get_relids_in_jointree((Node *) subquery->jointree, false);
-		substitute_phv_relids((Node *) parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) parse, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -2064,17 +2065,25 @@ perform_pullup_replace_vars(PlannerInfo *root,
 	 * use PHVs for safety.  (This analysis could be made tighter but it seems
 	 * unlikely to be worth much trouble.)
 	 */
-	foreach(lc, root->append_rel_list)
+	if (containing_appendrel)
 	{
-		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
-		bool		save_need_phvs = rvcontext->need_phvs;
+		bool save_need_phvs = rvcontext->need_phvs;
 
-		if (appinfo == containing_appendrel)
-			rvcontext->need_phvs = false;
-		appinfo->translated_vars = (List *)
-			pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		rvcontext->need_phvs = false;
+		containing_appendrel->translated_vars = (List *)
+			pullup_replace_vars((Node *) containing_appendrel->translated_vars,
+rvcontext);
 		rvcontext->need_phvs = save_need_phvs;
 	}
+	else
+	{
+		foreach(lc, root->append_rel_list)
+		{
+			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+			appinfo->translated_vars = (List *)
+pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		}
+	}
 
 	/*
 	 * Replace references in the joinaliasvars lists of join RTEs.
@@ -3273,7 +3282,7 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc)
 		subrelids = get_relids_in_jointree(newjtloc, false);
 		Assert(!bms_is_empty(subrelids));
 		substitute_phv_relids((Node *) root->parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -3492,7 +3501,7 @@ substitute_phv_relids(Node *node, int varno, Relids subrelids)
  * We assume we may modify the AppendRelInfo nodes in-place.
  */
 static void
-fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
+fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids)
 {
 	ListCell   *l;
 	int			subvarno = -1;
@@ -3503,7 +3512,7 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 	 * AppendRelInfo nodes refer to it.  So compute it on first use. Note that
 	 * bms_singleton_member will complain if set is not singleton.
 	 */
-	foreach(l, append_rel_list)
+	foreach(l, root->append_rel_list)
 	{
 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
 
@@ -3518,8 +3527,9 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 		}
 
 		/* Also fix up any PHVs in its translated vars */
-		substitute_phv_relids((Node *) appinfo->translated_vars,
-			  varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) appinfo->translated_vars,
+  varno, subrelids);
 	}
 }
 


Re: Removing unneeded self joins

2022-12-15 Thread Andrey Lepikhov

On 12/6/22 23:46, Andres Freund wrote:

This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 
19:11:52.453920838 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 
2022-12-05 19:15:21.864183651 +
@@ -5806,7 +5806,7 @@
   Nested Loop
 Join Filter: (sj_t3.id = sj_t1.id)
 ->  Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
   ->  Nested Loop Semi Join
 ->  Nested Loop
   ->  HashAggregate

This change in the test behaviour is induced by commit a5fc4641, "Avoid 
making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as far as I can see. The attached patch fixes this.

--
Regards
Andrey Lepikhov
Postgres Professional
From 3e546637561bf4c6d195bc7c95b1e05263e797e2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes an inner join of a plain table to itself
in a query plan if it can be proved that the join can be replaced with a scan.
The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x.
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then an outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner side has the
same clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e4145979d..2f9948d5f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5311,6 +5311,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses the
+query tree and replaces self joins with semantically equivalent single
+scans. Only plain tables are taken into consideration.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 914bfd90bc..8d57c68b1f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+		 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+			  List *restrictlist,
+			  List *exprlist, List *oprlist,
+			  List **extra_clauses)
 {
 	ListCell   *ic;
 
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List	   *exprs = NIL;
 
 		/*
 		 * If the index is not uni

Re: [PoC] Reducing planning time when tables have many partitions

2022-11-08 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

Looking into the find_em_for_rel() changes, I see that you replaced
if (bms_is_subset(em->em_relids, rel->relids))
with an assertion.
According to get_ecmember_indexes(), the em_relids field of the returned 
equivalence members can contain relids not mentioned in the relation.
I don't understand why it works now. For example, we can sort by t1.x 
but have an expression t1.x=t1.y*t2.z. Or have I missed something? If it 
is not a mistake, maybe add a comment explaining why the assertion 
doesn't fail?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-11-06 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

I'm still reviewing your patch now. For the most part it seems OK, but 
do you really need both the eq_sources and eq_derives lists now? As I 
see it, every access to these lists is guided by eclass_source_indexes 
and eclass_derive_indexes correspondingly. Maybe merge them?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: explain analyze rows=%.0f

2022-11-06 Thread Andrey Lepikhov

On 22/7/2022 16:47, Amit Kapila wrote:

I feel the discussion has slightly deviated which makes it unclear
whether this patch is required or not?


After a quick review I want to express my thoughts.
First, we have been waiting for this feature for years. Clients often 
send us an EXPLAIN where we see something like:

"rows=0, loops=100".
Without verbose mode, I can't even tell whether this node produces 
any rows or not.

So I think this feature is useful mostly for parameterized plans.
Also, printing two decimal digits, or even three, isn't meaningful - 
sometimes we have a plan where the number of loops is about 1E6 or even 1E7, 
but the real number of rows is only 100 or 1000.

To overcome this issue, I see two options (a sketch of the second follows below):
1. Show the exact number of tuples without division by loops (fair, 
but invasive and bad for automation tools).

2. Show rows in scientific format, like X.XXEXX.
I vote for the second option.
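
To illustrate why the current format loses information, here is a tiny 
standalone demo (plain C, not a patch against explain.c):

```c
#include <stdio.h>

int main(void)
{
    double ntuples = 100.0;     /* total rows produced by the node */
    double nloops = 1000000.0;  /* loop count of the parameterized node */
    double rows = ntuples / nloops;

    printf("rows=%.0f\n", rows);  /* current output: rows=0 -- information lost */
    printf("rows=%.2e\n", rows);  /* option 2:       rows=1.00e-04              */
    return 0;
}
```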

--
regards,
Andrey Lepikhov
Postgres Professional





Re: A new strategy for pull-up correlated ANY_SUBLINK

2022-11-01 Thread Andrey Lepikhov

On 2/11/2022 09:02, Andy Fan wrote:

In the past we pull-up the ANY-sublink with 2 steps, the first step is to
pull up the sublink as a subquery, and the next step is to pull up the
subquery if it is allowed.  The benefits of this method are obvious,
pulling up the subquery has more requirements, even if we can just finish
the first step, we still get huge benefits. However the bad stuff happens
if varlevelsup = 1 involves, things fail at step 1.

convert_ANY_sublink_to_join ...

     if (contain_vars_of_level((Node *) subselect, 1))
         return NULL;

In this patch we distinguish the above case and try to pull-up it within
one step if it is helpful, It looks to me that what we need to do is just
transform it to EXIST-SUBLINK.

Maybe the code at [1] would be useful for your purposes/tests.
We implemented flattening of correlated subqueries for the simple N-J case, 
but found that in some cases flattening isn't obviously the best 
solution - at that point we have no cardinality/cost estimates and can do 
worse.
I guess that for a more complex flattening procedure (with an aggregate 
function in the target list of the correlated subquery) the situation can be even worse.

Maybe your idea has such corner cases too?

[1] 
https://www.postgresql.org/message-id/flat/CALNJ-vTa5VgvV1NPRHnypdnbx-fhDu7vWp73EkMUbZRpNHTYQQ%40mail.gmail.com
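
For the archive, a minimal example of the blocked case (hypothetical tables):

```sql
-- t1.b is referenced inside the sublink (varlevelsup = 1), so the
-- contain_vars_of_level() check quoted above makes
-- convert_ANY_sublink_to_join() give up today:
SELECT *
FROM t1
WHERE t1.a = ANY (SELECT t2.a FROM t2 WHERE t2.b = t1.b);
```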


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-28 Thread Andrey Lepikhov

On 28/10/2022 16:12, Etsuro Fujita wrote:

On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita  wrote:

I have committed the patch after tweaking comments a little bit further.


I think there is another patch that improves performance of COPY FROM
for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
else) want to work on it again, I think it would be better to create a
new CF entry for it (and start a new thread for it).  So I plan to
close this in the November CF unless they think otherwise.

Anyway, thanks for the patch, Andrey!  Thanks for reviewing, Ian and Zhihong!

Thanks,

I studied the performance of this code in comparison with bulk INSERTs.
This patch seems to improve insertion speed by about 20%. But the 
patch is very invasive, so I don't have any plans to work on it now.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-12 Thread Andrey Lepikhov

On 10/12/22 07:56, Etsuro Fujita wrote:

On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov
 wrote:

I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are strictly tied to each other. Maybe add an assertion on
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.


You mean the bistate member of CopyMultiInsertBuffer?

Yes


We do not use that member at all for foreign tables, so the patch
avoids initializing that member in CopyMultiInsertBufferInit() when
called for a foreign table.  So we have bistate = NULL for foreign
tables (and bistate != NULL for plain tables), as you mentioned above.
I think it is a good idea to add such assertions.  How about adding
them to CopyMultiInsertBufferFlush() and
CopyMultiInsertBufferCleanup() like the attached?  In the attached I
updated comments a bit further as well.

Yes, quite enough.
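
For the record, the invariant behind those assertions can be sketched as 
follows (the committed form in CopyMultiInsertBufferFlush() and 
CopyMultiInsertBufferCleanup() may be written differently):

```c
/* Exactly one of the two is set: a foreign-table buffer routes tuples
 * through the FDW and has no bistate; a plain-table buffer has one. */
Assert((resultRelInfo->ri_FdwRoutine == NULL) !=
       (buffer->bistate == NULL));
```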

--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-11 Thread Andrey Lepikhov

On 10/7/22 11:18, Etsuro Fujita wrote:

On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita  wrote:

I will review the patch a bit more, but I feel that it is
in good shape.


One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+   /* Run AFTER ROW INSERT triggers */
+   if (resultRelInfo->ri_TrigDesc != NULL &&
+   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+   {
+   Oid relid =
RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+   for (i = 0; i < inserted; i++)
+   {
+   TupleTableSlot *slot = rslots[i];
+
+   /*
+* AFTER ROW Triggers might reference the tableoid column,
+* so (re-)initialize tts_tableOid before evaluating them.
+*/
+   slot->tts_tableOid = relid;
+
+   ExecARInsertTriggers(estate, resultRelInfo,
+slot, NIL,
+cstate->transition_capture);
+   }
+   }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false.  So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false.  Attached is a new
version of the patch.  I tweaked some comments a bit as well.  I think
the patch is committable.  So I plan on committing it next week if
there are no objections.
I reviewed the patch one more time. Only one question: bistate and 
ri_FdwRoutine are strictly tied to each other. Maybe add an assertion on 
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.


--
Regards
Andrey Lepikhov
Postgres Professional





Re: document the need to analyze partitioned tables

2022-10-06 Thread Andrey Lepikhov

On 10/5/22 13:37, Laurenz Albe wrote:

On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).


I'd like to suggest an improvement to this.  The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.


By the way, where are the statistics of a partitioned tables used?  The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.

For example, they are used to estimate the selectivity of a join clause:

CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
CREATE TABLE test_0 PARTITION OF test
  FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE test_1 PARTITION OF test
  FOR VALUES WITH (modulus 2, remainder 1);

INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
VACUUM ANALYZE test;
INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
VACUUM ANALYZE test_0,test_1;

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
VACUUM ANALYZE test;
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

Here, without actual statistics on the parent table, we make a wrong prediction.

--
Regards
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2022-10-05 Thread Andrey Lepikhov

New version, rebased onto current master.
Nothing special, just a rebase.
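
For anyone joining the thread here, the commit message in the attached patch 
describes the approach; the target query shape is (illustrative):

```sql
-- Both sides are the same plain table, joined on a unique not-null key,
-- so each outer row matches exactly its own inner row:
SELECT a2.*
FROM   t a1 JOIN t a2 ON a1.pk = a2.pk
WHERE  a1.x > 0;
-- is semantically equivalent to a single scan:
-- SELECT * FROM t a2 WHERE a2.x > 0 AND a2.pk IS NOT NULL;
```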

--
regards,
Andrey Lepikhov
Postgres Professional
From 03aab7a2431032166c9ea5f52fbcccaf7168abec Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..5ce2d4d2fa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5290,6 +5290,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index c31fcc917d..51f672a65c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3513,6 +3513,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List 
*oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
 
@@ -3568,6 +3583,7 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+   List   *exprs = NIL;
 
/*
 * If the index is not unique, or not immediately enforced, or 
if it's
@@ -3616,6 +3632,24 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+   if 
(bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+   {
+   MemoryContext oldMemCtx =
+   
MemoryContext

Re: [POC] Allow flattening of subquery with a link to upper query

2022-10-05 Thread Andrey Lepikhov

On 5/10/2022 02:45, Zhihong Yu wrote:

Hi,
For contain_placeholders():

+   if (IsA(node, Query))
+       return query_tree_walker((Query *) node, contain_placeholders, 
context, 0);

+   else if (IsA(node, PlaceHolderVar))

The `else` is not needed.

Fixed

For correlated_t struct, it would be better if the fields have comments.

Ok, I've added some comments.


+                    * (for grouping, as an example). So, revert its 
status to

+                    * a full valued entry.

full valued -> fully valued

Fixed

--
regards,
Andrey Lepikhov
Postgres Professional
From da570914e53bc34e6bf3291649725a041a8b929c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Sep 2022 13:42:12 +0300
Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary
 join query. With many restrictions, check each subquery and pull up
 expressions, references upper query block. Works for operators '=' and 'IN'.
 Machinery of converting ANY subquery to JOIN stays the same with minor
 changes in walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass a lower outer join through the pullup_sublink recursive procedure to find
out some situations when qual references outer side of an outer join.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database 
Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c| 328 +++-
 src/backend/optimizer/prep/prepjointree.c |  71 ++--
 src/backend/optimizer/util/tlist.c|   2 +-
 src/backend/optimizer/util/var.c  |   8 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/optimizer/optimizer.h |   1 +
 src/include/optimizer/subselect.h |   3 +-
 src/include/optimizer/tlist.h |   1 +
 src/test/regress/expected/prepare.out |  18 +
 src/test/regress/expected/subselect.out   | 438 ++
 src/test/regress/sql/prepare.sql  |   7 +
 src/test/regress/sql/subselect.sql| 162 
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 1013 insertions(+), 37 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c 
b/src/backend/optimizer/plan/subselect.c
index 92e3338584..be19d85524 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
   List *plan_params,
   SubLinkType subLinkType, int 
subLinkId,
@@ -1229,6 +1232,296 @@ inline_cte_walker(Node *node, inline_cte_walker_context 
*context)
return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+   if (node == NULL)
+   return false;
+
+   if (IsA(node, Query))
+   return query_tree_walker((Query *) node, contain_placeholders, 
context, 0);
+   if (IsA(node, PlaceHolderVar))
+   return true;
+
+   return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable all clauses of flattening correlated subquery should be
+ * anded and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+   if (!contain_vars_of_level(quals, 1))
+   return true;
+
+   if (quals && IsA(quals, OpExpr))
+   {
+   OpExpr *expr = (OpExpr *) quals;
+   Node   *leftarg;
+
+   /* Contains only one expression */
+   leftarg = linitial(expr->args);
+   if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it 
really necessary ? */
+   return false;
+
+   if (contain_placeholders(quals, NULL))
+   return false;
+
+   return true;
+   }
+   else if (is_andclause(quals))
+   {
+   ListCell   *l;
+
+   foreach(l, ((BoolExpr *) quals)->args)
+   {
+   Node *andarg = (Node *) lfirst(l);
+
+   if (!IsA(andarg, OpExpr))
+   return false;
+   if (!quals_is_pullable(andarg))
+   return false;
+   }
+
+   return true;
+   }
+
+   return false;
+}
+
+/*
+ * Mutator context used 

Re: [POC] Allow flattening of subquery with a link to upper query

2022-10-03 Thread Andrey Lepikhov

On 9/13/22 16:40, Andrey Lepikhov wrote:

On 5/9/2022 12:22, Richard Guo wrote:
On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov <a.lepik...@postgrespro.ru> wrote:
To resolve both issues, the lowest outer join is passed through pull_sublinks_* 
into the flattening routine (see attachment).

I've added these cases to subselect.sql

Attached is a new version of the patch, rebased onto current master.

--
Regards
Andrey Lepikhov
Postgres Professional
From 6462578f8789cb831f45ebfad65308d6afabb833 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 28 Sep 2022 13:42:12 +0300
Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary
 join query. With many restrictions, check each subquery and pull up
 expressions, references upper query block. Works for operators '=' and 'IN'.
 Machinery of converting ANY subquery to JOIN stays the same with minor
 changes in walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pass a lower outer join through the pullup_sublink recursive procedure to find
out some situations when qual references outer side of an outer join.

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c| 320 +++-
 src/backend/optimizer/prep/prepjointree.c |  71 ++--
 src/backend/optimizer/util/tlist.c|   2 +-
 src/backend/optimizer/util/var.c  |   8 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/optimizer/optimizer.h |   1 +
 src/include/optimizer/subselect.h |   3 +-
 src/include/optimizer/tlist.h |   1 +
 src/test/regress/expected/prepare.out |  18 +
 src/test/regress/expected/subselect.out   | 438 ++
 src/test/regress/sql/prepare.sql  |   7 +
 src/test/regress/sql/subselect.sql| 162 
 12 files changed, 1004 insertions(+), 37 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e3338584..29da43bb4d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
 		   List *plan_params,
 		   SubLinkType subLinkType, int subLinkId,
@@ -1229,6 +1232,288 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 	return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Query))
+		return query_tree_walker((Query *) node, contain_placeholders, context, 0);
+	else if (IsA(node, PlaceHolderVar))
+	{
+		return true;
+	}
+
+	return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable all clauses of flattening correlated subquery should be
+ * anded and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+	if (!contain_vars_of_level(quals, 1))
+		return true;
+
+	if (quals && IsA(quals, OpExpr))
+	{
+		OpExpr *expr = (OpExpr *) quals;
+		Node   *leftarg;
+
+		/* Contains only one expression */
+		leftarg = linitial(expr->args);
+		if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */
+			return false;
+
+		if (contain_placeholders(quals, NULL))
+			return false;
+
+		return true;
+	}
+	else if (is_andclause(quals))
+	{
+		ListCell   *l;
+
+		foreach(l, ((BoolExpr *) quals)->args)
+		{
+			Node *andarg = (Node *) lfirst(l);
+
+			if (!IsA(andarg, OpExpr))
+return false;
+			if (!quals_is_pullable(andarg))
+return false;
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+typedef struct
+{
+	Query  *subquery;
+	int		newvarno;
+	List   *pulling_quals;
+	bool	varlevel_up;
+} correlated_t;
+
+static Node *
+pull_subquery_clauses_mutator(Node *node, correlated_t *ctx)
+{
+	if (node == NULL)
+		return NULL;
+
+	if (IsA(node, OpExpr) && !ctx->varlevel_up)
+	{
+		if (!contain_vars_of_level(node, 1))
+			return node;
+
+		/*
+		 * The expression contains links to upper relation. It will be pulled to
+		 * uplevel. All links into vars of upper levels must be changed.
+		 */
+
+		ctx->varlevel_up = true;
+		ctx->pulling_quals =
+			lappend(ctx->pulling_quals,
+	expression_tree_mutator(node,
+			pull_subquery_clauses_mutator,
+			(void *) ctx));
+		ctx->

Re: [POC] Allow flattening of subquery with a link to upper query

2022-09-13 Thread Andrey Lepikhov

On 5/9/2022 12:22, Richard Guo wrote:


On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov <a.lepik...@postgrespro.ru> wrote:

 > Hmm, I'm not sure this patch works correctly in all cases. It seems to
 > me this patch pulls up the subquery without checking the constraints
 > imposed by lateral references. If its quals contain any lateral
 > references to rels outside a higher outer join, we would need to
 > postpone quals from below an outer join to above it, which is probably
 > incorrect.

Yeah, it's not an easy problem to solve. If I understand the code
correctly, to fix it we must implement the same logic as
pull_up_subqueries (lowest_outer_join/safe_upper_varnos).


Yeah, I think we'd have to consider the restrictions from lateral
references to guarantee correctness when we pull up subqueries. We need
to avoid the situation where quals need to be postponed past outer join.

However, even if we have taken care of that, there may be other issues
with flattening direct-correlated ANY SubLink. The constraints imposed
by LATERAL references may make it impossible for us to find any legal
join orders, as discussed in [1].
To resolve both issues, the lowest outer join is passed through pull_sublinks_* 
into the flattening routine (see attachment).

I've added these cases to subselect.sql

--
regards,
Andrey Lepikhov
Postgres Professional
From 883f9586acb5482bb88d5ca5123195c0efdd9cc7 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 5 Sep 2022 16:40:48 +0500
Subject: [PATCH] Pass a lower outer join through the pullup_sublink recursive
 procedure to find out some situations when qual references outer side of an
 outer join.

---
 src/backend/optimizer/plan/subselect.c| 19 --
 src/backend/optimizer/prep/prepjointree.c | 71 ++-
 src/include/optimizer/subselect.h |  3 +-
 src/test/regress/expected/subselect.out   | 52 +
 src/test/regress/sql/subselect.sql| 18 ++
 5 files changed, 131 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c 
b/src/backend/optimizer/plan/subselect.c
index 0e7ddc4a4e..29da43bb4d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1442,7 +1442,8 @@ pull_subquery_clauses_mutator(Node *node, correlated_t 
*ctx)
 }
 
 static List *
-pull_correlated_clauses(PlannerInfo *root, SubLink *sublink)
+pull_correlated_clauses(PlannerInfo *root, SubLink *sublink,
+   JoinExpr *lowest_outer_join)
 {
Query  *parse = root->parse;
Query  *subselect = (Query *) sublink->subselect;
@@ -1451,6 +1452,7 @@ pull_correlated_clauses(PlannerInfo *root, SubLink 
*sublink)
   .newvarno = 
list_length(parse->rtable) + 1, /* Looks like a hack */
   .pulling_quals = NIL,
   .varlevel_up = false};
+   Relids  safe_upper_varnos = NULL;
 
Assert(IsA(subselect, Query));
 
@@ -1489,7 +1491,16 @@ pull_correlated_clauses(PlannerInfo *root, SubLink 
*sublink)
f = subselect->jointree;
 
if (!f || !f->quals || !quals_is_pullable(f->quals))
-   /* Degenerate case */
+   return NULL;
+
+   if (lowest_outer_join)
+   safe_upper_varnos = get_relids_in_jointree(
+   (lowest_outer_join->jointype == JOIN_RIGHT) ?
+   lowest_outer_join->larg : 
lowest_outer_join->rarg, true);
+
+   if (safe_upper_varnos &&
+   !bms_is_subset(pull_varnos_of_level(root, f->quals, 1),
+  safe_upper_varnos))
return NULL;
 
/*
@@ -1540,7 +1551,7 @@ pull_correlated_clauses(PlannerInfo *root, SubLink 
*sublink)
  */
 JoinExpr *
 convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
-   Relids available_rels)
+   Relids available_rels, 
JoinExpr *lowest_outer_join)
 {
JoinExpr   *result;
Query  *parse = root->parse;
@@ -1586,7 +1597,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink 
*sublink,
 * pulled_clauses list, their places in the quals replaces by "true" 
value.
 */
if (contain_vars_of_level((Node *) subselect, 1) &&
-   (pclauses = pull_correlated_clauses(root, sublink)) == NIL)
+   (pclauses = pull_correlated_clauses(root, sublink, 
lowest_outer_join)) == NIL)
return NULL;
 
/* Create a dummy ParseState for addRangeTableEntryForSubquery */
diff --git a/src/backend/optimizer/prep/prepjointree.c 
b/src/backend/optimizer/

Re: [POC] Allow flattening of subquery with a link to upper query

2022-09-05 Thread Andrey Lepikhov

On 9/5/22 12:22, Richard Guo wrote:


On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov wrote:
Yeah, it's not an easy problem to solve. If I understand the code
correctly, to fix it we must implement the same logic as
pull_up_subqueries (lowest_outer_join/safe_upper_varnos).


Yeah, I think we'd have to consider the restrictions from lateral
references to guarantee correctness when we pull up subqueries. We need
to avoid the situation where quals need to be postponed past outer join.

However, even if we have taken care of that, there may be other issues
with flattening direct-correlated ANY SubLink. The constraints imposed
by LATERAL references may make it impossible for us to find any legal
join orders, as discussed in [1].

[1] 
https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4kgheqcfhpzcxs1gv...@mail.gmail.com


The problem you mentioned under this link is about an inefficient query 
plan, as I understand it.
It is a real problem, especially if we think about more complex 
pull-ups of subqueries - with aggregate functions in the target list.
I consider that problem a next step - we already have an example: the 
machinery of alternative plans. This problem may be solved that way, 
or by a GUC, as usual.


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [POC] Allow flattening of subquery with a link to upper query

2022-09-02 Thread Andrey Lepikhov

On 9/1/22 17:24, Richard Guo wrote:


On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov <a.lepik...@postgrespro.ru> wrote:


Before the flattening procedure we just look through the quals of the subquery,
pull up to the upper level the OpExprs containing variables from the upper
relation, and replace their positions in the quals with a 'true' expression.
After that, the flattening machinery works as usual.

Hmm, I'm not sure this patch works correctly in all cases. It seems to
me this patch pulls up the subquery without checking the constraints
imposed by lateral references. If its quals contain any lateral
references to rels outside a higher outer join, we would need to
postpone quals from below an outer join to above it, which is probably
incorrect.

Yeah, it's not an easy problem to solve. If I understand the code correctly, 
to fix it we must implement the same logic as 
pull_up_subqueries (lowest_outer_join/safe_upper_varnos). It looks ugly. 
But, more importantly, does this feature deserve such efforts/changes?


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [HACKERS] PoC: custom signal handler for extensions

2022-09-01 Thread Andrey Lepikhov

On 1/12/18 20:51, Teodor Sigaev wrote:
In perspective, this mechanism provides the low-level instrument to 
define remote procedure call on extension side. The simple RPC that 
defines effective userid on remote backend (remote_effective_user 
function) is attached for example.
7) Suppose the API allows using different handlers in different processes 
for the same reason; that could be a source of confusion. I suggest 
restricting Register/Unregister calls to shared_preload_library only, i.e., 
only during startup.

+1


8) I'd like to see an example of usage of this API somewhere in contrib, in 
existing modules. Ideas?
I imagine auto_explain could demonstrate usage of the API by 
implementing logging of the current query state, triggered by a signal. Most 
of the necessary code already exists there.

Of course, this option would only be enabled if auto_explain is loaded at startup.
But maybe that breaks the main concept of the auto_explain extension?

--
Regards
Andrey Lepikhov
Postgres Professional





[POC] Allow flattening of subquery with a link to upper query

2022-08-31 Thread Andrey Lepikhov

Hi,

One of the most annoying things in the planner, for me, is the unnesting of 
correlated queries [1]. A number of papers on this subject have been published 
since the 1980s, but only trivial optimizations exist in the core. That 
means a loss of performance, especially when we use foreign tables in a 
subquery.

In this patch I propose a sketch of a solution.

Before the flattening procedure we just look through the quals of the subquery, 
pull up to the upper level the OpExprs containing variables from the upper 
relation, and replace their positions in the quals with a 'true' expression.

After that, the flattening machinery works as usual.
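
For illustration, the intended transformation on a minimal example (table 
names are hypothetical):

```sql
-- Correlated sublink: t2.b = t1.b references the upper query (varlevelsup = 1).
SELECT * FROM t1
WHERE t1.a IN (SELECT t2.a FROM t2 WHERE t2.b = t1.b);

-- After the correlated OpExpr is pulled up (its old position becomes TRUE),
-- the sublink is uncorrelated and flattens into a semi-join, as if written:
SELECT * FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.a AND t2.b = t1.b);
```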

This patch covers the simplest variant of correlated queries - 
without aggregate functions in the target list. It passes the regression 
tests and contains some additional tests to demonstrate the achievements.

I'd like to get criticism of the approach.

[1]  Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. 
Database Syst. 7 (1982): 443-469.


--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 3f4247b23175388f8c6ee43740fb641d97e39d0b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 24 May 2022 15:59:02 +0500
Subject: [PATCH] Transform correlated subquery of type N-J [1] into ordinary
 join query. With many restrictions, check each subquery and pull up
 expressions, references upper query block. Works for operators '=' and 'IN'.
 Machinery of converting ANY subquery to JOIN stays the same with minor
 changes in walker function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[1] Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. Database Syst. 7 (1982): 443-469.
---
 src/backend/optimizer/plan/subselect.c  | 307 +++-
 src/backend/optimizer/util/tlist.c  |   2 +-
 src/backend/optimizer/util/var.c|   8 +
 src/backend/utils/misc/guc.c|  10 +
 src/include/optimizer/optimizer.h   |   1 +
 src/include/optimizer/tlist.h   |   1 +
 src/test/regress/expected/prepare.out   |  18 ++
 src/test/regress/expected/subselect.out | 369 
 src/test/regress/sql/prepare.sql|   7 +
 src/test/regress/sql/subselect.sql  | 140 +
 10 files changed, 855 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 92e3338584..0e7ddc4a4e 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -32,6 +32,7 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/builtins.h"
@@ -65,6 +66,8 @@ typedef struct inline_cte_walker_context
 } inline_cte_walker_context;
 
 
+bool optimize_correlated_subqueries = true;
+
 static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
 		   List *plan_params,
 		   SubLinkType subLinkType, int subLinkId,
@@ -1229,6 +1232,277 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 	return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+static bool
+contain_placeholders(Node *node, inline_cte_walker_context *context)
+{
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, Query))
+		return query_tree_walker((Query *) node, contain_placeholders, context, 0);
+	else if (IsA(node, PlaceHolderVar))
+	{
+		return true;
+	}
+
+	return expression_tree_walker(node, contain_placeholders, context);
+}
+
+/*
+ * To be pullable all clauses of flattening correlated subquery should be
+ * anded and mergejoinable (XXX: really necessary?)
+ */
+static bool
+quals_is_pullable(Node *quals)
+{
+	if (!contain_vars_of_level(quals, 1))
+		return true;
+
+	if (quals && IsA(quals, OpExpr))
+	{
+		OpExpr *expr = (OpExpr *) quals;
+		Node   *leftarg;
+
+		/* Contains only one expression */
+		leftarg = linitial(expr->args);
+		if (!op_mergejoinable(expr->opno, exprType(leftarg))) /* Is it really necessary ? */
+			return false;
+
+		if (contain_placeholders(quals, NULL))
+			return false;
+
+		return true;
+	}
+	else if (is_andclause(quals))
+	{
+		ListCell   *l;
+
+		foreach(l, ((BoolExpr *) quals)->args)
+		{
+			Node *andarg = (Node *) lfirst(l);
+
+			if (!IsA(andarg, OpExpr))
+return false;
+			if (!quals_is_pullable(andarg))
+return false;
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+typedef struct
+{
+	Query  *subquery;
+	int		newvarno;
+	List   *pulling_quals;
+	bool	varlevel_up;
+} correlated_t;
+
+static Node *
+pull_subquery_clauses_mutator(Node *node, correlated_t *ctx)
+{
+	if (node == NULL)
+		return NULL;
+
+	if (IsA(node, OpExpr) && !ctx->varlevel_up)
+	{
+		if (!contain_vars_of_level(node, 1))
+			return node;
+
+		/*
+		 * The expression contains link

Re: Removing unneeded self joins

2022-08-28 Thread Andrey Lepikhov

On 8/29/22 04:39, Zhihong Yu wrote:



On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <z...@yugabyte.com> wrote:


Hi,
For v36-0001-Remove-self-joins.patch :

bq removes inner join of plane table to itself

plane table -> plain table

For relation_has_unique_index_ext(), it seems when extra_clauses
is NULL, there is no need to compute `exprs`.

Cheers

Done



For remove_self_joins_recurse():

+                   if (bms_num_members(relids) > join_collapse_limit)
+                       break;

The above line just comes out of the switch statement. This check should 
be done again between foreach and switch.

Otherwise the above check wouldn't achieve what you want.

Cheers

Thanks for highlighting the problem.
I guess using either join_collapse_limit or from_collapse_limit 
isn't practical here.
What we are really afraid of here is many senseless search cycles for self-joins, 
and that may deserve a higher limit than the GUCs above. So I introduced a GUC 
called "self_join_search_limit" (so far undocumented) that is an 
explicit limit on the set of plain relations in the FROM list for which we 
search for self-joins.
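
A sketch of how such a knob is typically wired into guc.c (the GUC name comes 
from the mail; the boot value and bounds are illustrative guesses, not the 
patch's actual numbers):

```c
/* In the ConfigureNamesInt[] table of guc.c: */
{
    {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
        gettext_noop("Sets the maximum number of plain relations in a "
                     "FROM list for which self-join search is attempted."),
        NULL
    },
    &self_join_search_limit,
    32, 0, INT_MAX,     /* boot value, min, max -- illustrative */
    NULL, NULL, NULL
},
```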


--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 6283d6e21214e34d3c1a6351fa9f6ac1aeb75ce8 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..c41e7256be 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,6 +3495,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+		 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+			  List *restrictlist,
+			  List *exprlist, List *oprlist,
+			  List **extra_clauses)
 {
 	ListCell   *ic;
 
@@ -3550,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List	   *exprs = NIL;
 
 		

Re: Removing unneeded self joins

2022-08-26 Thread Andrey Lepikhov

On 30/6/2022 17:11, Andrey Lepikhov wrote:

On 19/5/2022 16:47, Ronan Dunklau wrote:

I'll take a look at that one.

New version of the patch, rebased on current master:
1. pgindent over the patch has passed.
2. The number of changed files is reduced.
3. Some documentation and comments have been added.

New version rebased on the new master; minor changes and tests added.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 8d864515da68728ddee10d455f8bdb64d34277aa Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   39 +
 src/backend/optimizer/plan/analyzejoins.c | 1045 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   10 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |6 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  735 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  329 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2215 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..a9f8f89312 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,8 +3495,24 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List 
*oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
+   List   *exprs;
 
Assert(list_length(exprlist) == list_length(oprlist));
 
@@ -3551,6 +3567,8 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
 
+   exprs = NIL;
+
/*
 * If the index is not unique, or not immediately enforced, or 
if it's
 * a partial index that doesn't match the query, it's useless 
here.
@@ -3598,6 +3616,23 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
if (match_index_to_ope

Re: Fast COPY FROM based on batch insert

2022-08-22 Thread Andrey Lepikhov

On 22/8/2022 11:44, Etsuro Fujita wrote:

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush().  Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table.  My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful.  Maybe I'm too
worried about that, though.
I got your point. Indeed, perhaps such info doesn't really need to be 
included in the core, at least for now.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-08-14 Thread Andrey Lepikhov

On 8/9/22 16:44, Etsuro Fujita wrote:

-1 foo
1 bar
\.

ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL:  Failing row contains (-1, foo).
CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).

The error occurs on the remote side, so I'm not sure if there is a
simple fix.  What I came up with is to just suppress error context
information other than the relation name, like the attached.  What do
you think about that?
I've spent a lot of effort on this problem too. Your solution has a 
rationale and looks fine.
I only think we should add a bit of info to the error report, to make it 
clear why we don't point to a specific line here. For example:

'COPY %s (buffered)'
or
'COPY FOREIGN TABLE %s'

or, if we saved a MultiInsertBuffer pointer instead of the relname_only 
field, we might add min/max line numbers to the report:

'COPY %s, line between %llu and %llu'

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-26 Thread Andrey Lepikhov

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov
 wrote:

Analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
the buffers for all other partitions of the parent table, even if the parent
doesn't have any triggers.
It relates to the code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
!CopyMultiInsertInfoIsEmpty())
{
/*
 * Flush pending inserts if this partition can't use
 * batching, so rows are visible to triggers etc.
 */
CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascade flush really necessary, especially for BEFORE and
INSTEAD OF triggers?


BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

If you'll excuse me, I will add one more argument.
It wasn't clear to me, so I made an experiment: the result of a SELECT in an 
INSERT trigger function shows only the data that existed in the parent table 
before the start of COPY.
So we have no tools to access newly inserted rows in a neighboring 
partition and don't need to flush the tuple buffers immediately.

Where am I wrong?
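
A sketch of the experiment, with illustrative names (not the exact script I used):

```sql
CREATE TABLE p (a int, v text) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);

CREATE FUNCTION probe() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  -- How many rows of the parent are visible from inside the trigger?
  RAISE NOTICE 'visible in parent: %', (SELECT count(*) FROM p);
  RETURN NEW;
END; $$;

CREATE TRIGGER p1_bi BEFORE INSERT ON p1
  FOR EACH ROW EXECUTE FUNCTION probe();

-- COPY rows targeting both partitions; rows routed to p2 may still sit in
-- a multi-insert buffer when the BEFORE trigger on p1 fires, and the
-- NOTICE reveals whether probe() can see them.
COPY p FROM stdin;
```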

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-22 Thread Andrey Lepikhov

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov

Why is such a cascade flush really necessary, especially for BEFORE and
INSTEAD OF triggers?


BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.
Thanks for the explanation of your point of view. So, maybe switch the 
status of this patch to 'Ready for committer'?


--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-22 Thread Andrey Lepikhov

On 7/20/22 13:10, Etsuro Fujita wrote:

On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
 wrote:

On 18/7/2022 13:22, Etsuro Fujita wrote:

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly.  Attached is an
updated patch.  What do you think about that?



maybe you forgot this code:
/*
   * If a partition's root parent isn't allowed to use it, neither is the
   * partition.
*/
if (rootResultRelInfo->ri_usesMultiInsert)
 leaf_part_rri->ri_usesMultiInsert =
 ExecMultiInsertAllowed(leaf_part_rri);


I think the patch accounts for that.  Consider this bit to determine
whether to use batching for the partition chosen by
ExecFindPartition():

Agreed.

Analyzing multi-level heterogeneous partitioned configurations, I 
realized that a single write into a partition with a trigger will flush 
the buffers for all other partitions of the parent table, even if the parent 
doesn't have any triggers.

It relates to the code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
  !CopyMultiInsertInfoIsEmpty())
{
  /*
   * Flush pending inserts if this partition can't use
   * batching, so rows are visible to triggers etc.
   */
  CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascade flush really necessary, especially for BEFORE and 
INSTEAD OF triggers? An AFTER trigger should see all rows of the table, but 
if one doesn't exist for the parent, I think we aren't obligated to 
guarantee the order of COPY into two different tables.


--
Regards
Andrey Lepikhov
Postgres Professional




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-21 Thread Andrey Lepikhov

On 7/5/22 13:57, Yuya Watari wrote:

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane  wrote:

Perhaps the bms_is_subset class could be handled in a similar
way, ie do a one-time pass to make a List of all EquivalenceMembers
that use a RelOptInfo.


Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.
This work is of significant interest for highly partitioned 
configurations. Are you still working on this patch? Given the 
current state of the thread, changing the status to 'Waiting on author' 
may be better until the next version arrives.

Feel free to revert the status if you need more feedback.

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-19 Thread Andrey Lepikhov

On 18/7/2022 13:22, Etsuro Fujita wrote:

On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
 wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed().  But I’m not sure we really need such a
change.  Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such approach would look much better, if we implemented it.



I'll ponder how to do it.


I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly.  Attached is an
updated patch.  What do you think about that?

While working on the patch, I fixed a few issues as well:

+   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+   resultRelInfo->ri_BatchSize =
+
resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c.  For consistency I fixed this by
copying-and-pasting the code from that file.

+* Also, the COPY command requires a non-zero input list of attributes.
+* Therefore, the length of the attribute list is checked here.
+*/
+   if (!cstate->volatile_defexprs &&
+   list_length(cstate->attnumlist) > 0 &&
+   !contain_volatile_functions(cstate->whereClause))
+   target_resultRelInfo->ri_usesMultiInsert =
+   ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case.  postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Which is the reason why this was added to the if-test?  But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when called from
GetForeignModifyBatchSize.  So I removed the attnumlist test from the
patch, and modified postgresGetForeignModifyBatchSize as such.  I
might miss something, though.

Thanks a lot! Maybe you forgot this code:
/*
 * If a partition's root parent isn't allowed to use it, neither is the
 * partition.
*/
if (rootResultRelInfo->ri_usesMultiInsert)
leaf_part_rri->ri_usesMultiInsert =
ExecMultiInsertAllowed(leaf_part_rri);

Also, maybe describe in the documentation that, if the value of batch_size is 
more than 1, the ExecForeignBatchInsert routine has a chance to be called?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Postgres picks suboptimal index after building of an extended statistics

2022-07-11 Thread Andrey Lepikhov

On 7/8/22 03:07, Tom Lane wrote:

Andrey Lepikhov  writes:

On 12/8/21 04:26, Tomas Vondra wrote:

I wonder if we should teach clauselist_selectivity about UNIQUE indexes,
and improve the cardinality estimates directly, not just costing for
index scans.



I tried to implement this in different ways. But it causes additional
overhead and code complexity - analyzing a list of indexes and match
clauses of each index with input clauses in each selectivity estimation.
I don't like that way and propose a new patch in attachment.


I looked at this briefly.  I do not think that messing with
btcostestimate/genericcostestimate is the right response at all.
The problem can be demonstrated with no index whatever, as in the
attached shortened version of the original example.  I get


I partly agree with you. Yes, I see the problem too. But we also have the 
problem that I described above: the optimizer doesn't choose the path with 
the minimal selectivity from a set of selectivities that all imply a 
cardinality of less than one (see badestimate2.sql).

The new patch (see attachment) fixes this problem.

--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 463c085852be921199b7a6d0987378d55145e41c Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 11 Jul 2022 12:25:43 +0500
Subject: [PATCH] Use Index path with the best selectivity estimation

---
 src/backend/optimizer/util/pathnode.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 483c4f4137..a8d7f22343 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -207,6 +207,20 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
 		/* ... but path2 fuzzily worse on startup, so path1 wins */
 		return COSTS_BETTER1;
 	}
+	if (IsA(path1, IndexPath) && IsA(path2, IndexPath))
+	{
+		/*
+		 * Couldn't differ value of the paths. Last chance - if these paths
+		 * are index paths - use the path with the lower selectivity value.
+		 */
+		if (((IndexPath *) path1)->indexselectivity <
+			((IndexPath *) path2)->indexselectivity)
+			return COSTS_BETTER1;
+
+		if (((IndexPath *) path1)->indexselectivity >
+			((IndexPath *) path2)->indexselectivity)
+			return COSTS_BETTER2;
+	}
 	/* fuzzily the same on both costs */
 	return COSTS_EQUAL;
 
-- 
2.34.1



badestimate2.sql
Description: application/sql


Re: Fast COPY FROM based on batch insert

2022-07-10 Thread Andrey Lepikhov

On 11/7/2022 04:12, Ian Barwick wrote:

On 09/07/2022 00:09, Andrey Lepikhov wrote:

On 8/7/2022 05:12, Ian Barwick wrote:
 ERROR:  bind message supplies 0 parameters, but prepared 
statement "pgsql_fdw_prep_178" requires 6
 CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, 
v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)

 COPY foo, line 88160
Thanks, I got it. A MultiInsertBuffer is created on the first non-zero 
flush of tuples into the partition and isn't deleted from the buffers 
list until the end of COPY, so on a subsequent flush with an empty buffer 
we hit the error.
Your fix is correct, but I want to propose a slightly different change 
(see attachment).


LGTM.

New version (with aforementioned changes) is attached.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 976560f2ad406adba1aaf58a188b44302855ee12 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out|  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  47 
 src/backend/commands/copyfrom.c   | 211 --
 src/backend/executor/execMain.c   |  45 
 src/backend/executor/execPartition.c  |   8 +
 src/include/commands/copyfrom_internal.h  |  10 -
 src/include/executor/executor.h   |   1 +
 src/include/nodes/execnodes.h |   5 +-
 8 files changed, 238 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 44457f930c..aced9a6428 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8435,6 +8435,7 @@ drop table loct2;
 -- ===
 -- test COPY FROM
 -- ===
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 
'loc2');
@@ -8457,7 +8458,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1 xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 ----+-------
@@ -8468,6 +8469,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
for each statement execute procedure trigger_func();
@@ -8576,6 +8590,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8592,6 +8634,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===
 -- test for TRUNCATE
 -- ===
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/

Re: Fast COPY FROM based on batch insert

2022-07-08 Thread Andrey Lepikhov

On 8/7/2022 05:12, Ian Barwick wrote:
     ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_178" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
     COPY foo, line 88160
Thanks, I got it. A MultiInsertBuffer is created on the first non-empty
flush of tuples into the partition and isn't removed from the buffers list
until the end of COPY, so on a subsequent flush of an empty buffer we hit
this error.
Your fix is correct, but I want to propose a slightly different change (see
the attachment).
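For reference, a minimal sketch of the failing scenario (modeled on the
regression test in the patch; the 'loopback' server setup of the
postgres_fdw tests is assumed, and the input file name is a placeholder):

```
=# CREATE TABLE foo (a int) PARTITION BY LIST (a);
=# CREATE TABLE foo1 (LIKE foo);
=# CREATE FOREIGN TABLE ffoo1 PARTITION OF foo FOR VALUES IN (1)
     SERVER loopback OPTIONS (table_name 'foo1');
=# COPY foo FROM '/tmp/test.csv' WITH (format csv);
```

With a large enough input, a buffer for a partition that has already been
flushed once can be flushed again while empty, which triggers the
bind-parameters error quoted above.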


--
regards,
Andrey Lepikhov
Postgres Professional

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 245a260982..203289f7f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -329,7 +329,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
 
 		/* Flush into foreign table or partition */
-		do {
+		while (sent < nused)
+		{
 			int		size = (resultRelInfo->ri_BatchSize < nused - sent) ?
 						resultRelInfo->ri_BatchSize : (nused - sent);
 			int		inserted = size;
@@ -340,7 +341,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 															  NULL,
 															  &inserted);
 			sent += size;
-		} while (sent < nused);
+		}
 	}
 	else
 	{

Re: Fast COPY FROM based on batch insert

2022-07-07 Thread Andrey Lepikhov

On 7/7/2022 06:14, Ian Barwick wrote:

On Thu, 24 Mar 2022 at 15:44, Andrey V. Lepikhov wrote:
 >
 > On 3/22/22 06:54, Etsuro Fujita wrote:
 > > On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
 > >  wrote:
 > >> We still have slow 'COPY FROM' operation for foreign tables in
 > >> current master.
 > >> Now we have a foreign batch insert operation, and I tried to
 > >> rewrite the patch [1] with this machinery.
 > >
 > > The patch has been rewritten to something essentially different, but
 > > no one reviewed it.  (Tsunakawa-san gave some comments without looking
 > > at it, though.)  So the right status of the patch is “Needs review”,
 > > rather than “Ready for Committer”?  Anyway, here are a few review
 > > comments from me:
 > >
 > > * I don’t think this assumption is correct:
 > >
 > > @@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 > >   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
 > >    resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 > >  {
 > > +   /*
 > > +    * AFTER ROW triggers aren't allowed with the foreign bulk insert
 > > +    * method.
 > > +    */
 > > +   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
 > > +
 > >
 > > In postgres_fdw we disable foreign batch insert when the target table
 > > has AFTER ROW triggers, but the core allows it even in that case.  No?
 > Agree
 >
 > > * To allow foreign multi insert, the patch made an invasive change to
 > > the existing logic to determine whether to use multi insert for the
 > > target relation, adding a new member ri_usesMultiInsert to the
 > > ResultRelInfo struct, as well as introducing a new function
 > > ExecMultiInsertAllowed().  But I’m not sure we really need such a
 > > change.  Isn’t it reasonable to *adjust* the existing logic to allow
 > > foreign multi insert when possible?
 > Of course, such approach would look much better, if we implemented it.
 > I'll ponder how to do it.
 >
 > > I didn’t finish my review, but I’ll mark this as “Waiting on Author”.
 > I rebased the patch onto current master. Now it works correctly. I'll
 > mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

     postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
     ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
     COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually
contain tuples; the fix is simply not to call ExecForeignBatchInsert() if
that's the case, e.g.:


     /* Flush into foreign table or partition */
     do {
         int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
                     resultRelInfo->ri_BatchSize : (nused - sent);

         if (size)
         {
             int inserted = size;

             resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                                  resultRelInfo,
                                                                  &slots[sent],
                                                                  NULL,
                                                                  &inserted);
             sent += size;
         }
     } while (sent < nused);


There might be a case for arguing that the respective FDW should check that
it has actually received some tuples to insert, but IMHO it's much preferable
to catch this as early as possible and avoid a superfluous call.

FWIW, with the above fix in place, with a simple local test the patch
produces a consistent speed-up of about 8 times compared to the existing
functionality.

Thank you for your attention to the patch.
I have a couple of questions:
1. I'm having trouble reproducing the case you reported. Can you give more
details on the reproduction?
2. Have you tried the previous version, based on the bulk COPY machinery
rather than bulk INSERT? Which approach looks better, and which performs
better, in your opinion?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Removing unneeded self joins

2022-06-30 Thread Andrey Lepikhov

On 19/5/2022 16:47, Ronan Dunklau wrote:

I'll take a look at that one.

New version of the patch, rebased on current master:
1. pgindent over the patch has passed.
2. The number of changed files is reduced.
3. Some documentation and comments are added.

--
regards,
Andrey Lepikhov
Postgres Professional

From 9ce71f1d0ffefa9d77edfa30fc189bc0425ebbbe Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes an inner join of a plain table to
itself in a query plan if it can be proved that the join can be replaced with
a scan. The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner has the same
clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   39 +
 src/backend/optimizer/plan/analyzejoins.c | 1045 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   10 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |6 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  686 ++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  305 ++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2142 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 48478b1024..2226117c62 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5287,6 +5287,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses the
+query tree and replaces self-joins with semantically equivalent single
+scans. Only plain tables are taken into consideration.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..2eb05c79ce 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 							  List *restrictlist,
 							  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+										 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+							  List *restrictlist,
+							  List *exprlist, List *oprlist,
+							  List **extra_clauses)
 {
ListCell   *ic;
+   List   *exprs;
 
Assert(list_length(exprlist) == list_length(oprlist));
 
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
 
+		exprs = NIL;
+
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
 		 * a partial index that doesn't match the query, it's useless here.
@@ -3601,6 +3619,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			if (match_index_to_operand(rexpr, c, ind))
 			{
 				matched = true;

Re: Postgres do not allow to create many tables with more than 63-symbols prefix

2022-06-26 Thread Andrey Lepikhov

On 6/27/22 06:38, Masahiko Sawada wrote:

On Fri, Jun 24, 2022 at 2:12 PM Andrey Lepikhov
 wrote:

On 6/23/22 07:03, Masahiko Sawada wrote:
  > On Sat, Jun 4, 2022 at 4:03 AM Andrey Lepikhov
  >  wrote:
  >> It is very corner case, of course. But solution is easy and short. So,
  >> why not to fix? - See the patch in attachment.
  >
  > While this seems to be a good improvement, I think it's not a bug.
  > Probably we cannot backpatch it as it will end up having type names
  > defined by different naming rules. I'd suggest discussing it on
  > -hackers.
Done.


Thank for updating the patch. Please register this item to the next CF
if not yet.

Done [1].


  > Regarding the patch, I think we can merge makeUniqueTypeName() to
  > makeArrayTypeName() as there is no caller of makeUniqueTypeName() who
  > pass tryOriginal = true.
I partially agree with you. But I have one reason to leave
makeUniqueTypeName() separate: it may be used in other code dealing with
auto-generated types. For example, I think the DefineRelation routine should
choose a composite type name instead of using the same name as the table.


Okay.

I have one comment on v2 patch:

  +   for(;;)
  {
  -   dest[i - 1] = '_';
  -   strlcpy(dest + i, typeName, NAMEDATALEN - i);
  -   if (namelen + i >= NAMEDATALEN)
  -   truncate_identifier(dest, NAMEDATALEN, false);
  -
  if (!SearchSysCacheExists2(TYPENAMENSP,
  -  CStringGetDatum(dest),
  +  CStringGetDatum(type_name),
 ObjectIdGetDatum(typeNamespace)))
  -   return pstrdup(dest);
  +   return type_name;
  +
  +   /* Previous attempt was failed. Prepare a new one. */
  +   pfree(type_name);
  +   snprintf(suffix, sizeof(suffix), "%d", ++pass);
  +   type_name = makeObjectName("", typeName, suffix);
  }

  return NULL;

I think it's better to break from the loop instead of returning from
there. That way, we won't need "return NULL".

Agree. Done.

[1] https://commitfest.postgresql.org/38/3712/

--
Regards
Andrey Lepikhov
Postgres Professional

From 73b80676fff98e9edadab2576cf22778c6b5168b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 3 Jun 2022 21:40:01 +0300
Subject: [PATCH] Allow postgresql to generate more relations with the same
 63-symbol-long prefix. Rewrite the makeUniqueTypeName routine: the unique
 name generator is based on the same idea as the ChooseRelationName() routine.

Authors: Dmitry Koval, Andrey Lepikhov
---
 src/backend/catalog/pg_type.c | 37 ---
 src/test/regress/expected/alter_table.out |  6 ++--
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index 03788cb975..405553013e 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "commands/typecmds.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -936,17 +937,17 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace)
  * makeUniqueTypeName
  *		Generate a unique name for a prospective new type
  *
- * Given a typeName, return a new palloc'ed name by prepending underscores
- * until a non-conflicting name results.
+ * Given a typeName, return a new palloc'ed name by prepending underscore
+ * and (if needed) adding a suffix to the end of the type name.
  *
  * If tryOriginal, first try with zero underscores.
  */
 static char *
 makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 {
-	int			i;
-	int			namelen;
-	char		dest[NAMEDATALEN];
+	int		pass = 0;
+	char	suffix[NAMEDATALEN];
+	char   *type_name;
 
 	Assert(strlen(typeName) <= NAMEDATALEN - 1);
 
@@ -956,23 +957,25 @@ makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 			   ObjectIdGetDatum(typeNamespace)))
 		return pstrdup(typeName);
 
+	/* Prepare initial object name. Just for compatibility. */
+	type_name = makeObjectName("", typeName, NULL);
+
 	/*
-	 * The idea is to prepend underscores as needed until we make a name that
-	 * doesn't collide with anything ...
+	 * The idea of unique name generation is similar to ChooseRelationName()
+	 * implementation logic.
 	 */
-	namelen = strlen(typeName);
-	for (i = 1; i < NAMEDATALEN - 1; i++)
+	for(;;)
 	{
-		dest[i - 1] = '_';
-		strlcpy(dest + i, typeName, NAMEDATALEN - i);
-		if (namelen + i >= NAMEDATALEN)
-			truncate_identifier(dest, NAMEDATALEN, false);
-
 		if (!SearchSysCacheExists2(TYPENAMENSP,
-   CStringGetDatum(dest),
+   CStringGetDatum(type_name),
    ObjectIdGetDatum(typeName

Re: Implement hook for self-join simplification

2022-06-24 Thread Andrey Lepikhov

On 24/6/2022 23:43, Leif Harald Karlsen wrote:
Thank you for the quick answer, and for the pointer to the patch! This 
looks like just the thing I need! 
On a more general note: What would, in general, be the best way to 
implement such optimizations? Is there a good way to do this as an 
extension, or is a patch the preferred way?

In my experience, it depends on your needs.
For example, the self-join-removal feature, or my current project -
flattening of nested subqueries - is much better implemented as a patch,
because you can run it as early as possible and can generalize parts of the
core code, thus reducing the size of your code a lot.
But if you want to use your code with many PG versions, even ones already
working in production, or you are just doing research without an immediate
practical result, your choice is an extension.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Implement hook for self-join simplification

2022-06-24 Thread Andrey Lepikhov

On 24/6/2022 18:58, Leif Harald Karlsen wrote:
I have a made a small deductive database on top of PostgreSQL for 
educational/research purposes. In this setting, due to certain 
VIEW-constructions, queries often end up being self-joins on primary 
keys, e.g.:

SELECT t1.id, t2.val
FROM t AS t1 JOIN t AS t2 USING (id);

where t(id) is a primary key. This query is equivalent to the much more 
efficient:

SELECT id, val FROM t AS t1;

However, PostgreSQL currently does not seem to implement this 
simplification. Therefore, I have looked into writing an extension that 
performs this, but I am struggling a bit with finding out when this 
simplification should be done, i.e. which hook I should implement.
It is true, but you can use the proposed patch that adds such
functionality [1].


I tried to reproduce your case:
CREATE TABLE t(id int PRIMARY KEY, val text);
explain verbose
SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id);

With this patch you will get a plan:
 Seq Scan on public.t t2
   Output: t2.id, t2.val
   Filter: (t2.id IS NOT NULL)

The approach implemented in this patch looks better because it removes
self-joins at an earlier stage than path generation. Feel free to use it in
your research.


[1] 
https://www.postgresql.org/message-id/a1d6290c-44e0-0dfc-3fca-66a68b310...@postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional




Postgres do not allow to create many tables with more than 63-symbols prefix

2022-06-23 Thread Andrey Lepikhov

Moved from the pgsql-bugs mailing list [1].

On 6/23/22 07:03, Masahiko Sawada wrote:
> Hi,
>
> On Sat, Jun 4, 2022 at 4:03 AM Andrey Lepikhov
>  wrote:
>>
>> According to subj you can try to create many tables (induced by the case
>> of partitioned table) with long prefix - see 6727v.sql for reproduction.
>> But now it's impossible because of logic of the makeUniqueTypeName()
>> routine.
>> You get the error:
>> ERROR:  could not form array type name for type ...
>>
>> It is very corner case, of course. But solution is easy and short. So,
>> why not to fix? - See the patch in attachment.
>
> While this seems to be a good improvement, I think it's not a bug.
> Probably we cannot backpatch it as it will end up having type names
> defined by different naming rules. I'd suggest discussing it on
> -hackers.
Done.
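For reference, a reproducer in the spirit of 6727v.sql (not the original
script; the names and the loop count are illustrative) could look like this:

```
DO $$
BEGIN
  -- 70 table names, all 63 characters long and sharing a 62-character
  -- prefix, so the auto-generated array type names collide after the
  -- leading underscore is prepended and the name is truncated.
  FOR i IN 1..70 LOOP
    EXECUTE format('CREATE TABLE %I ()', repeat('a', 62) || chr(47 + i));
  END LOOP;
END $$;
ERROR:  could not form array type name for type ...
```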

> Regarding the patch, I think we can merge makeUniqueTypeName() to
> makeArrayTypeName() as there is no caller of makeUniqueTypeName() who
> pass tryOriginal = true.
I partially agree with you. But I have one reason to leave
makeUniqueTypeName() separate: it may be used in other code dealing with
auto-generated types. For example, I think the DefineRelation routine should
choose a composite type name instead of using the same name as the table.


> Also looking at other ChooseXXXName()
> functions, we don't care about integer overflow. Is it better to make
> it consistent with them?
Done.

[1] 
https://www.postgresql.org/message-id/flat/121e286f-3796-c9d7-9eab-6fb8e0b9c701%40postgrespro.ru


--
Regards
Andrey Lepikhov
Postgres Professional

From ae8552b8f9474a6fae893bce501c534cabcfaa13 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 3 Jun 2022 21:40:01 +0300
Subject: [PATCH] Allow postgresql to generate more relations with the same
 63-symbol-long prefix. Rewrite the makeUniqueTypeName routine: the unique
 name generator is based on the same idea as the ChooseRelationName() routine.

Authors: Dmitry Koval, Andrey Lepikhov
---
 src/backend/catalog/pg_type.c | 35 ---
 src/test/regress/expected/alter_table.out |  6 ++--
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index 03788cb975..bae6998360 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "commands/typecmds.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -936,17 +937,17 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace)
  * makeUniqueTypeName
  *		Generate a unique name for a prospective new type
  *
- * Given a typeName, return a new palloc'ed name by prepending underscores
- * until a non-conflicting name results.
+ * Given a typeName, return a new palloc'ed name by prepending underscore
+ * and (if needed) adding a suffix to the end of the type name.
  *
  * If tryOriginal, first try with zero underscores.
  */
 static char *
 makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 {
-	int			i;
-	int			namelen;
-	char		dest[NAMEDATALEN];
+	int		pass = 0;
+	char	suffix[NAMEDATALEN];
+	char   *type_name;
 
 	Assert(strlen(typeName) <= NAMEDATALEN - 1);
 
@@ -956,22 +957,24 @@ makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 			   ObjectIdGetDatum(typeNamespace)))
 		return pstrdup(typeName);
 
+	/* Prepare initial object name. Just for compatibility. */
+	type_name = makeObjectName("", typeName, NULL);
+
 	/*
-	 * The idea is to prepend underscores as needed until we make a name that
-	 * doesn't collide with anything ...
+	 * The idea of unique name generation is similar to ChooseRelationName()
+	 * implementation logic.
 	 */
-	namelen = strlen(typeName);
-	for (i = 1; i < NAMEDATALEN - 1; i++)
+	for(;;)
 	{
-		dest[i - 1] = '_';
-		strlcpy(dest + i, typeName, NAMEDATALEN - i);
-		if (namelen + i >= NAMEDATALEN)
-			truncate_identifier(dest, NAMEDATALEN, false);
-
 		if (!SearchSysCacheExists2(TYPENAMENSP,
-   CStringGetDatum(dest),
+   CStringGetDatum(type_name),
    ObjectIdGetDatum(typeNamespace)))
-			return pstrdup(dest);
+			return type_name;
+
+		/* Previous attempt was failed. Prepare a new one. */
+		pfree(type_name);
+		snprintf(suffix, sizeof(suffix), "%d", ++pass);
+		type_name = makeObjectName("", typeName, suffix);
 	}
 
 	return NULL;
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 5ede56d9b5..e634dbd86d 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -197

Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

2022-06-15 Thread Andrey Lepikhov

On 15/6/2022 21:45, Imseih (AWS), Sami wrote:

Adding a plan_id to pg_stat_activity allows users
to determine if a plan for a particular statement
has changed and if the new plan is performing better
or worse for a particular statement.
There are several ways the plan_id in pg_stat_activity

In general, your idea is quite useful.
But, as discussed earlier [1], extensions could implement many useful
things if they could add some custom data into a plan.
Maybe implement your feature with some private list of nodes in the plan
structure instead of a single-purpose plan_id field?


[1] 
https://www.postgresql.org/message-id/flat/e0de3423-4bba-1e69-c55a-f76bf18dbd74%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Compare variables of composite type with slightly different column types

2022-05-28 Thread Andrey Lepikhov

On 26/5/2022 14:25, Andrey Lepikhov wrote:
I guess, here the compatible_oper() routine can be used to find a 
appropriate operator, or something like that can be invented.

I looked into the 2cd7084 and a4424c5, but don't found any rationale.


In accordance with this idea I prepared code, for a demo only.

--
regards,
Andrey Lepikhov
Postgres Professional

From 71cf1a9148b20921aa60d7ae2062620c515557f8 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Sat, 28 May 2022 22:01:01 +0300
Subject: [PATCH] Try to find a compatible operator if the column types of
 compared rows differ.

---
 src/backend/utils/adt/rowtypes.c   | 36 +++---
 src/test/regress/expected/rowtypes.out | 11 +---
 src/test/regress/sql/rowtypes.sql  |  2 +-
 3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index db843a0fbf..432b44ec8c 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -18,14 +18,17 @@
 
 #include "access/detoast.h"
 #include "access/htup_details.h"
+#include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "parser/parse_oper.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 #include "utils/typcache.h"
 
 
@@ -1071,6 +1074,8 @@ record_eq(PG_FUNCTION_ARGS)
 	int			i1;
 	int			i2;
 	int			j;
+	Oid			op_func = InvalidOid;
+	FmgrInfo	op_func_finfo;
 
 	check_stack_depth();	/* recurses for record-type columns */
 
@@ -1173,12 +1178,27 @@ record_eq(PG_FUNCTION_ARGS)
 		 * Have two matching columns, they must be same type
 		 */
 		if (att1->atttypid != att2->atttypid)
-			ereport(ERROR,
+		{
+			Operator	op;
+
+			op = compatible_oper(NULL, list_make1(makeString("=")),
+								 att1->atttypid, att2->atttypid, false, -1);
+
+			if (!op)
+				ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
 					 errmsg("cannot compare dissimilar column types %s and %s at record column %d",
 							format_type_be(att1->atttypid),
 							format_type_be(att2->atttypid),
 							j + 1)));
+			else
+			{
+				op_func = oprfuncid(op);
+				fmgr_info(op_func, &op_func_finfo);
+				ReleaseSysCache(op);
+			}
+
+		}
 
 		/*
 		 * If they're not same collation, we don't complain here, but the
@@ -1217,8 +1237,18 @@ record_eq(PG_FUNCTION_ARGS)
 		}
 
 		/* Compare the pair of elements */
-		InitFunctionCallInfoData(*locfcinfo, &typentry->eq_opr_finfo, 2,
-								 collation, NULL, NULL);
+		if (!OidIsValid(op_func))
+			/* Compare the pair of elements */
+			InitFunctionCallInfoData(*locfcinfo, &typentry->eq_opr_finfo, 2,
+									 collation, NULL, NULL);
+		else
+		{
+			/* XXX: Some collation matching logic needed. */
+			collation = (collation != InvalidOid) ? collation : DEFAULT_COLLATION_OID;
+			InitFunctionCallInfoData(*locfcinfo, &op_func_finfo, 2,
+									 collation, NULL, NULL);
+		}
+
 		locfcinfo->args[0].value = values1[i1];
 		locfcinfo->args[0].isnull = false;
 		locfcinfo->args[1].value = values2[i2];
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index a4cc2d8c12..8a3c4e9454 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -423,8 +423,12 @@ select a,b from test_table where (a,b) > ('a','a') order by a,b;
 
 reset enable_sort;
 -- Check row compar

Compare variables of composite type with slightly different column types

2022-05-26 Thread Andrey Lepikhov

Hi,
While researching join selectivity improvements, I got stuck on this code in
rowtypes.c:


/*
 * Have two matching columns, they must be same type
 */
if (att1->atttypid != att2->atttypid)
  ereport(ERROR, ...

Why, for example, isn't the following trivial query allowed:

SELECT *
FROM
  (SELECT ROW(1::integer, 'robert'::text)) AS s1,
  (SELECT ROW(1::bigint, 'robert'::name)) AS s2
WHERE s1 = s2;
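Judging by the error text in rowtypes.c, an unpatched server rejects this
query with:

```
ERROR:  cannot compare dissimilar column types integer and bigint at record column 1
```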

I guess the compatible_oper() routine could be used here to find an
appropriate operator, or something like that could be invented.

I looked into 2cd7084 and a4424c5, but didn't find any rationale.

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Removing unneeded self joins

2022-05-19 Thread Andrey Lepikhov

On 5/17/22 19:14, Ronan Dunklau wrote:

On Friday, 13 May 2022 at 07:07:47 CEST, Andrey Lepikhov wrote:

New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?


Hello,

I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

  - a query with many different self joins
  - an abundance of unique indexes on combinations of this table columns to
consider
  - additional predicates on the where clause on columns.
Looking into the patch, I can imagine that the most difficult case is when
the set of relations with the same OID is huge, but only a small part of
them (or none) can be removed.
Also, removing a clause from a restrictinfo list or from an equivalence
class adds non-linear complexity. So, you could dig in this direction.


The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
I think you should compare t1.cX with tX.cX to eliminate the self-join;
cross-unique-index proof isn't supported now (see the sketch below).
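A sketch based on the setup described above (test_table with unique indexes
on (c1, cX); illustrative only):

```
-- Removable: both join quals belong to one unique index (c1, c3), so the
-- inner side is provably unique and physically the same row.
SELECT * FROM test_table t1
JOIN test_table t2 ON t1.c1 = t2.c1 AND t1.c3 = t2.c3;

-- Not removable: t1.c5 = t2.c3 mixes columns of two different unique
-- indexes, and such "cross-index" proof is unsupported.
SELECT * FROM test_table t1
JOIN test_table t2 ON t1.c1 = t2.c1 AND t1.c5 = t2.c3;
```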

The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.

One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.

Even more: by removing a join we improve cardinality estimation.


As for the code:

  - Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
  - The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.

Agreed, but after this case, thoughts wander in my head: should we take one
step back to the pre-[1] approach? It looks like we would have quite similar
changes, but without a special function for 'degenerate case' detection and
restrictlist splitting.


I'll update if I think of more interesting things to add.

Thank you for your efforts!

See in the attachment the next version, which fixes mistakes detected by
z...@yugabyte.com.


[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com


--
Regards
Andrey Lepikhov
Postgres Professional

From 1774b4bcfe337b151b76c7f357dc6748755bf1d9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes an inner join of a plain table to
itself in a query plan if it can be proved that the join can be replaced with
a scan. The proof is based on the innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If the uniqueness of the outer relation is deduced from a baserestrictinfo
clause like 'x=const' on unique field(s), check that the inner has the same
clause with the same constant(s).
4. Compare RowMarks of inner and outer relations. They must

Re: Removing unneeded self joins

2022-03-04 Thread Andrey Lepikhov

On 1/3/2022 03:03, Greg Stark wrote:

On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau  wrote:


Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?


Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
   from t
join t2 as b on (b.id = t.id)
   join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?
Yes, the self-join removal machinery uses the EC mechanism as usual to get
all join clauses. So, this case works (a sketch of the attached demo follows).
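A sketch of that demo (the original was attached; the table definitions here
are assumptions):

```
=# CREATE TABLE t  (id int PRIMARY KEY);
=# CREATE TABLE t2 (id int PRIMARY KEY, x int, y int);
=# EXPLAIN (COSTS OFF)
   SELECT b.x, c.y
   FROM t
   JOIN t2 AS b ON b.id = t.id
   JOIN t2 AS c ON c.id = t.id;
-- The EC machinery derives b.id = c.id, so b and c collapse into a single
-- scan of t2 joined to t.
```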


Also, in the new version of the patch I fixed one silly bug in checking a
self-join candidate's expression operator: we can remove only expressions
like F(arg1) = G(arg2).


--
regards,
Andrey Lepikhov
Postgres Professional

From 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if it can be proved that the join
can be replaced with a scan. We can prove the uniqueness using the existing
innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. This proves that the join is a self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids 
joinrelids);
+ Relids 
joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int 
*nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   RelOptInfo 
*innerrel,
   JoinType 
jointype,
   List 
*restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min_righthand));
+

Re: Merging statistics from children instead of re-sampling everything

2022-02-10 Thread Andrey Lepikhov

On 21/1/2022 01:25, Tomas Vondra wrote:

But I don't have a very good idea what to do about statistics that we
can't really merge. For some types of statistics it's rather tricky to
reasonably merge the results - ndistinct is a simple example, although
we could work around that by building and merging hyperloglog counters.
I think, as a first step in this direction, we can reduce the number of
pulled tuples. We don't really need to pull all tuples from a remote server:
to construct a reservoir, we can pull only a tuple sample. The reservoir
method needs only a few arguments to return a sample, just as when you read
tuples locally. Also, to fetch such partial samples asynchronously, we can
get the size of each partition in a preliminary step of the analysis.
In my opinion, even this solution can reduce the heaviness of the problem
drastically (see the illustration below).
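Purely as an illustration of fetching a sample instead of the whole relation
(not something the reservoir approach above prescribes), a remote query could
use the TABLESAMPLE clause; 'remote_part' is a placeholder name:

```
=# SELECT * FROM remote_part TABLESAMPLE SYSTEM (1);
```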


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple Query IDs for a rewritten parse tree

2022-01-31 Thread Andrey Lepikhov

On 29/1/2022 12:51, Julien Rouhaud wrote:

4. We should reserve position of default in-core generator


 From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.


I don't really like this approach.  IIUC, this patch as-is is meant to break
pg_stat_statements extensibility.  If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statement with a different queryid generator
anymore.


Yes, that is one more problem. Maybe, if we want to make it extensible, we
could think about hooks in pg_stat_statements too?



The patch also reserves kind == -1 for pg_stat_statements internal purpose, and
I'm not really sure why that's needed.
My idea: tags with positive numbers are reserved for generation results,
which are performance-critical.
As I saw during the implementation, pg_stat_statements makes additional
changes to the queryId (no matter which ones). Because our purpose is to
avoid interference in this place, I invented negative values, where
extensions can store their queryIds, based on any generator or none.
Maybe it is redundant; the main idea here was to highlight the issue.


I'm also unsure of how are extensions supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...).  That being said I'm not sure if e.g. AQO heuristics
are too specific for its need or if it could be shared with other extension
that might be dealing with similar concerns.

I think it depends on the specific purpose of the extension.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

2022-01-23 Thread Andrey Lepikhov

On 22/1/2022 01:34, Tomas Vondra wrote:
The other thing we could do is reduce the coefficient gradually - so 
it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so 
on. But it seems somewhat arbitrary (I certainly don't have some sound 
theoretical justification ...).


I think there is no reason to increase complexity without any theory behind
it. A simple solution lets us recognize problems much faster, if they arise.


... I've skipped the path_save 
removal in planner.c, because that seems incorrect - if there are 
multiple pathkeys, we must start with the original path, not the 
modified one we built in the last iteration. Or am I missing something

You are right, I misunderstood the idea of the path_save variable.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: POC: GROUP BY optimization

2022-01-21 Thread Andrey Lepikhov

On 7/22/21 3:58 AM, Tomas Vondra wrote:

4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
estimate_num_groups_incremental. In principle yes, if we use "group
size" from the previous step, then the returned value is the number of
new groups after adding the "new" pathkey.
But even if we ignore the issues with amplification mentioned in (3),
there's an issue with non-linear behavior in estimate_num_groups,
because at some point it's calculating

D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N - total rows, p - sample size, n - number of distinct values.
And if we have (N1,n1) and (N2,n2) then the ratio of calculated
estimated (which is pretty much what calculating group size does)

D(N2,n2,p2) / D(N1,n1,p1)

which will differ depending on p1 and p2. And if we're tweaking the
tuplesPerPrevGroup all the time, that's really annoying, as it may make
the groups smaller or larger, which is unpredictable and annoying, and I
wonder if it might go against the idea of penalizing tuplesPerPrevGroup
to some extent.
We could simply use the input "tuples" value here, and then divide the
current and previous estimate to calculate the number of new groups.
tuplesPerPrevGroup is only an upper boundary for the estimation. I think we
should use the result of the previous estimation as a limit for the next one
during incremental estimation. Maybe we simply limit the tuplesPerPrevGroup
value to ensure its monotonic non-growth? See the attached patch on top of
the previous fixes.


--
regards,
Andrey Lepikhov
Postgres Professional

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 70af9c91d5..4e26cd275d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2025,8 +2025,10 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, int nPresortedKeys,
 		/*
 		 * Real-world distribution isn't uniform but now we don't have a way to
 		 * determine that, so, add multiplier to get closer to worst case.
+		 * Guard this value against irrational decisions.
 		 */
-		tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
+		tuplesPerPrevGroup = Min(tuplesPerPrevGroup,
+								 ceil(1.5 * tuplesPerPrevGroup / nGroups));
 
 		/*
 		 * We could skip all following columns for cost estimation, because we


Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 15:39, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:

On 10/1/2022 13:56, Julien Rouhaud wrote:
Yes. the same input query string doesn't prove that frozen query plan can be
used, because rewrite rules could be changed. So we use only a query tree.


Yes, I'm fully aware of that.  I wasn't asking about using the input query
string but the need for generating a dedicated normalized output query string
that would be potentially different from the one generated by
pg_stat_statements (or similar).

Thanks, now I got it. I don't remember a single situation where we would 
need to normalize a query string.

But then, if generating 2 queryid is a better for performance, does the
extension really need an additional jumble state and/or normalized query
string?

Additional Jumble state isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during a process of
hash generation.


I agree, but it's even better for performance if this is collected only once.
I still don't know if this extension (or any extension) would require something
different from a common jumble state that would serve for all purpose or if we
can work with a single jumble state and only handle multiple queryid.
I think JumbleState is highly monitoring-specific, maybe even
pg_stat_statements-specific (though maybe you can point out other examples).
I have no idea how it could be used in an arbitrary extension.
But it is not so difficult to imagine an implementation (as part of Tom's
approach, described earlier) of such storage for each generation method.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 13:56, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:

I think, pg_stat_statements can live with an queryId generator of the
sr_plan extension. But It replaces all constants with $XXX parameter at the
query string. In our extension user defines which plan is optimal and which
constants can be used as parameters in the plan.


I don't know the details of that extension, but I think that it should work as
long as you have the constants information in the jumble state, whatever the
resulting normalized query string is right?
Yes: the same input query string doesn't prove that a frozen query plan can
be used, because rewrite rules could have changed. So we use only a query
tree. Here we must have a custom jumbling implementation.

queryId in this extension defines two aspects:
1. How many input queries will be compared with the query tree template of
the frozen statement.
2. As a result, the performance overhead of unsuccessful comparisons.



One drawback I see here - creating or dropping of my extension changes
behavior of pg_stat_statements that leads to distortion of the DB load
profile. Also, we haven't guarantees, that another extension will work
correctly (or in optimal way) with such queryId.


But then, if generating 2 queryid is a better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
Additional Jumble state isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during the process of
hash generation.


--
regards,
Andrey Lepikhov
Postgres Professional




Multiple Query IDs for a rewritten parse tree

2022-01-08 Thread Andrey Lepikhov

On 5/1/2022 10:13, Tom Lane wrote:
> I feel like we need to get away from the idea that there is just
> one query hash, and somehow let different extensions attach
> differently-calculated hashes to a query.  I don't have any immediate
> ideas about how to do that in a reasonably inexpensive way.

Now, the queryId field represents a query class (depending on the jumbling
implementation). It is used by extensions as a way to track a query from the
parse tree creation point to the end of its life along all the hook calls an
extension uses (remember possible plan caching).


I know at least two different use cases for queryId:
1) monitoring purposes - pg_stat_statements watches how often queries of a
class emerge in the database and collects statistics on each class.
2) adaptive purposes - some extensions influence planner decisions during
the optimization stage and want to learn from the performance shift at the
end of the execution stage.


Different purposes may require different jumbling implementations, but users
may want to use such extensions at the same time. So we should allow
attaching many different query IDs to a query (maybe better to call them
'query labels'?).


Thinking about it for a while, I came up with three different ways to
implement it:
1. queryId becomes a trivial 64-bit counter. Each extension can then
distinguish each query from any other, track it along all hooks, use its own
jumbling code and store a queryId internally. The only big problem I see
here is the increasing overhead in the case of many consumers of the queryId
feature.


2. Instead of a simple queryId, we can store a list of pairs (queryId,
funcOid). An extension can register a callback for queryId generation, and
the core will form a list of queryIds right after query tree rewriting.
funcOid is needed to distinguish jumbling implementations. Here we would
have to invent an additional node type for an element of the list.


3. Instead of queryId, we could add a multi-purpose 'private' list to the
Query struct. Any extension can add additional object(s) to this list (with
a registered object type, of course). As an example, I can imagine a kind of
convention for queryIds in such a case: store a String node with a value
like '<generator name> - <query ID>'.
For this approach we would need a registered callback mechanism too.

I think the third way is the cheapest, most flexible, and simplest to implement.

Any thoughts, comments, criticism ?

--
regards,
Andrey Lepikhov
Postgres Professional




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-23 Thread Andrey Lepikhov

On 21/12/2021 00:05, Peter Geoghegan wrote:

* Some index AMs don't work like nbtree and GiST in that they cannot
do their scan sequentially -- they have to do something like a
logical/keyspace order scan instead, which is *totally* different to
heapam (not just a bit different). There is no telling how many times
each page will be accessed in these other index AMs, and in what
order, even under optimal conditions. We should arguably not even try
to provide any granular progress information here, since it'll
probably be too messy.


Maybe we could add callbacks to the AM interface for the send, receive, and
representation parts of progress reporting?
Then each AM would define its own set of parameters to send to the stats
collector and show to users.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 22/12/2021 20:42, Tom Lane wrote:

Andrey Lepikhov  writes:

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.



This thread gives us hope to get started on solving some of the basic
planner problems.
But there is no activity for a long time, as I see. Have You tried to
implement this idea? Is it actual now?


It's been on the back burner for awhile :-(.  I've not forgotten
about it though.
I would like to try to develop this feature. The idea is clear to me, but
the definition of the NullableVars structure is not obvious. Maybe you could
prepare a sketch of this struct, or do you already have some draft code?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno

2021-12-22 Thread Andrey Lepikhov

On 5/2/2020 01:24, Tom Lane wrote:

I've not written any actual code, but am close to being ready to.
This thread gives us hope of getting started on solving some of the basic
planner problems.
But there has been no activity for a long time, as I see. Have you tried to
implement this idea? Is it still relevant now?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-14 Thread Andrey Lepikhov

On 14/10/21 10:40, Julien Rouhaud wrote:

On Thu, Oct 14, 2021 at 12:37 PM Andrey Lepikhov
 wrote:


On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov  writes:

I think that there are just too many arbitrary decisions that could be
made on what exactly should be a query identifier to have a single
in-core implementation.
Yes, and I use such a custom solution too. But the core jumbling code is
good, fast and much easier to maintain. The patch from the previous letter,
together with breaking down JumbleState, can let coders implement their code
on top of the queryjumble.c module with smaller changes.



 If you do sharding, you already have to
properly configure each node, so configuring your custom query id
extension shouldn't be a big problem.
My project is about adaptive query optimization techniques. It is not
obvious how to match (without a field in the Query struct) the post-parse
and execution phases, because of nested queries.
Also, if we use queryId in an extension, we interfere with
pg_stat_statements.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-13 Thread Andrey Lepikhov

On 12/10/21 18:40, Tom Lane wrote:

Andrey Lepikhov  writes:

But core jumbling code is good, fast and much easier in support.

A bigger issue is that query ID stability isn't something we are going
to promise on a large scale --- for example, what if a new release adds
some new fields to struct Query?  So I'm not sure that "query IDs should
survive dump/reload" is a useful goal to consider.  It's certainly not
something that could be reached by anything even remotely like the
existing code.

Thank you for the explanation.
I think the problem with queryId is that it encapsulates two different
meanings:
1. It allows an extension to match a query between the post-parse and
execution stages. In this sense, queryId should be as unique as possible for
each query.
2. For pg_stat_statements purposes (and my project too) it represents a
query class and should be stable against permutations of range table
entries, clauses, etc. For example:

"SELECT * FROM a,b;" and "SELECT * FROM b,a;" should have the same queryId.

This issue may be solved in an extension with the following approach:
1. Force as unique a value for queryId as the extension wants in a
post-parse hook.
2. Generalize the JumbleQuery routine code to generate a kind of query
class signature.


The attached patch is a first sketch for such change.

--
regards,
Andrey Lepikhov
Postgres Professional

From ef4033935b25c90fd8c5b6f10a0646a7ede385e3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Oct 2021 10:22:53 +0500
Subject: [PATCH] Add callback for jumbling of an OID value.

Query jumbling machinery depends on the OIDs of objects. That is fine for a
specific instance, because it allows the ID to survive an object rename. But
it is bad for the cross-instance case: even if you have the same code
version and the same schema, you can't guarantee the stability of OIDs.

So, an extension can't rely on this technique.

This patch changes the interface of the JumbleQuery routine to allow an
extension to use it to generate a query signature, with a specific algorithm
for OID jumbling.
---
 src/backend/commands/explain.c   |  6 +--
 src/backend/parser/analyze.c | 12 ++---
 src/backend/tcop/postgres.c  |  6 +--
 src/backend/utils/misc/queryjumble.c | 81 ++--
 src/include/utils/queryjumble.h  | 14 -
 5 files changed, 67 insertions(+), 52 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 10644dfac4..b2f10e828e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -166,7 +166,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 {
ExplainState *es = NewExplainState();
TupOutputState *tstate;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
Query  *query;
List   *rewritten;
ListCell   *lc;
@@ -246,10 +246,10 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 
query = castNode(Query, stmt->query);
if (IsQueryIdEnabled())
-   jstate = JumbleQuery(query, pstate->p_sourcetext);
+   query->queryId = JumbleQuery(query, pstate->p_sourcetext, NULL, &jstate);
 
if (post_parse_analyze_hook)
+   (*post_parse_analyze_hook) (pstate, query, &jstate);
+   (*post_parse_analyze_hook) (pstate, query, );
 
/*
 * Parse analysis was done already, but we still have to run the rule
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 146ee8dd1e..f9c3991a94 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -113,7 +113,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 {
ParseState *pstate = make_parsestate(NULL);
Query  *query;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
 
Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -127,10 +127,10 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
query = transformTopLevelStmt(pstate, parseTree);
 
if (IsQueryIdEnabled())
-   jstate = JumbleQuery(query, sourceText);
+   query->queryId = JumbleQuery(query, sourceText, NULL, &jstate);
 
if (post_parse_analyze_hook)
-   (*post_parse_analyze_hook) (pstate, query, jstate);
+   (*post_parse_analyze_hook) (pstate, query, &jstate);
 
free_parsestate(pstate);
 
@@ -152,7 +152,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 {
ParseState *pstate = make_parsestate(NULL);
Query  *query;
-   JumbleState *jstate = NULL;
+   JumbleState jstate;
 
Assert(sourceText != NULL); /* required as of 8.4 */
 
@@ -166,10 +166,10 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
check_variable_parameters(pstate, query);
 
if (IsQueryIdEnabled())
-   jstate = JumbleQuery(query, sourceText);
+   query->queryId = JumbleQuery(query, sourceText, NULL, &jstate);

Re: Make query ID more portable

2021-10-13 Thread Andrey Lepikhov

On 12/10/21 18:45, Bruce Momjian wrote:

On Tue, Oct 12, 2021 at 09:40:47AM -0400, Tom Lane wrote:

Andrey Lepikhov  writes:

But core jumbling code is good, fast and much easier in support.

Also, the current code handles renames of schemas and objects, but this
would not.
Yes, it is a good option if an extension works only in the context of one 
node. But my efforts are directed at cross-instance usage of monitoring 
data; as an example, it may be useful for sharding.
Also, I guess it is natural for a user to expect that if he renames an 
object, he also has to change his queries and reset the monitoring data 
related to this object.
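
To make the trade-off concrete (a sketch, assuming pg_stat_statements; the 
table name is made up):

```
CREATE TABLE t (x integer);
SELECT count(*) FROM t;
ALTER TABLE t RENAME TO t_renamed;
SELECT count(*) FROM t_renamed;
-- With OID-based jumbling both SELECTs share one queryid, because the
-- relation's OID is unchanged by the rename. With name-based jumbling
-- the second SELECT would start a new queryid, and the user would have
-- to reset the monitoring data accumulated under the old name.
```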


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Make query ID more portable

2021-10-12 Thread Andrey Lepikhov

On 12/10/21 13:35, Julien Rouhaud wrote:

Hi,

On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov
 wrote:

See the patch in attachment as an POC. Main idea here is to break
JumbleState down to a 'clocations' part that can be really interested in
a post parse hook and a 'context data', that needed to build query or
subquery signature (hash) and, I guess, isn't really needed in any
extensions.


There have been quite a lot of threads about that in the past, and
almost every time people wanted to change how the hash was computed.
So it seems to me that extensions would actually be quite interested
in that.  This is even more the case now that an extension can be used
to replace the queryid calculation only and keep the rest of the
extension relying on it as is.
Yes, I know. I have been using such a self-made queryId for four years, 
and I will keep using it.
But the core jumbling code is good, fast and much easier to maintain. The 
purpose of this work is to extend jumbling for more flexible use and to 
avoid meaningless copying of this code into an extension.

I think it adds little complexity and overhead.


I think the biggest change in your patch is:

   case RTE_RELATION:
- APP_JUMB(rte->relid);
- JumbleExpr(jstate, (Node *) rte->tablesample);
+ {
+ char *relname = regclassout_ext(rte->relid, true);
+
+ APP_JUMB_STRING(relname);
+ JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
   APP_JUMB(rte->inh);
   break;

Have you done any benchmark on OLTP workload?  Adding catalog access
there is likely to add significant overhead.
Yes, I should do benchmarking. But I guess the main goal of the query ID 
is monitoring, which can be switched off if necessary.
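
For what it's worth, since PostgreSQL 14 the computation can be toggled 
with the compute_query_id GUC, so the jumbling overhead is avoidable:

```
SET compute_query_id = off;   -- skip query-id computation entirely
SET compute_query_id = auto;  -- compute only when a module such as
                              -- pg_stat_statements requests it
```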

This part was made for a demo. It can be replaced by a hook, for example.


Also, why only using the fully qualified relation name for stable
hashes?  At least operators and functions should also be treated the
same way.  If you do that you will probably have way too much overhead
to be usable in a busy production environment.  Why not using the new
possibility of 3rd party extension for the queryid calculation that
exactly suits your need?

I fully agree with these arguments. This code is a POC. The main parts 
here are breaking down JumbleState, using a local context for subqueries, 
and sorting the hashes of range table entries.
I think we can call one routine (APP_JUMB_OBJECT(), as an example) for 
all OIDs in this code. It would allow an extension to intercept this call 
and replace the OID with an arbitrary value.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Increase value of OUTER_VAR

2021-09-15 Thread Andrey Lepikhov

On 14/9/21 16:37, Aleksander Alekseev wrote:

Hi Andrey,


only performance issues


That's interesting. Any chance you could share the hardware
description, the configuration file, and steps to reproduce with us?

I didn't measure execution time exactly, because it is a join of two 
empty tables. As far as I saw, planning this join used most of 48GB of 
RAM and ran all day on a typical 6-core AMD machine.
I guess this is caused by sequential traversal of the partition list in 
some places in the optimizer.
If it makes practical sense, I could investigate the reasons for such 
poor performance.
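
A hypothetical reproduction sketch (the original tables were not posted; 
the partition count here is made up):

```
CREATE TABLE p1 (id integer) PARTITION BY HASH (id);
CREATE TABLE p2 (id integer) PARTITION BY HASH (id);
DO $$
BEGIN
   FOR i IN 0..9999 LOOP
      EXECUTE format('CREATE TABLE p1_%s PARTITION OF p1
                      FOR VALUES WITH (MODULUS 10000, REMAINDER %s)', i, i);
      EXECUTE format('CREATE TABLE p2_%s PARTITION OF p2
                      FOR VALUES WITH (MODULUS 10000, REMAINDER %s)', i, i);
   END LOOP;
END $$;
SET enable_partitionwise_join = on;
-- planning this join, not executing it, is what consumes time and memory
EXPLAIN SELECT * FROM p1 JOIN p2 USING (id);
```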


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Asymmetric partition-wise JOIN

2021-09-15 Thread Andrey Lepikhov

On 14/9/21 11:37, Andrey V. Lepikhov wrote:

Thank you for this good catch!
The problem was in the adjust_child_relids_multilevel routine: the 
tmp_result variable sometimes points to the original required_outer.
This patch adds new ways for the optimizer to generate plans. One 
possibility is that the optimizer reparameterizes an inner path by a plain 
relation from the outer side (maybe as a result of a join of the plain 
relation and a partitioned relation). In this case we have to compare 
tmp_result with the original pointer to see whether it was changed.
The attached patch fixes this problem. An additional regression test is 
added.


I thought about it more and realized it isn't necessary to recurse in the 
adjust_child_relids_multilevel() routine if required_outer contains only 
normal_relids.
Also, the regression tests were improved a bit.
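
For readers following the thread, a minimal sketch of the scenario this 
patch targets (table names are made up; whether the new plan shape wins 
depends on costs):

```
CREATE TABLE parts (id integer, payload text) PARTITION BY HASH (id);
CREATE TABLE parts_0 PARTITION OF parts FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE parts_1 PARTITION OF parts FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE TABLE dict (id integer, info text);
SET enable_partitionwise_join = on;
EXPLAIN (COSTS OFF) SELECT * FROM parts JOIN dict USING (id);
-- without the patch: Join(Append(parts_0, parts_1), dict)
-- with the patch the planner may also consider
--   Append(Join(parts_0, dict), Join(parts_1, dict))
```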

--
regards,
Andrey Lepikhov
Postgres Professional
From ee627d07282629a85785a63341cf875bfb0decb2 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Fri, 2 Apr 2021 11:02:20 +0500
Subject: [PATCH] Asymmetric partitionwise join.

Teach the optimizer to consider a partitionwise join of a non-partitioned
table with each partition of a partitioned table.

Disallow the asymmetric machinery for joining two partitioned (or appended)
relations, because it could cause huge consumption of CPU and memory
during reparameterization of a NestLoop path.

Change the logic of the multilevel child relids adjustment, because this
feature allows the optimizer to plan in a new way.
---
 src/backend/optimizer/path/joinpath.c|   9 +
 src/backend/optimizer/path/joinrels.c| 187 
 src/backend/optimizer/plan/setrefs.c |  17 +-
 src/backend/optimizer/util/appendinfo.c  |  44 +-
 src/backend/optimizer/util/pathnode.c|   9 +-
 src/backend/optimizer/util/relnode.c |  19 +-
 src/include/optimizer/paths.h|   7 +-
 src/test/regress/expected/partition_join.out | 425 +++
 src/test/regress/sql/partition_join.sql  | 180 
 9 files changed, 867 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 6407ede12a..32618ebbd5 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo *root,
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
   jointype, &extra);
+
+   /*
+    * 7. If the outer relation is built from partitions, consider joining
+    * the inner relation with each partition leaf before appending these
+    * leaves.
+    */
+   try_asymmetric_partitionwise_join(root, joinrel,
+                                     outerrel, innerrel,
+                                     jointype, &extra);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 8b69870cf4..9453258f83 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
 
 #include "miscadmin.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -1552,6 +1553,192 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
 }
 
+/*
+ * Build a RelOptInfo for the join of each partition of the outer relation
+ * with the inner relation, and return a List of them. Return NIL if at
+ * least one of these joins is impossible to build.
+ */
+static List *
+extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        AppendPath *append_path,
+                                        RelOptInfo *inner_rel,
+                                        JoinType jointype,
+                                        JoinPathExtraData *extra)
+{
+   List*result = NIL;
+   ListCell*lc;
+
+   foreach (lc, append_path->subpaths)
+   {
+   Path*child_path = lfirst(lc);
+   RelOptInfo  *child_rel = child_path->parent;
+   Relids  child_joinrelids;
+   Relids  parent_relids;
+   RelOptInfo  *child_joinrel;
+   SpecialJoinInfo *child_sjinfo;
+   List*child_restrictlist;
+
+   child_joinrelids = bms_union(child_rel->

Re: Postgres picks suboptimal index after building of an extended statistics

2021-08-30 Thread Andrey Lepikhov

On 12/8/21 04:26, Tomas Vondra wrote:

On 8/11/21 2:48 AM, Peter Geoghegan wrote:

On Wed, Jun 23, 2021 at 7:19 AM Andrey V. Lepikhov
I agree the current behavior is unfortunate, but I'm not convinced the 
proposed patch is fixing the right place - doesn't this mean the index 
costing won't match the row estimates displayed by EXPLAIN?

I rewrote the patch. It's now simpler and shorter, and may be more 
convenient. Also, it includes a regression test to detect the problem in 
the future.


I wonder if we should teach clauselist_selectivity about UNIQUE indexes, 
and improve the cardinality estimates directly, not just costing for 
index scans.
I tried to implement this in different ways, but it causes additional 
overhead and code complexity: analyzing the list of indexes and matching 
the clauses of each index against the input clauses on every selectivity 
estimation.
I don't like that way and propose a new patch, attached.
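
The essence of the problem, as a sketch (names and data are made up):

```
CREATE TABLE t (a integer, b integer);
INSERT INTO t SELECT i % 100, i FROM generate_series(1, 10000) i;
CREATE UNIQUE INDEX t_a_b_idx ON t (a, b);
CREATE STATISTICS t_dep (dependencies) ON a, b FROM t;
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 101;
-- Whatever clauselist_selectivity derives from the correlated columns,
-- a unique index with '=' quals on all its columns can fetch at most one
-- row; the patch caps the index selectivity estimate accordingly.
```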

--
regards,
Andrey Lepikhov
Postgres Professional
From 41ce6007cd552afd1a73983f0b9c9cac0e125d58 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 30 Aug 2021 11:21:57 +0500
Subject: [PATCH] When estimating the number of fetched rows in a btree index,
 save the selectivity estimate in the costs structure. It will be used by the
 genericcostestimate routine as an upper bound for the estimate of total
 tuples visited in the main table.

This code fixes the problem with a unique index, when we know for sure
that no more than one tuple can be fetched, but clauselist_selectivity
gives us a much less accurate estimate for any of several possible reasons.
A regression test is added as a demonstration of the problem.
---
 src/backend/utils/adt/selfuncs.c| 18 ++--
 src/test/regress/expected/stats_ext.out | 38 +
 src/test/regress/sql/stats_ext.sql  | 34 ++
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c2aeb4b947..dd1cadad61 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6074,6 +6074,14 @@ genericcostestimate(PlannerInfo *root,
 */
numIndexTuples = rint(numIndexTuples / num_sa_scans);
}
+   else if (costs->indexSelectivity > 0. &&
+   indexSelectivity > costs->indexSelectivity)
+   /*
+    * If the caller gave us an estimate of the number of fetched index
+    * tuples, it could also give the selectivity estimate. In this case
+    * the number of returned tuples can't exceed the number of fetched
+    * tuples.
+    */
+   indexSelectivity = costs->indexSelectivity;
 
/*
 * We can bound the number of tuples by the index size in any case.  Also,
@@ -6258,6 +6266,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
boolfound_is_null_op;
double  num_sa_scans;
ListCell   *lc;
+   Selectivity btreeSelectivity;
 
/*
 * For a btree scan, only leading '=' quals plus inequality quals for 
the
@@ -6362,19 +6371,23 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
/*
 * If index is unique and we found an '=' clause for each column, we can
 * just assume numIndexTuples = 1 and skip the expensive
-* clauselist_selectivity calculations.  However, a ScalarArrayOp or
+* clauselist_selectivity calculations. However, a ScalarArrayOp or
 * NullTest invalidates that theory, even though it sets eqQualHere.
+* The value of btreeSelectivity is used as an upper bound for the
+* selectivity estimate of returned tuples in the genericcostestimate
+* routine.
 */
if (index->unique &&
indexcol == index->nkeycolumns - 1 &&
eqQualHere &&
!found_saop &&
!found_is_null_op)
+   {
numIndexTuples = 1.0;
+   btreeSelectivity = 1. / index->rel->tuples;
+   }
else
{
List   *selectivityQuals;
-   Selectivity btreeSelectivity;
 
/*
 * If the index is partial, AND the index predicate with the
@@ -6402,6 +6415,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 */
MemSet(&costs, 0, sizeof(costs));
costs.numIndexTuples = numIndexTuples;
+   costs.indexSelectivity = btreeSelectivity;
 
genericcostestimate(root, path, loop_count, &costs);
 
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 7524e65142..b90463821f 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1602,3 +1602,41 @@ NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to table tst

Representation of SubPlan testexpr in EXPLAIN

2021-08-24 Thread Andrey Lepikhov

Hi,

The EXPLAIN command doesn't show testexpr, so sometimes it is not easy to 
understand a query plan. Here is what I mean:


CREATE TABLE a (x integer, y integer);
EXPLAIN (COSTS OFF, VERBOSE) SELECT x, y FROM a upper
  WHERE y IN (SELECT y FROM a WHERE upper.y = x);
EXPLAIN (COSTS OFF, VERBOSE) SELECT x, y FROM a upper
  WHERE x+y IN (SELECT y FROM a WHERE upper.y = x);

These two explains have the same representation:
Seq Scan on public.a upper
   Output: upper.x, upper.y
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Seq Scan on public.a
   Output: a.y
   Filter: (upper.y = a.x)

It is a bit annoying when you don't have the original query or don't 
trust the competence of the user who sent you this explain.
The attached patch solves this problem. I'm not completely sure that this 
option is really needed; the patch presents a proof of concept only.
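
For illustration only, a possible rendering of the second plan once 
testexpr is printed (a hypothetical sketch of the output, not copied from 
the patch):

```
Seq Scan on public.a upper
   Output: upper.x, upper.y
   Filter: ((upper.x + upper.y) = ANY (SubPlan 1))
   SubPlan 1
     ->  Seq Scan on public.a
           Output: a.y
           Filter: (upper.y = a.x)
```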


--
regards,
Andrey Lepikhov
Postgres Professional
From db440802d84a96a98d0fb30672232e81e10c4598 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Tue, 24 Aug 2021 10:59:00 +0500
Subject: [PATCH] Improve EXPLAIN presentation

---
 src/backend/utils/adt/ruleutils.c | 101 +++---
 src/test/regress/expected/insert_conflict.out |   2 +-
 src/test/regress/expected/join.out|   2 +-
 src/test/regress/expected/rowsecurity.out |   6 +-
 src/test/regress/expected/select_parallel.out |  12 +--
 src/test/regress/expected/subselect.out   |  40 +++
 src/test/regress/expected/updatable_views.out |  24 ++---
 7 files changed, 106 insertions(+), 81 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 4df8cc5abf..1c25769633 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -455,7 +455,10 @@ static void get_const_expr(Const *constval, deparse_context *context,
   int showtype);
 static void get_const_collation(Const *constval, deparse_context *context);
 static void simple_quote_literal(StringInfo buf, const char *val);
+static void get_subplan_expr(SubPlan *subplan, deparse_context *context);
 static void get_sublink_expr(SubLink *sublink, deparse_context *context);
+static bool get_subselect_expr(SubLinkType subLinkType, Node *testexpr,
+                              deparse_context *context);
 static void get_tablefunc(TableFunc *tf, deparse_context *context,
  bool showimplicit);
 static void get_from_clause(Query *query, const char *prefix,
@@ -8536,20 +8539,7 @@ get_rule_expr(Node *node, deparse_context *context,
break;
 
case T_SubPlan:
-   {
-   SubPlan*subplan = (SubPlan *) node;
-
-   /*
-    * We cannot see an already-planned subplan in rule deparsing,
-    * only while EXPLAINing a query plan.  We don't try to
-    * reconstruct the original SQL, just reference the subplan
-    * that appears elsewhere in EXPLAIN's result.
-    */
-   if (subplan->useHashTable)
-       appendStringInfo(buf, "(hashed %s)", subplan->plan_name);
-   else
-       appendStringInfo(buf, "(%s)", subplan->plan_name);
-   }
+   get_subplan_expr((SubPlan *) node, context);
break;
 
case T_AlternativeSubPlan:
@@ -10358,6 +10348,31 @@ simple_quote_literal(StringInfo buf, const char *val)
 }
 
 
+static void
+get_subplan_expr(SubPlan *subplan, deparse_context *context)
+{
+   StringInfo  buf = context->buf;
+
+   if (subplan->testexpr)
+   appendStringInfoChar(buf, '(');
+
+   (void) get_subselect_expr(subplan->subLinkType, subplan->testexpr, context);
+
+   /*
+* We cannot see an already-planned subplan in rule deparsing,
+* only while EXPLAINing a query plan.  We don't try to
+* reconstruct the original SQL, just reference the subplan
+* that appears elsewhere in EXPLAIN's result.
+*/
+   if (subplan->useHashTable)
+   appendStringInfo(buf, "(hashed %s)", subplan->plan_name);
+   else
+   appendStringInfo(buf, "(%s)", subplan->plan_name);
+
+   if (subplan->testexpr)
+   appendStringInfoChar(buf, ')');
+}
+
 /* --
  * get_sublink_expr- Parse back a sublink
  * --
@@ -10367,14 +10382,35 @@ get_sublink_expr(SubLink *sublink, deparse_context *context)
 {
StringInfo  buf = context->buf;
Query  *query = 
