Re: using extended statistics to improve join estimates

2024-05-22 Thread Andrei Lepikhov

On 5/23/24 09:04, Andy Fan wrote:

Andrei Lepikhov  writes:

* c) No extended stats with MCV. If there are multiple join clauses,
* we can try using ndistinct coefficients and do what eqjoinsel does.


OK, I didn't pay enough attention to this comment before. And yes, I came to
the same conclusion as you - there is no implementation of this.

If so, I think we should either remove the comment or do the
implementation in the next patch.

I hold the opposite opinion:
1. Distinct estimation is a more universal thing - you can use it 
precisely on any subset of columns.
2. Distinct estimation is faster - it is just a number; you don't need to 
detoast a huge array of values and compare them one by one.


So, IMO, it is an essential part of join estimation, and it should be 
implemented as in eqjoinsel.

Do you think the hook proposal is closely connected with the current
topic? IIUC, it seems not. So a dedicated thread to explain the problem
to solve, the proposal, and the following discussion should be helpful
for both topics. I'm just worried that mixing the two in one thread
would make things unnecessarily complex.

Sure.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-05-21 Thread Andrei Lepikhov

On 20/5/2024 15:54, jian he wrote:

As mentioned previously,
both A and B can use Incremental Sort, but set_cheapest will choose A
instead of B,
because the path generated in step A is the **first** one that satisfies
incremental sort.

Yeah, I agree with your analysis.
Looking into cost_incremental_sort, I see that we have ten 
group_tuples. This value doesn't depend on how many presorted keys (1 or 
2) we use; it comes from the default estimation.
Given the current circumstances, it seems unlikely that we can make any 
significant changes without introducing a new sort cost model that 
accounts for the number of sorted columns.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: using extended statistics to improve join estimates

2024-05-21 Thread Andrei Lepikhov

On 5/20/24 16:40, Andrei Lepikhov wrote:

On 20/5/2024 15:52, Andy Fan wrote:

+    if (clauselist_selectivity_hook)
+    *return* clauselist_selectivity_hook(root, clauses, ..)
Of course - the library may not estimate all the clauses - that is the 
reason why I added the input/output parameter 'estimatedclauses', by 
analogy with statext_clauselist_selectivity.

Here is a polished and slightly modified version of the proposed hook.
Additionally, I propose exporting the statext_mcv_clauselist_selectivity 
routine, just like dependencies_clauselist_selectivity. This could 
potentially enhance the functionality of the PostgreSQL estimation code.


To clarify the purpose: I want an optional estimation, loaded as a 
library, that is more conservative and based on distinct statistics. 
Let me provide a (somewhat degenerate) example:


CREATE TABLE is_test(x1 integer, x2 integer, x3 integer, x4 integer);
INSERT INTO is_test (x1,x2,x3,x4)
   SELECT x%5,x%7,x%11,x%13 FROM generate_series(1,1E3) AS x;
INSERT INTO is_test (x1,x2,x3,x4)
   SELECT 14,14,14,14 FROM generate_series(1,100) AS x;
CREATE STATISTICS ist_stat (dependencies,ndistinct)
  ON x1,x2,x3,x4 FROM is_test;
ANALYZE is_test;
EXPLAIN (ANALYZE, COSTS ON, SUMMARY OFF, TIMING OFF)
SELECT * FROM is_test WHERE x1=14 AND x2=14 AND x3=14 AND x4=14;
DROP TABLE is_test CASCADE;

I see:
(cost=0.00..15.17 rows=3 width=16) (actual rows=100 loops=1)

Dependencies work great if they hold uniformly for all the data in the 
columns. But we get underestimations if different subsets of rows follow 
different laws. So, if we don't have MCV statistics, sometimes we need 
to skip dependency statistics and use ndistinct instead.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0ab021c1e8..1508a1beea 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -128,6 +128,11 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	ListCell   *l;
 	int			listidx;
 
+	if (clauselist_selectivity_hook)
+		s1 = clauselist_selectivity_hook(root, clauses, varRelid, jointype,
+		 sjinfo, &estimatedclauses,
+		 use_extended_stats);
+
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity_ext(). None of what we might do below is relevant.
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 99fdf208db..b1722f5a60 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1712,7 +1712,7 @@ statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
  * 0-based 'clauses' indexes we estimate for and also skip clause items that
  * already have a bit set.
  */
-static Selectivity
+Selectivity
 statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
    JoinType jointype, SpecialJoinInfo *sjinfo,
    RelOptInfo *rel, Bitmapset **estimatedclauses,
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..ff98fda08c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -146,6 +146,7 @@
 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
+clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;
 
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
 static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h
index 7f2bf18716..436f30bdde 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -104,6 +104,14 @@ extern void BuildRelationExtStatistics(Relation onerel, bool inh, double totalro
 extern int	ComputeExtStatisticsRows(Relation onerel,
 	 int natts, VacAttrStats **vacattrstats);
 extern bool statext_is_kind_built(HeapTuple htup, char type);
+extern Selectivity statext_mcv_clauselist_selectivity(PlannerInfo *root,
+	  List *clauses,
+	  int varRelid,
+	  JoinType jointype,
+	  SpecialJoinInfo *sjinfo,
+	   RelOptInfo *rel,
+	   Bitmapset **estimatedclauses,
+	   bool is_or);
 extern Selectivity dependencies_clauselist_selectivity(PlannerInfo *root,
 	   List *clauses,
 	   int varRelid,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f2563ad1cb..253f584c65 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -148,6 +148,17 @@ typedef bool (*get_index_stats_hook_type) (PlannerInfo *root,
 		   VariableStatData *vardata);
 extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;
 
+/* Hooks for plugins to get control when we ask for selectivity estimation */
+typedef Selectivity

Re: using extended statistics to improve join estimates

2024-05-20 Thread Andrei Lepikhov

On 20/5/2024 15:52, Andy Fan wrote:


Hi Andrei,


On 4/3/24 01:22, Tomas Vondra wrote:

Cool! There's obviously no chance to get this into v18, and I have stuff
to do in this CF. But I'll take a look after that.

I'm looking at your patch now - an excellent start to an eagerly awaited
feature!
A couple of questions:
1. I didn't find the implementation of strategy 'c' - estimation by the
number of distinct values. Did you forget it?


What do you mean by "strategy 'c'"?

As described in 0001-* patch:
* c) No extended stats with MCV. If there are multiple join clauses,
* we can try using ndistinct coefficients and do what eqjoinsel does.




2. Can we add a clauselist selectivity hook into the core (something
similar to the code in the attachment)? It would allow the development
and testing of multicolumn join estimations without patching the core.


The idea LGTM. But do you want

+   if (clauselist_selectivity_hook)
+   s1 = clauselist_selectivity_hook(root, clauses, varRelid, 
jointype,
+

rather than

+   if (clauselist_selectivity_hook)
+   *return* clauselist_selectivity_hook(root, clauses, ..)
Of course - the library may not estimate all the clauses - that is the 
reason why I added the input/output parameter 'estimatedclauses', by 
analogy with statext_clauselist_selectivity.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: using extended statistics to improve join estimates

2024-05-20 Thread Andrei Lepikhov

On 4/3/24 01:22, Tomas Vondra wrote:

Cool! There's obviously no chance to get this into v18, and I have stuff
to do in this CF. But I'll take a look after that.
I'm looking at your patch now - an excellent start to an eagerly awaited 
feature!

A couple of questions:
1. I didn't find the implementation of strategy 'c' - estimation by the 
number of distinct values. Did you forget it?
2. Can we add a clauselist selectivity hook into the core (something 
similar to the code in the attachment)? It would allow the development 
and testing of multicolumn join estimations without patching the core.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 0ab021c1e8..271d36a522 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -128,6 +128,9 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	ListCell   *l;
 	int			listidx;
 
+	if (clauselist_selectivity_hook)
+		s1 = clauselist_selectivity_hook(root, clauses, varRelid, jointype,
+		 sjinfo, &estimatedclauses);
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity_ext(). None of what we might do below is relevant.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..ff98fda08c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -146,6 +146,7 @@
 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
+clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;
 
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
 static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f2563ad1cb..ee28d3ba9b 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -148,6 +148,15 @@ typedef bool (*get_index_stats_hook_type) (PlannerInfo *root,
 		   VariableStatData *vardata);
 extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;
 
+/* Hooks for plugins to get control when we ask for selectivity estimation */
+typedef bool (*clauselist_selectivity_hook_type) (PlannerInfo *root,
+  List *clauses,
+  int varRelid,
+  JoinType jointype,
+  SpecialJoinInfo *sjinfo,
+  Bitmapset **estimatedclauses);
+extern PGDLLIMPORT clauselist_selectivity_hook_type clauselist_selectivity_hook;
+
 /* Functions in selfuncs.c */
 
 extern void examine_variable(PlannerInfo *root, Node *node, int varRelid,


Re: POC: GROUP BY optimization

2024-05-16 Thread Andrei Lepikhov

On 24.04.2024 13:25, jian he wrote:

hi.
I found an interesting case.

CREATE TABLE t1 AS
   SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
   FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
   ->  Incremental Sort
  Sort Key: x, $, $, $
  Presorted Key: x
  ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
   ->  Incremental Sort
  Sort Key: x, y, $, $
  Presorted Key: x, y
  ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?
It looks like a bug right now - in the current implementation, we don't 
differentiate between different orders. So:
1. After applying all the patches from the thread, which I proposed in 
answer to Tom Lane's last critique - is the behavior still the same?

2. Could you try to find the reason?

--
regards,
Andrei Lepikhov
Postgres Professional





Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Andrei Lepikhov

On 15.05.2024 10:24, Imseih (AWS), Sami wrote:

I discovered the current state of queryId reporting and found that it
may be illogical: Postgres resets the queryId right before query execution
in the simple protocol and doesn't reset it at all in the extended protocol
and other ways of executing queries.


In exec_parse_message, exec_bind_message  and exec_execute_message,
the queryId is reset via pgstat_report_activity


I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.


When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

I looked into this case a bit.
As far as I can see, the origin of the problem is that exec_execute_message 
reports STATE_RUNNING, although ExecutorStart was already called in the 
exec_bind_message routine beforehand.
I'm unsure whether ExecutorStart needs to be called in the bind code. But 
if we don't change the current logic, would it make more sense to move 
pgstat_report_query_id to the ExecutorRun routine?


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-05-15 Thread Andrei Lepikhov

On 15/5/2024 12:09, Michael Paquier wrote:

On Wed, May 15, 2024 at 03:24:05AM +, Imseih (AWS), Sami wrote:

I think we should generally report it when the backend executes a job
related to the query with that queryId. This means it would reset the
queryId at the end of the query execution.


When the query completes execution and the session goes into a state
other than "active", both the query text and the queryId should be of the
last executed statement. This is the documented behavior, and I believe
it's the correct behavior.

If we reset queryId at the end of execution, this behavior breaks. Right?


Idle sessions keep track of the last query string run, hence being
consistent in pg_stat_activity and report its query ID is user
friendly.  Resetting it while keeping the string is less consistent.
It's been this way for years, so I'd rather let it be this way.
Okay, that's precisely what I wanted to understand: queryId doesn't have 
the semantics of showing the job that consumes resources right now; it is 
mostly a convenience for knowing that the backend is processing nothing 
except (probably) this query.


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-05-08 Thread Andrei Lepikhov

On 5/1/24 10:07, Imseih (AWS), Sami wrote:

Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.


[1] 
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com
I discovered the current state of queryId reporting and found that it 
may be illogical: Postgres resets the queryId right before query execution 
in the simple protocol and doesn't reset it at all in the extended protocol 
and other ways of executing queries.
I think we should generally report it when the backend executes a job 
related to the query with that queryId. This means it would reset the 
queryId at the end of the query execution.
However, the process of setting up the queryId is more complex. Should 
we set it at the beginning of query execution? This seems logical, but 
what about the planning process? If an extension plans a query without 
the intention to execute it for speculative reasons, should we still 
show the queryId? Perhaps we should reset the state right after planning 
to accurately reflect the current queryId.
See a sketch of that in the attachment - it still needs a queryId 
reset on abort.


--
regards, Andrei Lepikhov
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c..a4d38a157f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -470,6 +470,12 @@ ExecutorEnd(QueryDesc *queryDesc)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
 		standard_ExecutorEnd(queryDesc);
+
+	/*
+	 * Report at the end of query execution. Don't change it for nested
+	 * queries.
+	 */
+	pgstat_report_query_id(0, false);
 }
 
 void
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 032818423f..ba29dc5bc7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -58,6 +58,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "pgstat.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
@@ -280,6 +281,13 @@ planner(Query *parse, const char *query_string, int cursorOptions,
 		result = (*planner_hook) (parse, query_string, cursorOptions, boundParams);
 	else
 		result = standard_planner(parse, query_string, cursorOptions, boundParams);
+
+	/*
+	 * Reset queryId at the end of planning job. Executor code will set it up
+	 * again on the ExecutorStart call.
+	 */
+	pgstat_report_query_id(0, false);
+
 	return result;
 }
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2dff28afce..10e2529cf6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1108,8 +1108,6 @@ exec_simple_query(const char *query_string)
 		const char *cmdtagname;
 		size_t		cmdtaglen;
 
-		pgstat_report_query_id(0, true);
-
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and


Re: Removing unneeded self joins

2024-05-06 Thread Andrei Lepikhov

On 6/5/2024 21:44, Robert Haas wrote:

On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
 wrote:

Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.


To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, 

I'm really sorry that my level of language caused a misunderstanding.
The main purpose of this work is to form a more or less definite view of 
the direction of the planner's development.
Right now, it evolves extensively - new structures, variables, 
alternative copies of the same node trees with slightly changed 
properties... This way allows us to introduce planning features quickly 
(the many changes in planner logic since PG16 are evidence of that), and 
with still-growing computing resources, it lets Postgres fit into RAM and 
reasonable planning time. But maybe we want to be more modest? 
The work Ashutosh has been doing this year shows how expensive the 
planner can sometimes be. Perhaps we want machinery that checks the 
integrity of planning data beyond setrefs, which occasionally fails to 
detect such problems?
If an extensive approach is the only viable option, then it's clear that 
this and many other features are simply not suitable for the Postgres 
planner. It's disheartening that this patch didn't elicit such 
high-level feedback.


--
regards,
Andrei Lepikhov





Re: Removing unneeded self joins

2024-05-04 Thread Andrei Lepikhov

On 3/5/2024 20:55, Robert Haas wrote:

One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor.

I looked into this case and agree that it looks like a severe problem.
Unfortunately, in the case of the SJE patch, the committer and reviewers 
didn't provide negative feedback. We see only (I'm not sure I'm using the 
proper English phrase) 'negative feelings' from people who haven't 
reviewed or analysed it at all (at least, they didn't mention doing so).


Considering the situation, I suggest setting the default value of 
enable_self_join_removal to false in PG17 for added safety and then 
changing it to true in early PG18.


Having no objective negative feedback, we have no reason to change 
anything in the design or any part of the code. It looks regrettable and 
unusual.


After designing the feature, fixing its bugs, and reviewing joint 
patches on the commitfest, the question more likely lies in the planner 
design. For example, I wonder if anyone here knows why exactly the 
optimiser makes a copy of the whole query subtree in some places. 
Another example is PlannerInfo. Can we really control all the 
consequences of introducing, let's say, a new JoinDomain entity?


You also mentioned 2024.pgconf.dev. Considering the current migration 
policy in some countries, it would be better to treat online presence 
as equivalent to offline participation. Without an online part of the 
conference, the only way to communicate and discuss is through this 
mailing list.


--
regards,
Andrei Lepikhov





Re: Removing unneeded self joins

2024-05-02 Thread Andrei Lepikhov

On 5/3/24 06:19, Tom Lane wrote:

Alexander Korotkov  writes:

I would appreciate your review of this patchset, and review from Andrei as well.


I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?

I might feel better about it if there were any reason to think
these were the last major bugs.  But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially.  There can be no doubt
whatever that the original patch was far from commit-ready.

I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.

I need to say that I don't see any evidence of bad design.
I think this feature follows the example of 2489d76 [1], 1349d27 [2], 
and the partitionwise join features - we get some issues from time to time, 
but their severity and frequency are steadily decreasing.
First and foremost, this feature is highly isolated: like the PWJ 
feature, you can just disable (or simply not enable) SJE, which guarantees 
you will avoid the problems.
Secondly, this feature reflects the design decisions the optimiser has 
made before. It raises some questions: Do we really control the 
consistency of our paths and the plan tree? Maybe we hide our 
misunderstanding of its logic by extensively copying expression trees, 
sometimes without fundamental necessity. Perhaps the optimiser needs 
some abstraction layers or reconstruction to reduce the quickly growing 
complexity.
A good example here is [1]. IMO, the new promising feature it has 
introduced isn't worth the complexity it added to the planner.
SJE, much like OR <-> ANY transformation, introduces a fresh perspective 
into the planner: if we encounter a complex, redundant query, it may be 
more beneficial to invest in simplifying the internal query 
representation rather than adding new optimisations that will grapple 
with this complexity.
Also, SJE raised questions I've never seen before, like: Could we 
control the consistency of the PlannerInfo by changing something in the 
logic?
Considering the current state, I don't see any concrete outcomes or 
evidence that a redesign of the feature would lead us down a new path. 
However, I believe the presence of SJE in the core could potentially 
trigger improvements in the planner. As a result, I vote to keep the 
feature. But remember, as the author, I'm not entirely objective, so 
let's wait for alternative opinions.


[1] Make Vars be outer-join-aware
[2] Improve performance of ORDER BY / DISTINCT aggregates

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-05-02 Thread Andrei Lepikhov

On 5/1/24 18:59, Alexander Korotkov wrote:

I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether.  Please, check the attached patch.
Your patch looks good to me. I added some comments and a test case to 
join.sql.


One question for me is: do we anticipate other lateral self-references 
besides the TABLESAMPLE case? Looking into the extract_lateral_references 
implementation, I see that the only case to be afraid of is RTE_SUBQUERY. 
But we pull up subqueries before extracting lateral references. So, if we 
still have a reference to a subquery, it means we did not flatten it and 
will not apply SJE. Do we need more code, as you wrote in the 
first patch?


--
regards,
Andrei Lepikhov
Postgres Professional
From dac8afd2095586921dfcf5564e7f2979e89b77de Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Thu, 2 May 2024 16:17:52 +0700
Subject: [PATCH] Forbid self-join elimination on table sampling scans.

With custom table sampling methods, we can run into unpredictable issues
when replacing a join with a scan operation. It might make sense to analyse
the possible situations and enable SJE, but the real profit from this
operation looks too low.
---
 src/backend/optimizer/plan/analyzejoins.c |  8 ++++++++
 src/backend/optimizer/plan/initsplan.c    |  5 ++++-
 src/test/regress/expected/join.out        | 13 +++++++++++++
 src/test/regress/sql/join.sql             |  5 +++++
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c..bb89558dcd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2148,6 +2148,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			Assert(root->simple_rte_array[k]->relid ==
    root->simple_rte_array[r]->relid);
 
+			/*
+			 * To avoid corner cases with table sampling methods just forbid
+			 * SJE for table sampling entries.
+			 */
+			if (root->simple_rte_array[k]->tablesample ||
+root->simple_rte_array[r]->tablesample)
+continue;
+
 			/*
 			 * It is impossible to eliminate join of two relations if they
 			 * belong to different rules of order. Otherwise planner can't be
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..bf839bcaf6 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -415,7 +415,10 @@ extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex)
 	if (!rte->lateral)
 		return;
 
-	/* Fetch the appropriate variables */
+	/* Fetch the appropriate variables.
+	 * Changes in this place may need changes in SJE logic, see
+	 * the remove_self_joins_one_group routine.
+	 */
 	if (rte->rtekind == RTE_RELATION)
 		vars = pull_vars_of_level((Node *) rte->tablesample, 0);
 	else if (rte->rtekind == RTE_SUBQUERY)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2..63143fe55f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6900,6 +6900,19 @@ where s1.x = 1;
Filter: (1 = 1)
 (9 rows)
 
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+  (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop
+   ->  Seq Scan on emp1 t1
+   ->  Sample Scan on emp1 t2
+ Sampling: system (t1.code)
+ Filter: ((t1.id = id) AND (t1.code = code))
+(5 rows)
+
 -- Check that PHVs do not impose any constraints on removing self joins
 explain (verbose, costs off)
 select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c4c6c7b8ba..184fd0876b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2652,6 +2652,11 @@ select 1 from emp1 t1 left join
 on true
 where s1.x = 1;
 
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+  (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+
 -- Check that PHVs do not impose any constraints on removing self joins
 explain (verbose, costs off)
 select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
-- 
2.39.2



Re: query_id, pg_stat_activity, extended query protocol

2024-04-27 Thread Andrei Lepikhov

On 27/4/2024 20:54, Imseih (AWS), Sami wrote:

But a simplistic case with a prepared statement shows how the value of
queryId can change if you don't acquire all the objects needed for
the execution:




CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;


Hmm, you raise a good point. Isn't this a fundamental problem
with prepared statements? If there is DDL on the
relations of the prepared statement query, shouldn't the prepared
statement be considered invalid at that point and raise an error
to the user?
I don't think so. Any object, even a stored procedure, can be changed. 
IMO, the right option here is to report zero (i.e., the undefined value 
of queryId) until the end of the parsing stage.


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-04-23 Thread Andrei Lepikhov

On 4/23/24 12:49, Michael Paquier wrote:

On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote:

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, 
true);
  set_ps_display("BIND");
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long 
max_rows)
  debug_query_string = sourceText;
  pgstat_report_activity(STATE_RUNNING, sourceText);
+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
  cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);


In exec_bind_message, how can you be sure that queryId exists in query_list
before the call of GetCachedPlan(), which will validate and lock the plan?
What if some OIDs were altered in the middle?


I am also a bit surprised with the choice of using the first Query
available in the list for the ID, FWIW.

Did you consider using \bind to show how this behaves in a regression
test?
I'm not sure how to invent a test based on the \bind command - we would 
need some pause in the middle.
But a simplistic case with a prepared statement shows how the value of 
queryId can change if you don't acquire all the objects needed for 
the execution:


CREATE TABLE test();
PREPARE name AS SELECT * FROM test;
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;
DROP TABLE test;
CREATE TABLE test();
EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name;

/*
QUERY PLAN
---
 Seq Scan on public.test (actual time=0.002..0.004 rows=0 loops=1)
 Query Identifier: 6750745711909650694

QUERY PLAN
---
 Seq Scan on public.test (actual time=0.004..0.004 rows=0 loops=1)
 Query Identifier: -2597546769858730762
*/

We have different objects which can be changed - I just invented the 
most trivial example for discussion.
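To see why the identifier flips above, here is a toy sketch (not 
PostgreSQL's actual jumbling algorithm): the core computes queryId from 
the post-analysis tree, where relation OIDs rather than names are 
recorded, so recreating "test" assigns a fresh OID and the identical SQL 
text gets a different fingerprint. The hash shape and OID values below 
are invented for illustration.

```python
import hashlib

def query_fingerprint(sql_shape: str, referenced_oids: list[int]) -> int:
    """Hash the normalized query text together with the OIDs it references."""
    h = hashlib.sha256(sql_shape.encode())
    for oid in referenced_oids:
        h.update(oid.to_bytes(8, "little"))
    return int.from_bytes(h.digest()[:8], "little", signed=True)

# First incarnation of table "test" resolves to one OID ...
id_before = query_fingerprint("SELECT * FROM test", [16384])
# ... after DROP/CREATE the same name resolves to a fresh OID.
id_after = query_fingerprint("SELECT * FROM test", [16385])
assert id_before != id_after   # same SQL text, different queryId
```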


--
regards, Andrei Lepikhov





Re: query_id, pg_stat_activity, extended query protocol

2024-04-22 Thread Andrei Lepikhov

On 23/4/2024 11:16, Imseih (AWS), Sami wrote:

FWIW, I'd like to think that we could improve the situation, requiring
a mix of calling pgstat_report_query_id() while feeding on some query
IDs retrieved from CachedPlanSource->query_list. I have not in
details looked at how much could be achieved, TBH.


I was dealing with this today and found this thread. I spent some time
looking at possible solutions.

In the flow of extended query protocol, the exec_parse_message
reports the queryId, but subsequent calls to exec_bind_message
and exec_execute_message reset the queryId when calling
pgstat_report_activity(STATE_RUNNING,..) as you can see below.
   
  /*

   * If a new query is started, we reset the query identifier as it'll only
   * be known after parse analysis, to avoid reporting last query's
   * identifier.
   */
  if (state == STATE_RUNNING)
  beentry->st_query_id = UINT64CONST(0);


So, I think the simple answer is something like the below.
Inside exec_bind_message and exec_execute_message,
the query_id should be reported after pgstat_report_activity.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..7ec2df91d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1678,6 +1678,7 @@ exec_bind_message(StringInfo input_message)
 debug_query_string = psrc->query_string;
  
 pgstat_report_activity(STATE_RUNNING, psrc->query_string);

+   pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, 
true);
  
 set_ps_display("BIND");
  
@@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows)

 debug_query_string = sourceText;
  
 pgstat_report_activity(STATE_RUNNING, sourceText);

+   pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true);
  
 	cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen);



thoughts?
In exec_bind_message, how can you be sure that queryId exists in 
query_list before the call of GetCachedPlan(), which will validate and 
lock the plan? What if some OIDs were altered in the middle?
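To make this hazard concrete, here is a toy model in plain Python (not 
the real CachedPlanSource API; the fields and the "+1" re-analysis stand 
in for what parse analysis after a DDL change would do): an id taken 
from the cached query_list before revalidation can differ from the id of 
the plan that actually executes.

```python
class CachedPlanSource:
    """Toy stand-in for a cached plan source with a parse-time queryId."""
    def __init__(self, query_id):
        self.query_id = query_id      # id computed at parse time
        self.valid = True

    def get_cached_plan(self):
        """Revalidate; re-analysis may produce a new queryId."""
        if not self.valid:
            self.query_id += 1        # stands in for re-analysis after DDL
            self.valid = True
        return self.query_id

psrc = CachedPlanSource(query_id=100)
reported_before = psrc.query_id       # what the proposed hunk would report
psrc.valid = False                    # e.g. a referenced table was recreated
executed = psrc.get_cached_plan()     # GetCachedPlan() re-analyzes
assert reported_before != executed    # a stale id was reported
```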


--
regards, Andrei Lepikhov





Re: POC: GROUP BY optimization

2024-04-21 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.
Let me show current cases where users benefit from this tiny 
improvement (see queries and execution results in query.sql):
1. 'Not optimised query text' — when we didn't consider group-by 
ordering during database development.
2. 'Accidental pathkeys' — we didn't request any explicit orderings, but 
accidentally, the planner used a merge join that produced an ordering we 
can utilise.
3. 'Uncertain scan path' — we have options regarding which index to use, 
and because of that, we can't predict the optimal group-by ordering 
before the start of query planning.
4. 'HashAgg vs. GroupAgg' — sometimes, the GroupAgg strategy outperforms 
HashAgg just because we don't need any ordering procedure at all.


And the last thing here — this code introduces the basics needed to add 
more sophisticated strategies, like ordering according to uniqueness or 
preferring to set constant-width columns first in grouping.
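One of the "more sophisticated strategies" mentioned above — ordering by 
distinct values — can be sketched as follows. This is a hypothetical 
illustration, not the patch's code: pick the GROUP BY column order by 
decreasing estimated ndistinct, so the leading sort column disambiguates 
most rows and later comparator columns are rarely consulted. Column 
names and statistics are invented.

```python
def order_groupby_by_ndistinct(clauses, ndistinct):
    """Return GROUP BY columns sorted by estimated distinct values, descending."""
    return sorted(clauses, key=lambda col: ndistinct.get(col, 1.0), reverse=True)

# Invented per-column ndistinct estimates:
stats = {"region": 5, "customer_id": 100_000, "status": 3}
assert order_groupby_by_ndistinct(["status", "region", "customer_id"], stats) == \
    ["customer_id", "region", "status"]
```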


--
regards,
Andrei Lepikhov
Postgres Professional


query.sql
Description: application/sql


Re: POC: GROUP BY optimization

2024-04-18 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* Speaking of pathnodes.h, PathKeyInfo is a pretty uninformative node
type name, and the comments provided for it are not going to educate
anybody.  What is the "association" between the pathkeys and clauses?
I guess the clauses are supposed to be SortGroupClauses, but not all
pathkeys match up to SortGroupClauses, so what then?  This is very
underspecified, and fuzzy thinking about data structures tends to lead
to bugs.
I'm not the best at English documentation and naming style, so feel 
free to check my version.


So I'm quite afraid that there are still bugs lurking here.
What's more, given that the plans this patch makes apparently
seldom win when you don't put a thumb on the scales, it might
take a very long time to isolate those bugs.  If the patch
produces a faulty plan (e.g. not sorted properly) we'd never
notice if that plan isn't chosen, and even if it is chosen
it probably wouldn't produce anything as obviously wrong as
a crash.

I added checks on the proper order of pathkeys and clauses.
 If you really care about that, we should spend additional time and 
rewrite the code to generate the order of clauses somewhere in the plan 
creation phase. For example, during the create_group_plan call, we could 
use the processed_groupClause list, cycle through subpath->pathkeys, set 
the order, and detect whether the pathkeys list corresponds to the 
group-by clause or is enough to build a grouping plan.

Anyway, making this part of the code more resistant to blunders is 
another story.
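The check described above can be sketched in pseudo-planner terms: walk 
the subpath's pathkeys and, for each one, pull the matching GROUP BY 
clause to the front; the count of matches tells how many leading keys 
are presorted. Keys and clauses are plain strings here for illustration; 
the real code compares PathKey pointers and SortGroupClause refs.

```python
def reorder_clauses_by_pathkeys(pathkeys, clauses):
    """Move clauses matching a prefix of pathkeys to the front; count matches."""
    matched, rest = [], list(clauses)
    for pk in pathkeys:
        if pk in rest:
            rest.remove(pk)
            matched.append(pk)
        else:
            break  # input ordering is only useful as a prefix
    return matched + rest, len(matched)

new_order, n_presorted = reorder_clauses_by_pathkeys(
    ["b", "a", "d"], ["a", "b", "c"])
assert new_order == ["b", "a", "c"] and n_presorted == 2
```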

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index aa80f6486c..9c079270ec 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -461,7 +461,7 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 /*
  * pathkeys_are_duplicate
  *		Check if give pathkeys are already contained the list of
- *		PathKeyInfo's.
+ *		GroupByOrdering's.
  */
 static bool
 pathkeys_are_duplicate(List *infos, List *pathkeys)
@@ -470,7 +470,7 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 
 	foreach(lc, infos)
 	{
-		PathKeyInfo *info = lfirst_node(PathKeyInfo, lc);
+		GroupByOrdering *info = lfirst_node(GroupByOrdering, lc);
 
 		if (compare_pathkeys(pathkeys, info->pathkeys) == PATHKEYS_EQUAL)
 			return true;
@@ -482,7 +482,7 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
  * get_useful_group_keys_orderings
  *		Determine which orderings of GROUP BY keys are potentially interesting.
  *
- * Returns a list of PathKeyInfo items, each representing an interesting
+ * Returns a list of GroupByOrdering items, each representing an interesting
  * ordering of GROUP BY keys.  Each item stores pathkeys and clauses in the
  * matching order.
  *
@@ -495,15 +495,15 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 {
-	Query		   *parse = root->parse;
-	List		   *infos = NIL;
-	PathKeyInfo	   *info;
+	Query			   *parse = root->parse;
+	List			   *infos = NIL;
+	GroupByOrdering	   *info;
 
 	List	   *pathkeys = root->group_pathkeys;
 	List	   *clauses = root->processed_groupClause;
 
 	/* always return at least the original pathkeys/clauses */
-	info = makeNode(PathKeyInfo);
+	info = makeNode(GroupByOrdering);
 	info->pathkeys = pathkeys;
 	info->clauses = clauses;
 	infos = lappend(infos, info);
@@ -539,7 +539,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 			(enable_incremental_sort || n == root->num_groupby_pathkeys) &&
 			!pathkeys_are_duplicate(infos, pathkeys))
 		{
-			info = makeNode(PathKeyInfo);
+			info = makeNode(GroupByOrdering);
 			info->pathkeys = pathkeys;
 			info->clauses = clauses;
 
@@ -564,7 +564,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 			(enable_incremental_sort || n == list_length(root->sort_pathkeys)) &&
 			!pathkeys_are_duplicate(infos, pathkeys))
 		{
-			info = makeNode(PathKeyInfo);
+			info = makeNode(GroupByOrdering);
 			info->pathkeys = pathkeys;
 			info->clauses = clauses;
 
@@ -574,18 +574,29 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 
 #ifdef USE_ASSERT_CHECKING
 	{
-		PathKeyInfo	   *pinfo = linitial_node(PathKeyInfo, infos);
+		GroupByOrdering	   *pinfo = linitial_node(GroupByOrdering, infos);
 		ListCell	   *lc;
 
 		/* Test consistency of info structures */
 		for_each_from(lc, infos, 1)
 		{
-			info = lfirst_node(PathKeyInfo, lc);
+			ListCell *lc1, *lc2;
+
+			info = lfirst_node(GroupByOrdering, lc);
 
 			Assert(list_length(info->clauses) == list_length(pinfo->clauses));
 			Assert(list_length(info->pathkeys) == list_length(pinfo->pathkeys));
 			Assert(list_difference(info->clauses, pinfo->clauses) == NIL);
 			Assert(list_difference_ptr(info->pathkeys, pinfo->pathkeys) == 

Re: POC: GROUP BY optimization

2024-04-17 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* It seems like root->processed_groupClause no longer has much to do
with the grouping behavior, which is scary given how much code still
believes that it does.  I suspect there are bugs lurking there, and

1. Analysing the code, processed_groupClause is used in:
- adjust_foreign_grouping_path_cost - to decide whether we need to add 
the cost of a sort to the foreign grouping.

- just for replacing relids during the SJE process.
- create_groupingsets_plan(), preprocess_grouping_sets, 
remove_useless_groupby_columns - we don't apply this feature in the case 
of grouping sets.
- get_number_of_groups - estimation shouldn't depend on the order of 
clauses.

- create_grouping_paths - to analyse hashability of grouping clause
- make_group_input_target, make_partial_grouping_target - doesn't depend 
on the order of clauses
planner.c: add_paths_to_grouping_rel, create_partial_grouping_paths - in 
the case of hash grouping.


So, the only case where we can impact current behavior lies in 
postgres_fdw. But there we don't really know which plan will be chosen 
during planning on the foreign node, so keeping the same behavior seems 
legal to me.

am not comforted by the fact that the patch changed exactly nothing
in the pathnodes.h documentation of that field.  This comment looks
pretty silly now too:

 /* Preprocess regular GROUP BY clause, if any */
 root->processed_groupClause = list_copy(parse->groupClause);

What "preprocessing" is going on there?  This comment was adequate
before, when the code line invoked preprocess_groupclause which had
a bunch of commentary; but now it has to stand on its own and it's
not doing a great job of that.

Thanks for picking this place. I fixed it.
More interesting here is that we move decisions on the order of group-by 
clauses to the stage where we already have the underlying subtree and 
incoming pathkeys. In principle, we could bring back the preprocessing 
routine and arrange GROUP-BY with the ORDER-BY clause as it was before. 
But looking into the code, I found only one reason to do this: 
adjust_group_pathkeys_for_groupagg. Curious about how much profit we get 
here, I think we can explore it later with no hurry. A good outcome here 
would be a test case that shows the importance of arranging GROUP-BY and 
ORDER-BY at an early stage.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5ea3705796..861656a192 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1424,7 +1424,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
 		}
 		else if (parse->groupClause)
 		{
-			/* Preprocess regular GROUP BY clause, if any */
+			/*
+			 * Make a copy of origin groupClause because we are going to
+			 * remove redundant clauses.
+			 */
 			root->processed_groupClause = list_copy(parse->groupClause);
 			/* Remove any redundant GROUP BY columns */
 			remove_useless_groupby_columns(root);


Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* I'm pretty unconvinced by group_keys_reorder_by_pathkeys (which
I notice has already had one band-aid added to it since commit).
In particular, it seems to believe that the pathkeys and clauses
lists match one-for-one, but I seriously doubt that that invariant
remains guaranteed after the cleanup steps

 /* append the remaining group pathkeys (will be treated as not sorted) */
 *group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
  *group_pathkeys);
 *group_clauses = list_concat_unique_ptr(new_group_clauses,
 *group_clauses);

For that to be reliable, the SortGroupClauses added to
new_group_clauses in the main loop have to be exactly those
that are associated with the same pathkeys in the old lists.
I doubt that that's necessarily true in the presence of redundant
grouping clauses.  (Maybe we can't get here with any redundant
grouping clauses, but still, we don't really guarantee uniqueness of
SortGroupClauses, much less that they are never copied which is what
you need if you want to believe that pointer equality is sufficient
for de-duping here.  PathKeys are explicitly made to be safe to compare
pointer-wise, but I know of no such guarantee for SortGroupClauses.)
I spent a lot of time inventing situations with SortGroupClause 
duplicates. Unfortunately, it looks impossible so far. But because we 
really don't guarantee uniqueness, I changed the code to survive in this 
case. Also, I added assertion checking to be sure we don't have logical 
mistakes here - see the attachment.
About the band-aid mentioned above - as I see it, 4169850 introduces the 
same trick in planner.c. So, it looks like a result of the design of the 
current code.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 5d9597adcd..aa80f6486c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -380,15 +380,18 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 
 	/*
 	 * We're going to search within just the first num_groupby_pathkeys of
-	 * *group_pathkeys.  The thing is that root->group_pathkeys is passed as
+	 * *group_pathkeys. The thing is that root->group_pathkeys is passed as
 	 * *group_pathkeys containing grouping pathkeys altogether with aggregate
-	 * pathkeys.  If we process aggregate pathkeys we could get an invalid
+	 * pathkeys. If we process aggregate pathkeys we could get an invalid
 	 * result of get_sortgroupref_clause_noerr(), because their
-	 * pathkey->pk_eclass->ec_sortref doesn't referece query targetlist.  So,
+	 * pathkey->pk_eclass->ec_sortref doesn't reference query targetlist. So,
 	 * we allocate a separate list of pathkeys for lookups.
 	 */
 	grouping_pathkeys = list_copy_head(*group_pathkeys, num_groupby_pathkeys);
 
+	/* Make a new copy before reordering clauses */
+	*group_clauses = list_copy(*group_clauses);
+
 	/*
 	 * Walk the pathkeys (determining ordering of the input path) and see if
 	 * there's a matching GROUP BY key. If we find one, we append it to the
@@ -400,8 +403,8 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 	 */
 	foreach(lc, pathkeys)
 	{
-		PathKey*pathkey = (PathKey *) lfirst(lc);
-		SortGroupClause *sgc;
+		PathKey			   *pathkey = (PathKey *) lfirst(lc);
+		SortGroupClause	   *sgc;
 
 		/*
 		 * Pathkeys are built in a way that allows simply comparing pointers.
@@ -431,17 +434,25 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 		Assert(OidIsValid(sgc->sortop));
 
 		new_group_pathkeys = lappend(new_group_pathkeys, pathkey);
+
+		/*
+		 * Keeping in mind that the SortGroupClause list doesn't guarantee
+		 * unique pointers we must explicitly transfer elements one-by-one.
+		 */
 		new_group_clauses = lappend(new_group_clauses, sgc);
+		*group_clauses = list_delete_ptr(*group_clauses, sgc);
 	}
 
 	/* remember the number of pathkeys with a matching GROUP BY key */
 	n = list_length(new_group_pathkeys);
 
-	/* append the remaining group pathkeys (will be treated as not sorted) */
+	/*
+	 * Append the remaining group pathkeys (will be treated as not sorted) and
+	 * grouping clauses.
+	 */
 	*group_pathkeys = list_concat_unique_ptr(new_group_pathkeys,
 			 *group_pathkeys);
-	*group_clauses = list_concat_unique_ptr(new_group_clauses,
-			*group_clauses);
+	*group_clauses = list_concat(new_group_clauses, *group_clauses);
 
 	list_free(grouping_pathkeys);
 	return n;
@@ -484,9 +495,9 @@ pathkeys_are_duplicate(List *infos, List *pathkeys)
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 {
-	Query	   *parse = root->parse;
-	List	   *infos = NIL;
-	PathKeyInfo *info;
+	Query		   *parse = root->parse;
+	List		   *infos = NIL;
+	PathKeyInfo	   *info;
 
 	List	   *pathkeys = r

Re: POC: GROUP BY optimization

2024-04-16 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

* The very first hunk causes get_eclass_for_sort_expr to have
side-effects on existing EquivalenceClass data structures.
What is the argument that that's not going to cause problems?
At the very least it introduces asymmetry, in that the first
caller wins the chance to change the EC's ec_sortref and later
callers won't be able to.

Let me resolve the issues bit by bit.
Addressing your first note, 'asymmetry,' I examined this part of the 
code. Before 8d83a5d, it could cause some artefacts, but since then, a 
kind of asymmetry has been introduced into the GROUP-BY clause list.
As you can see in the attached patch, GROUP-BY reordering works even 
when the asymmetry of 8d83a5d chooses different keys.


At the same time, I agree that implicitly setting anything in an 
exported function, which should just look up an EC, is a questionable 
coding style. To avoid possible usage issues (in extensions, for 
example), I propose setting it up afterwards, explicitly forcing this 
action with an input parameter - see my attempt in the attachment.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a619ff9177..bc08dfadaf 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652,18 +652,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 
 			if (opcintype == cur_em->em_datatype &&
 equal(expr, cur_em->em_expr))
-			{
-/*
- * Match!
- *
- * Copy the sortref if it wasn't set yet.  That may happen if
- * the ec was constructed from a WHERE clause, i.e. it doesn't
- * have a target reference at all.
- */
-if (cur_ec->ec_sortref == 0 && sortref > 0)
-	cur_ec->ec_sortref = sortref;
-return cur_ec;
-			}
+return cur_ec; /* Match! */
 		}
 	}
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 1d61881a6b..5d9597adcd 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1355,7 +1355,8 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
	&sortclauses,
	tlist,
	false,
-	&sortable);
+	&sortable,
+	false);
 	/* It's caller error if not all clauses were sortable */
 	Assert(sortable);
 	return result;
@@ -1379,13 +1380,16 @@ make_pathkeys_for_sortclauses(PlannerInfo *root,
  * to remove any clauses that can be proven redundant via the eclass logic.
  * Even though we'll have to hash in that case, we might as well not hash
  * redundant columns.)
+ * *set_ec_sortref is true if the caller wants to set up the ec_sortref field
+ * in the pathkey's EquivalenceClass when it has not been initialized beforehand.
  */
 List *
 make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 	   List **sortclauses,
 	   List *tlist,
 	   bool remove_redundant,
-	   bool *sortable)
+	   bool *sortable,
+	   bool set_ec_sortref)
 {
 	List	   *pathkeys = NIL;
 	ListCell   *l;
@@ -1409,6 +1413,13 @@ make_pathkeys_for_sortclauses_extended(PlannerInfo *root,
 		   sortcl->nulls_first,
 		   sortcl->tleSortGroupRef,
 		   true);
+		if (pathkey->pk_eclass->ec_sortref == 0 && set_ec_sortref)
+			/*
+			 * Copy the sortref if it wasn't set yet.  That may happen if
+			 * the ec was constructed from a WHERE clause, i.e. it doesn't
+			 * have a target reference at all.
+			 */
+			pathkey->pk_eclass->ec_sortref = sortcl->tleSortGroupRef;
 
 		/* Canonical form eliminates redundant ordering keys */
 		if (!pathkey_is_redundant(pathkey, pathkeys))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..dee98c9a90 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3391,7 +3391,7 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 		/*
 		 * With a plain GROUP BY list, we can remove any grouping items that
 		 * are proven redundant by EquivalenceClass processing.  For example,
-		 * we can remove y given "WHERE x = y GROUP BY x, y".  These aren't
-		 * we can remove y given "WHERE x = y GROUP BY x, y".  These aren't
 		 * especially common cases, but they're nearly free to detect.  Note
 		 * that we remove redundant items from processed_groupClause but not
 		 * the original parse->groupClause.
@@ -3403,7 +3403,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
   &root->processed_groupClause,
    tlist,
    true,
-   &sortable);
+   &sortable,
+   true);
 		if (!sortable)
 		{
 			/* Can't sort; no point in considering aggregate ordering either */
@@ -3453,7 +3454,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)

Re: POC: GROUP BY optimization

2024-04-11 Thread Andrei Lepikhov

On 4/12/24 06:44, Tom Lane wrote:

If this patch were producing better results I'd be more excited
about putting more work into it.  But on the basis of what I'm
seeing right now, I think maybe we ought to give up on it.
First, thanks for the deep review - sometimes, only a commit gives us a 
chance to get such an observation :))).
On a broader note, introducing automatic group-by-order choosing is a 
step towards training the optimiser to handle poorly tuned incoming 
queries. While it's true that this may initially impact performance, 
it's crucial to weigh the potential benefits. So, beforehand, we should 
agree: Is it worth it?
If yes, I would say I quite often see how hashing doesn't work for 
grouping: sometimes because of estimation errors, sometimes because the 
grouping input is already sorted, and sometimes in analytical queries 
when the planner doesn't have enough memory for hashing. In analytical 
cases, the only way to speed up queries is sometimes to be smart with 
features like IncrementalSort and this one.
About low efficiency: remember the previous version of the GROUP-BY 
optimisation - we disagreed on operator costs and the cost model in 
general. In the current version, we went the opposite way - adding small 
features step-by-step. The current commit contains an integral part of 
the feature and is designed for safely testing the approach and adding 
more profitable parts, like choosing the group-by order according to 
distinct values or unique indexes on grouping columns.
I have gone through the code, steered by the issues you explained in 
detail. I see seven issues. Two of them definitely should be scrutinised 
right now, and I'm ready to do that.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: post-freeze damage control

2024-04-09 Thread Andrei Lepikhov

On 9/4/2024 12:55, Tom Lane wrote:

Andrei Lepikhov  writes:

* I really, really dislike jamming this logic into prepqual.c,
where it has no business being.  I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:



Yeah, I preferred to do it in parse_expr.c with the assumption of some
'minimal' or 'canonical' tree form.


That seems quite the wrong direction to me.  AFAICS, the argument
for making this transformation depends on being able to convert
to an indexscan condition, so I would try to apply it even later,
when we have a set of restriction conditions to apply to a particular
baserel.  (This would weaken the argument that we need hashing
rather than naive equal() tests even further, I think.)  Applying
the transform to join quals seems unlikely to be a win.
Our first prototype did this job right at the stage of index path 
creation. Unfortunately, this approach was too narrow and expensive.
The most problematic cases we encountered came from BitmapOr paths: if 
an incoming query has a significant number of OR clauses, the optimizer 
spends a lot of time generating these, in most cases, senseless paths 
(remember also the memory allocated for that purpose). Imagine how much 
worse the situation becomes when we scale it with partitions.
Another issue we resolved with this transformation: a shorter list of 
clauses speeds up planning and, sometimes, makes cardinality estimation 
more accurate.
Moreover, it helps even SeqScan: looking up a value in the hashed array 
is much faster than evaluating a long OR expression for each incoming 
tuple.
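A minimal model of that last point — why `x = ANY('{...}')` beats a long 
OR chain on a sequential scan: the OR form compares the tuple's value 
against each constant in turn, while the SAOP form can probe a hash set 
once per tuple. This is an illustrative sketch, not the executor's code.

```python
def matches_or_chain(value, constants):
    """Model of evaluating c1 OR c2 OR ... for one tuple."""
    for c in constants:           # O(n) comparisons per tuple
        if value == c:
            return True
    return False

def matches_hashed_saop(value, constant_set):
    """Model of probing a hashed SAOP constant array."""
    return value in constant_set  # O(1) expected per tuple

consts = list(range(1000))
assert matches_or_chain(999, consts) == matches_hashed_saop(999, set(consts))
assert not matches_hashed_saop(-1, set(consts))
```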


One more idea that I have set aside here is that the planner could 
utilize quick clause hashing:
From time to time, on the mailing list, I see disputes over different 
approaches to expression transformation/simplification/grouping, and 
most of the time they end up at the problem of search complexity. 
A clause hash could be a way to solve this, couldn't it?
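The clause-hashing idea can be sketched like this: give each clause a 
key built from its structural fingerprint (here just the operator plus 
the left-hand expression, which is an assumed simplification), so 
equivalent clauses land in one bucket and grouping them is O(n) instead 
of O(n^2) pairwise equal() checks.

```python
from collections import defaultdict

def group_or_clauses(clauses):
    """clauses: list of (lhs_expr, op, rhs_const) tuples, grouped by fingerprint."""
    buckets = defaultdict(list)
    for lhs, op, rhs in clauses:
        buckets[(lhs, op)].append(rhs)   # hashable fingerprint as bucket key
    return dict(buckets)

ors = [("x", "=", 1), ("x", "=", 2), ("y", "=", 7), ("x", "=", 3)]
groups = group_or_clauses(ors)
assert groups[("x", "=")] == [1, 2, 3]   # candidate for x = ANY('{1,2,3}')
assert groups[("y", "=")] == [7]         # left as-is: only one member
```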


--
regards,
Andrei Lepikhov
Postgres Professional





Re: post-freeze damage control

2024-04-08 Thread Andrei Lepikhov

On 9/4/2024 09:12, Tom Lane wrote:

I have another one that I'm not terribly happy about:

 Author: Alexander Korotkov 
 Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300

 Transform OR clauses to ANY expression

Because I'm primary author of the idea, let me answer.


I don't know that I'd call it scary exactly, but I do think it
was premature.  A week ago there was no consensus that it was
ready to commit, but Alexander pushed it (or half of it, anyway)
despite that.  A few concrete concerns:

* Yet another planner GUC.  Do we really need or want that?
It is the most interesting question here. Looking at planner features 
that were designed but never applied for the same reason - they can 
produce suboptimal plans in corner cases - I think about inventing 
flag-type parameters and hiding features that work better for different 
load types behind such flagged parameters.


* What the medical community would call off-label usage of
query jumbling.  I'm not sure this is even correct as-used,
and for sure it's using that code for something never intended.
Nor is the added code adequately (as in, at all) documented.
I agree about the documentation and disagree with the criticism of the 
expression jumbling. It was introduced in the core. Why don't we allow 
it to be used to speed up machinery with some hashing?


* Patch refuses to group anything but Consts into the SAOP
transformation.  I realize that if you want to produce an
array Const you need Const inputs, but I wonder why it
wasn't considered to produce an ARRAY[] construct if there
are available clauses with pseudo-constant (eg Param)
comparison values.

Good point. I think, we can consider that in the future.


* I really, really dislike jamming this logic into prepqual.c,
where it has no business being.  I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:
Yeah, I preferred to do it in parse_expr.c with the assumption of some 
'minimal' or 'canonical' tree form. You can see this code in the 
previous version. I think we don't have any bugs here, but we have 
different opinions on how it should work.


  * process_duplicate_ors
  *   Given a list of exprs which are ORed together, try to apply
  *   the inverse OR distributive law.

Another reason to think this wasn't a very well chosen place is
that the file's list of #include's went from 4 entries to 11.
Somebody should have twigged to the idea that this was off-topic
for prepqual.c.

* OrClauseGroupKey is not a Node type, so why does it have
a NodeTag?  I wonder what value will appear in that field,
and what will happen if the struct is passed to any code
that expects real Nodes.
It's a hack authored by Alexander. I guess he can provide additional 
reasons in support of it.


I could probably find some other nits if I spent more time
on it, but I think these are sufficient to show that this
was not commit-ready.
It's up to you. On the one hand, I don't see any bugs or serious 
performance issues, and all the concerns can be resolved later; on the 
other hand, I've got your good review and some ideas to work out.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: type cache cleanup improvements

2024-04-03 Thread Andrei Lepikhov

On 3/15/24 17:57, Teodor Sigaev wrote:

Okay, I've applied this piece for now.  Not sure I'll have much room
to look at the rest.


Thank you very much!
I have spent some time reviewing this feature. I think we can discuss 
and apply it step-by-step. So, let's focus on the 0001-* patch at this 
moment.
The feature addresses the issue of TypCache being bloated by intensive 
usage of non-standard types and domains. Such bloat adds significant 
overhead during relcache invalidation, which thoroughly scans this hash 
table.
IMO, this feature will be handy soon, as we already see some patches 
where TypCache is intensively used for storing composite types - for 
example, look into the solutions proposed in [1].
One of my main concerns with this feature is the possibility of lost 
entries, which could be mistakenly used by relations with the same oid 
in the future. This seems particularly possible in cases with multiple 
temporary tables. The author has attempted to address this by replacing 
the typrelid and type_id fields in the mapRelType on each call of 
lookup_type_cache. However, I believe we could further improve this by 
removing the entry from mapRelType on invalidation, thus avoiding this 
potential issue.
While reviewing the patch, I made some minor changes (see attachment) 
that you're free to adopt or reject. However, it's crucial that the 
patch includes a detailed explanation, not just a single sentence, to 
ensure everyone understands the changes.
Upon closer inspection, I noticed that the current implementation only 
invalidates the cache entry. While this is acceptable for standard 
types, it may not be sufficient to maintain numerous custom types (as in 
the example in the initial letter) or in cases where whole-row vars are 
heavily used. In such scenarios, removing the entry and reducing the 
hash table's size might be more efficient.
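The review suggestion above can be modeled with a toy map (plain Python, 
not the dynahash API; OIDs are invented): instead of overwriting a stale 
reloid -> typeoid mapping on the next lookup, drop the mapping as part 
of invalidation so a recycled relation OID can never resolve to an old 
type cache entry.

```python
class RelTypeMap:
    """Toy reloid -> typeoid map with removal-on-invalidation semantics."""
    def __init__(self):
        self._map = {}

    def remember(self, reloid, typeoid):
        self._map[reloid] = typeoid

    def invalidate(self, reloid):
        self._map.pop(reloid, None)   # remove, don't just mark stale

    def lookup(self, reloid):
        return self._map.get(reloid)

m = RelTypeMap()
m.remember(16384, 16386)          # composite type of a temp table
m.invalidate(16384)               # table dropped; its OID may be reused later
assert m.lookup(16384) is None    # no chance of resolving to the old type
```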
In toto, the 0001-* patch looks good, and I would be glad to see it in 
the core.


[1] 
https://www.postgresql.org/message-id/flat/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg%40mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index e3c32c7848..ed321603d5 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -74,16 +74,17 @@
 #include "utils/typcache.h"
 
 
-/* The main type cache hashtable searched by lookup_type_cache */
-static HTAB *TypeCacheHash = NULL;
-
 /* The map from relation's oid to its type oid */
-typedef struct mapRelTypeEntry
+typedef struct RelTypeMapEntry
 {
 	Oid	typrelid;
 	Oid type_id;
-} mapRelTypeEntry;
+} RelTypeMapEntry;
+
+/* The main type cache hashtable searched by lookup_type_cache */
+static HTAB *TypeCacheHash = NULL;
 
+/* Utility hash table to speed up processing of invalidation relcache events. */
 static HTAB *mapRelType = NULL;
 
 /* List of type cache entries for domain types */
@@ -368,7 +369,7 @@ lookup_type_cache(Oid type_id, int flags)
 	&ctl, HASH_ELEM | HASH_BLOBS);
 
 		ctl.keysize = sizeof(Oid);
-		ctl.entrysize = sizeof(mapRelTypeEntry);
+		ctl.entrysize = sizeof(RelTypeMapEntry);
 		mapRelType = hash_create("Map reloid to typeoid", 64,
  &ctl, HASH_ELEM | HASH_BLOBS);
 
@@ -492,11 +493,11 @@ lookup_type_cache(Oid type_id, int flags)
 	 */
 	if (OidIsValid(typentry->typrelid) && typentry->typtype == TYPTYPE_COMPOSITE)
 	{
-		mapRelTypeEntry *relentry;
+		RelTypeMapEntry *relentry;
 
-		relentry = (mapRelTypeEntry*) hash_search(mapRelType,
-  &typentry->typrelid,
-  HASH_ENTER, NULL);
+		relentry = (RelTypeMapEntry *) hash_search(mapRelType,
+   &typentry->typrelid,
+   HASH_ENTER, NULL);
 
 		relentry->typrelid = typentry->typrelid;
 		relentry->type_id = typentry->type_id;
@@ -2297,7 +2298,7 @@ SharedRecordTypmodRegistryAttach(SharedRecordTypmodRegistry *registry)
 }
 
 static void
-invalidateCompositeTypeCacheEntry(TypeCacheEntry *typentry)
+invalidateTypeCacheEntry(TypeCacheEntry *typentry)
 {
 	/* Delete tupdesc if we have it */
 	if (typentry->tupDesc != NULL)
@@ -2348,11 +2349,11 @@ TypeCacheRelCallback(Datum arg, Oid relid)
 
 	if (OidIsValid(relid))
 	{
-		mapRelTypeEntry *relentry;
+		RelTypeMapEntry *relentry;
 
-		relentry = (mapRelTypeEntry *) hash_search(mapRelType,
-												   &relid,
-												   HASH_FIND, NULL);
+		relentry = (RelTypeMapEntry *) hash_search(mapRelType,
+												   &relid,
+												   HASH_FIND, NULL);
 
 		if (relentry != NULL)
 		{
@@ -2365,7 +2366,7 @@ TypeCacheRelCallback(Datum arg, Oid relid)
 Assert(typentry->typtype == TYPTYPE_COMPOSITE);
 Assert(relid == typentry->typrelid);
 
-invalidateCompositeTypeCacheEntry(typentry);
+invalidateTypeCacheEntry(typentry);
 			}
 		}
 
@@ -2397,7 +2398,7 @@ TypeCacheRelCallback(Datum arg, Oid relid)
 		{
 			if (typentry->typtype == TYPTYPE_COMPOSITE)
 			{
-invalidateCompositeTypeCacheEntry(ty

Re: Asymmetric partition-wise JOIN

2024-04-01 Thread Andrei Lepikhov

On 15/10/2023 13:25, Alexander Korotkov wrote:

Great!  I'm looking forward to the revised patch.
Revising the code and opinions before restarting this work, I found two 
different possible strategies mentioned in the thread:
1. 'Common Resources' shares the materialised result of the inner table 
scan (a hash table in the case of HashJoin) and joins it with each 
partition one by one. It gives us a benefit in the case of a parallel 
append and possibly in other cases, like the one shown in the initial message.
2. 'Individual strategies' - By limiting the AJ feature to cases when 
the JOIN clause contains a partitioning expression, we can push an 
additional scan clause into each copy of the inner table scan, reduce 
the number of tuples scanned, and even prune something because of proven 
zero input.


I see the pros and cons of both approaches. The first option is more 
straightforward, and its outcome is obvious in the case of parallel 
append. But how can we guarantee the same join type for each join? Why 
should we ignore the positive effect of different strategies for 
different partitions?
The second strategy is more expensive for the optimiser, especially in 
the multipartition case. But as far as I can tell, it is easier to implement 
and looks more natural for the architecture. What do you think about that?
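
A minimal sketch of the 'Common Resources' strategy, with Python dicts standing in for the executor's hash table. The names and data layout are illustrative only, not the PostgreSQL executor API — the point is that the inner side is hashed once and the same table is probed for every outer partition, instead of being rebuilt per partition:

```python
# Build the inner hash table once and reuse it across all outer partitions.
def build_hash(inner_rows, key):
    table = {}
    for row in inner_rows:
        table.setdefault(row[key], []).append(row)
    return table

def asymmetric_join(outer_partitions, inner_rows, key):
    shared = build_hash(inner_rows, key)   # built once, shared by all joins
    for part in outer_partitions:
        for row in part:
            for match in shared.get(row[key], []):
                yield (row, match)

parts = [[{'id': 1}], [{'id': 2}, {'id': 3}]]
inner = [{'id': 1, 'v': 'a'}, {'id': 3, 'v': 'b'}]
result = list(asymmetric_join(parts, inner, 'id'))
```

The sketch shows why strategy 1 forces the same join method for every partition: all partition-wise joins probe the one shared structure.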


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Table AM Interface Enhancements

2024-04-01 Thread Andrei Lepikhov

On 31/3/2024 00:33, Alexander Korotkov wrote:

I think the way forward might be to introduce the new API, which would
isolate executor details from table AM.  We may introduce a new data
structure InsertWithArbiterContext which would contain EState and a
set of callbacks which would avoid table AM from calling the executor
directly.  That should be our goal for pg18.  Now, this is too close
to FF to design a new API.
I'm a bit late, but have you ever considered adding some sort of index 
probing routine to the AM interface for estimation purposes?
I am working on the problem of dubious estimations. For 
example, we may have no MCV list, the MCV statistics may not cover an 
equality over multiple clauses, or we may detect that the constant value 
is out of the histogram range. In such cases (especially for 
[parameterized] JOINs), the optimizer could have a chance to probe the 
index and avoid huge underestimation. This makes sense, especially for 
multicolumn filters/clauses.

Having a probing AM method, we may invent something for this challenge.
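
As a rough illustration of what such a probing routine could compute, here is a sketch that bisects a sorted, btree-like key sequence to obtain an exact match count instead of a statistics-based guess. This is purely hypothetical — no such AM callback exists:

```python
# Sketch of an index "probe" for equality estimation: bisect a sorted key
# sequence (a stand-in for a btree leaf ordering) and count exact matches.
import bisect

def probe_equal(sorted_keys, const):
    lo = bisect.bisect_left(sorted_keys, const)
    hi = bisect.bisect_right(sorted_keys, const)
    return hi - lo   # exact number of entries matching the constant

keys = [1, 2, 2, 2, 7, 9]
estimate = probe_equal(keys, 2)
```

The appeal is that the probe cost is logarithmic in index size, while the answer is exact for the probed key — exactly the cases where MCV/histogram estimates go badly wrong.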

--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-04-01 Thread Andrei Lepikhov

On 28/3/2024 16:54, Alexander Korotkov wrote:

The current patch has a boolean guc enable_or_transformation.
However, when we have just a few ORs to be transformated, then we
should get less performance gain from the transformation and higher
chances to lose a good bitmap scan plan from that.  When there is a
huge list of ORs to be transformed, then the performance gain is
greater and it is less likely we could lose a good bitmap scan plan.

What do you think about introducing a GUC threshold value: the minimum
size of list to do OR-to-ANY transformation?
min_list_or_transformation or something.
I labelled it or_transformation_limit (see in attachment). Feel free to 
rename it.
It's important to note that the limiting GUC doesn't operate 
symmetrically for forward, OR -> SAOP, and backward SAOP -> OR 
operations. In the forward case, it functions as you've proposed. 
However, in the backward case, we only check whether the feature is 
enabled or not. This is due to our existing limitation, 
MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the 
original OR list with the sizes of the resulting SAOPs. For instance, a 
lengthy OR list with 100 elements can be transformed into 3 SAOPs, each 
with a size of around 30 elements.
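
The asymmetry described above can be illustrated with a toy chunking sketch. MAX_SAOP_ARRAY_SIZE here is an illustrative stand-in value for the patch's limit, not the actual constant:

```python
# A long OR list is split into several SAOP arrays no larger than a fixed
# cap, so the original list length cannot be compared 1:1 with the sizes
# of the resulting arrays (hence the asymmetric GUC behaviour).
MAX_SAOP_ARRAY_SIZE = 35   # illustrative stand-in for the patch's limit

def split_into_saops(consts):
    return [consts[i:i + MAX_SAOP_ARRAY_SIZE]
            for i in range(0, len(consts), MAX_SAOP_ARRAY_SIZE)]

arrays = split_into_saops(list(range(100)))
```

With a cap of 35, a 100-element OR list yields three arrays of 35, 35, and 30 elements — matching the "around 30" figure above.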
One aspect that requires attention is the potential inefficiency of our 
OR -> ANY transformation when the number of elements is less than 
MAX_SAOP_ARRAY_SIZE. This is because we perform the reverse transformation, 
ANY -> OR, at the stage of generating bitmap scans. If the BitmapScan 
path dominates, we may have done unnecessary work. Is this something 
we should address?
But the concern above may just be a point for later improvement: we can 
add one more strategy to the optimizer, testing each array element as an 
OR clause; we can also provide a BitmapOr path where the SAOP is covered 
with a minimal number of partial indexes (as in the previous version).


--
regards,
Andrei Lepikhov
Postgres Professional
From e42a7111a12ef82eecdb2e692d65e7ba6e43ad79 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
at the preliminary stage of optimization, when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commuter (for the case
of reversed order of the constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to a non-optimal plan, but we think it is better to have
an array of elements instead of a long list of OR clauses. There is room for
further optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  18 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 379 +-
 src/backend/utils/misc/guc_tables.c   |  13 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 172 +++-
 src/test/regress/expected/join.out|  60 ++-
 src/test/regress/expected/partition_prune.out | 211 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/tidscan.out |  21 +-
 src/test/regress/sql/create_index.sql |  44 ++
 src/test/regress/sql/join.sql |   8 +
 src/test/regress/sql/partition_prune.sql  |  18 +
 src/test/regress/sql/tidscan.sql  |   4 +
 src/tools/pgindent/typedefs.list  |   2 +
 18 files changed, 950 insertions(+), 51 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index b7af86d351..277ef3f385 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8853,18 +8853,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
+                                                  QUERY PLAN
+--------------------------------------------------------------------------------------------------------
 Update on public.utrtest

Re: POC, WIP: OR-clause support for indexes

2024-03-18 Thread Andrei Lepikhov

On 14/3/2024 16:31, Alexander Korotkov wrote:
On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov 
As you can see this case is not related to partial indexes.  Just no 
index selective for the whole query.  However, splitting scan by the OR 
qual lets use a combination of two selective indexes.
I have rewritten the 0002-* patch according to your concern. A candidate 
and some thoughts are attached.
As I see, we have a problem here: expanding each array and trying to 
apply an element to each index can result in a lengthy planning stage.
Also, an index scan with the SAOP may potentially be more effective than 
with the list of OR clauses.
Originally, the transformation's purpose was to reduce a query's 
complexity and the number of optimization ways to speed up planning and 
(sometimes) execution. Here, we reduce planning complexity only in the 
case of an array size larger than MAX_SAOP_ARRAY_SIZE.
Maybe we can fall back to the previous version of the second patch, 
keeping in mind that someone who wants to get maximum benefit from the 
BitmapOr scan of multiple indexes can just disable this optimization, 
enabling a deep search for the most optimal scanning strategy?
As a compromise solution, I propose adding one more option to the 
previous version: if an element doesn't fit any partial index, try to 
cover it with a plain index.
In this case, we still do not guarantee the most optimal fit of elements 
to the set of indexes, but we speed up planning. Does that make sense?
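
A greedy sketch of this compromise, with simplified stand-ins for IndexOptInfo and predicate matching (not the patch's actual code): each array element is matched against the partial indexes' predicates, and anything left over is covered by a plain index if one exists; otherwise the original SAOP is kept.

```python
# Greedy split of SAOP elements across partial indexes, with a plain-index
# fallback for uncovered elements. Predicates are plain callables here.
def split_saop_by_indexes(elems, partial_preds, have_plain_index):
    groups = [[] for _ in partial_preds]
    leftover = []
    for e in elems:
        for i, pred in enumerate(partial_preds):
            if pred(e):
                groups[i].append(e)
                break
        else:
            leftover.append(e)
    if leftover and not have_plain_index:
        return None   # cannot cover every element; keep the original SAOP
    return [g for g in groups if g] + ([leftover] if leftover else [])

is_even = lambda x: x % 2 == 0   # stand-in for one partial-index predicate
parts = split_saop_by_indexes([1, 2, 3, 4, 5], [is_even], have_plain_index=True)
```

As stated above, this greedy pass does not guarantee the optimal assignment of elements to indexes — it only bounds planning effort to one pass over the array.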


--
regards,
Andrei Lepikhov
Postgres Professional
From d2d8944fc83ccd090653c1b15703a2c3ba096fa9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Mar 2024 12:26:02 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 doc/src/sgml/config.sgml   |   3 +
 src/backend/optimizer/path/indxpath.c  |  74 +-
 src/backend/optimizer/util/predtest.c  |  37 +++
 src/backend/optimizer/util/restrictinfo.c  |  13 +
 src/include/optimizer/optimizer.h  |   3 +
 src/include/optimizer/restrictinfo.h   |   1 +
 src/test/regress/expected/create_index.out |  24 +-
 src/test/regress/expected/select.out   | 280 +
 src/test/regress/sql/select.sql|  82 ++
 9 files changed, 500 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de6ae301a..0df56f44e3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5485,6 +5485,9 @@ ANY num_sync 
( orclause)->args;
+
+   if (!enable_or_transformation)
+   return orlist;
+
+   if (restriction_is_saop_clause(rinfo))
+   {
+   result = transform_saop_to_ors(root, rinfo);
+   return (result == NIL) ? list_make1(rinfo) : result;
+   }
+
+   foreach(lc, orlist)
+   {
+   Expr *expr = (Expr *) lfirst(lc);
+
+   if (IsA(expr, RestrictInfo) && 
restriction_is_saop_clause((RestrictInfo *) expr))
+   {
+   List *sublist;
+
+   sublist = extract_saop_ors(root, (RestrictInfo *) 
lfirst(lc));
+   if (sublist != NIL)
+   {
+   result = list_concat(result, sublist);
+   continue;
+   }
+
+   /* Need to return expr to the result list */
+   }
+
+   result = lappend(result, expr);
+   }
+
+   return result;
+}
+
 /*
  * generate_bitmap_or_paths
- *		Look through the list of clauses to find OR clauses, and generate
- *		a BitmapOrPath for each one we can handle that way.  Return a list
- *		of the generated BitmapOrPaths.
+ *		Look through the list of clauses to find OR and SAOP clauses, and
+ *		generate a BitmapOrPath for each one we can handle that way.
+ *		Each SAOP clause is split to be covered by partial indexes.
+ *		Return a list of the generated BitmapOrPaths.
  *
  * other_clauses is a list of additional clauses that can be assumed true
  * for the purpose of generating indexquals, but are not to be searched for
@@ -1247,20 +1303,24 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo 
*rel,
foreach(lc, clauses)
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-   List   *pathlist;
+   List   *pathlist = NIL;
Path   *bitmapqual;
ListCell   *j;
+   List   *orlist = NIL;
 
-   /* Ignore RestrictInfos that aren't ORs */
-   if (!restriction_is_or_clause(rinfo))
+   orlist = extract_saop_ors(root, rinfo);
+   

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Andrei Lepikhov

On 14/3/2024 17:39, Alexander Korotkov wrote:

Thank you, Andrei.  Looks like a very undesirable side effect.  Do you
have any idea why it happens?  Partition pruning should work correctly
for both transformed and non-transformed quals, why does
transformation hurt it?
Now we have the v23-0001-* patch with all issues resolved. The last one, 
which caused execution-stage pruning, was about the necessity of evaluating 
the SAOP expression right after the transformation. In the previous version, 
the core executed it on the transformed expressions.


> As you can see this case is not related to partial indexes.  Just no
> index selective for the whole query.  However, splitting scan by the
> OR qual lets use a combination of two selective indexes.
Thanks for the case. I will try to resolve it.

--
regards,
Andrei Lepikhov
Postgres Professional
From 156c00c820a38e5e1856f07363af87b3109b5d77 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
at the preliminary stage of optimization, when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commuter (for the case
of reversed order of the constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to a non-optimal plan, but we think it is better to have
an array of elements instead of a long list of OR clauses. There is room for
further optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 374 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 215 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 19 files changed, 929 insertions(+), 58 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
+                                                  QUERY PLAN
+--------------------------------------------------------------------------------------------------------
  Update on public.utrtest
Output: utrtest_1.a, utrtest_1.b
Foreign Update on public.remp utrtest_1
Update on public.locp utrtest_2
->  Append
  ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
+         Remote SQL: UPDATE public.loct SET a = 1 WHERE ((a = ANY ('{1,2}'::integer[]))) RETURNING a, b
  ->  Seq Scan on public.locp utrtest_2
Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-   Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+   Filter: (utrtest_2.a = ANY ('{1,2}'::integer[]))
 (10 rows)
 
 -- The new values are concatenated with ' triggered !'
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..2de6ae301a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5472,6 +5472,23 @@ ANY num_sync ( 
+  enable_or_transformation (boolean)
+   
+enable_or_transformation configuration 
parameter
+   
+  
+  
+   
+Enables or 

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Andrei Lepikhov

On 14/3/2024 16:31, Alexander Korotkov wrote:
On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

 > On 13/3/2024 18:05, Alexander Korotkov wrote:
 > > On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
 > > Given all of the above, I think moving transformation to the
 > > canonicalize_qual() would be the right way to go.
 > Ok, I will try to move the code.
 > I have no idea about the timings so far. I recall the last time I got
 > bogged down in tons of duplicated code. I hope with an almost-ready
 > sketch, it will be easier.

Thank you!  I'll be looking forward to the updated patch.
Okay, I moved the 0001-* patch to the prepqual.c module. See it in the 
attachment. I treat it as a transient patch.

It has positive outcomes as well as negative ones.
The most damaging result can be seen in the partition_prune test: 
partition pruning, in some cases, moved to the executor initialization 
stage. I guess we should avoid that somehow in the next version.


--
regards,
Andrei Lepikhov
Postgres Professional
From 170f6871540025d0d1683750442e7af902b11a40 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
at the preliminary stage of optimization, when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commuter (for the case
of reversed order of the constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to a non-optimal plan, but we think it is better to have
an array of elements instead of a long list of OR clauses. There is room for
further optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 371 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 235 +--
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  19 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 19 files changed, 939 insertions(+), 61 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
+                                                  QUERY PLAN
+--------------------------------------------------------------------------------------------------------
  Update on public.utrtest
Output: utrtest_1.a, utrtest_1.b
Foreign Update on public.remp utrtest_1
Update on public.locp utrtest_2
->  Append
  ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
+         Remote SQL: UPDATE public.loct SET a = 1 WHERE ((a = ANY ('{1,2}'::integer[]))) RETURNING a, b
  ->  Seq Scan on public.locp utrtest_2
Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-   Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+   Filter: (utrtest_2.a = ANY ('{1,2}'::integer[]))
 (10 rows)
 
 -- The new values are concatenated with ' triggered !'
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..2de6ae301a 100644
-

Re: POC, WIP: OR-clause support for indexes

2024-03-13 Thread Andrei Lepikhov

On 13/3/2024 18:05, Alexander Korotkov wrote:

On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
Given all of the above, I think moving transformation to the
canonicalize_qual() would be the right way to go.

Ok, I will try to move the code.
I have no idea about the timings so far. I recall the last time I got 
bogged down in tons of duplicated code. I hope with an almost-ready 
sketch, it will be easier.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-12 Thread Andrei Lepikhov

On 12/3/2024 22:20, Alexander Korotkov wrote:

On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov

I think you are right. It is probably a better place than any other to
remove duplicates in an array. I just think we should sort and remove
duplicates from entry->consts in one pass. Thus, this optimisation
should be applied to sortable constants.


Ok.
New version of the patch set implemented all we have agreed on for now. 
We can return MAX_SAOP_ARRAY_SIZE constraint and Alena's approach to 
duplicates deletion for non-sortable cases at the end.



Hmm, we already tried to do it at that point. I vaguely recall some
issues caused by this approach. Anyway, it should be done as quickly as
possible to increase the effect of the optimization.


I think there were provided quite strong reasons why this shouldn't be
implemented at the parse analysis stage [1], [2], [3].  The
canonicalize_qual() looks quite appropriate place for that since it
does similar transformations.
Ok. Let's discuss these reasons. In Robert's opinion [1,3], we should do 
the transformation based on the cost model. But in the canonicalize_qual 
routine, we still make the transformation blindly. Moreover, the second 
patch reduces the weight of this reason, doesn't it? Maybe we shouldn't 
think about it as an optimisation but as some 'general form of 
expression'?
Peter [2] worries about the possible transformation outcomes at this 
stage. But remember, we already transform clauses like ROW() IN (...) to 
a series of ORs here, so it is not an issue. Am I wrong?
Why did we discard the attempt with canonicalize_qual in the previous 
iteration? Because the parsing stage is much more natural for building SAOP 
quals: we can reuse make_scalar_array_op and other machinery, for example.
During the optimisation stage, only the list-partitioning machinery 
creates SAOPs based on a list of constants. So, in theory, it is possible 
to implement. But do we really need to make the code more complex?


Links.
1. 
https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional
From 86d969f2598a03b1eba84c0c064707de34ee60ac Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
at the preliminary stage of optimization, when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commuter (for the case
of reversed order of the constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to a non-optimal plan, but we think it is better to have
an array of elements instead of a long list of OR clauses. There is room for
further optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 416 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 ++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 215 -
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 +
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 20 files changed, 974 insertions(+), 60 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrt

Re: a wrong index choose when statistics is out of date

2024-03-12 Thread Andrei Lepikhov

On 8/3/2024 18:53, Andy Fan wrote:

I just reviewed the bad queries plan for the past half years internally,
I found many queries used the Nested loop which is the direct cause. now
I think I find out a new reason for this, because the missed optimizer
statistics cause the rows in outer relation to be 1, which make the Nest
loop is choosed. I'm not sure your idea could help on this or can help
on this than mine at this aspect.


Having had the same problem for a long time, I've made an attempt and 
written a patch that probes an index to determine whether the estimated 
constant is within the statistics' scope.
I remember David's remark on the overhead problem, but I don't argue it 
here. This patch is on the table to provide one more solution sketch for 
further discussion.
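
As a sketch of that heuristic: pay the probing cost only when the constant falls outside the range the histogram knows about (i.e. the statistics are likely stale), and otherwise keep the ordinary estimate. The names below are illustrative, not the patch's actual functions:

```python
# Probe the "index" only for out-of-histogram constants; trust the regular
# estimate when the statistics still cover the value.
import bisect

def estimate_eq(histogram_bounds, sorted_index_keys, const, default_est):
    if histogram_bounds and histogram_bounds[0] <= const <= histogram_bounds[-1]:
        return default_est          # statistics still cover this value
    lo = bisect.bisect_left(sorted_index_keys, const)
    hi = bisect.bisect_right(sorted_index_keys, const)
    return hi - lo                  # fresh count straight from the "index"

bounds = [10, 50, 90]               # stats built before rows 91..95 arrived
index_keys = list(range(10, 96))
probed = estimate_eq(bounds, index_keys, 93, default_est=0.5)
```

This keeps the overhead David worried about confined to the cases where the planner would otherwise fall off the histogram's edge.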
Also, Andy, if you have a specific problem with index choice, you can 
try a tiny option that makes the index-picking technique less dependent 
on the ordering of index lists [1].


[1] 
https://www.postgresql.org/message-id/9b3dbf6d-776a-4953-a5a4-609929393...@postgrespro.ru


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/utils/adt/like_support.c 
b/src/backend/utils/adt/like_support.c
index 2635050861..bc8e59bbf3 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -643,7 +643,7 @@ patternsel_common(PlannerInfo *root,
/*
 * Pattern specifies an exact match, so estimate as for '='
 */
-		result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue,
+		result = var_eq_const(root, &vardata, eqopr, collation, prefix->constvalue,
 							  false, true, false);
}
else
@@ -1294,7 +1294,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData 
*vardata,
 * probably off the end of the histogram, and thus we probably got a 
very
 * small estimate from the >= condition; so we still need to clamp.
 */
-	eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue,
+	eq_sel = var_eq_const(root, vardata, eqopr, collation, prefixcon->constvalue,
 						  false, true, false);
 
prefixsel = Max(prefixsel, eq_sel);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..be6213046e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -273,7 +273,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 * in the query.)
 */
if (IsA(other, Const))
-		selec = var_eq_const(&vardata, operator, collation,
+		selec = var_eq_const(root, &vardata, operator, collation,
 							 ((Const *) other)->constvalue,
 							 ((Const *) other)->constisnull,
 							 varonleft, negate);
@@ -286,13 +286,133 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
return selec;
 }
 
+/*
+ * Lookup the value in the index and try to estimate number of the tuples with
+ * the same value.
+ */
+static Selectivity
+estimate_ntuples_by_index(PlannerInfo *root, VariableStatData *vardata,
+ Datum constval,
+ Oid collation, Oid 
regprocoid, int32 stawidth)
+{
+   RelOptInfo *rel = vardata->rel;
+   RangeTblEntry  *rte = root->simple_rte_array[rel->relid];
+   ListCell   *lc;
+   int ntuples = 0;
+   Selectivity selec = -1.;
+
+   foreach(lc, rel->indexlist)
+   {
+   IndexOptInfo   *index = (IndexOptInfo *) lfirst(lc);
+   ScanKeyData scankeys[1];
+		SnapshotData	SnapshotNonVacuumable;
+		IndexScanDesc	index_scan;
+		Relation		heapRel;
+		Relation		indexRel;
+
+   if (index->relam != BTREE_AM_OID)
+   continue;
+   if (index->indpred != NIL)
+   continue;
+   if (index->hypothetical)
+   continue;
+   if (!match_index_to_operand(vardata->var, 0, index))
+   continue;
+
+   heapRel = table_open(rte->relid, NoLock);
+   indexRel = index_open(index->indexoid, NoLock);
+
+		ScanKeyEntryInitialize(&scankeys[0],
+							   0,
+							   1,
+							   BTEqualStrategyNumber,
+							   vardata->atttype,
+ 

Re: POC, WIP: OR-clause support for indexes

2024-03-11 Thread Andrei Lepikhov

On 11/3/2024 18:31, Alexander Korotkov wrote:

I'm not convinced about this limit. The initial reason was to combine
long lists of ORs into the array because such a transformation made at
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition
routine but not in the creation one.


The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
N^2 algorithms could be applied to arrays.  Are you sure that's not
true for our case?
That is true when you operate on an array, but not when we transform ORs 
into one. Just check all the places in the optimizer and even the 
executor where we would pass along the list of ORs. This is why I think 
we should use this optimization even more intensively for huge numbers 
of ORs in an attempt to speed up the overall query.



3) Better save the original order of clauses by putting hash entries and
untransformable clauses to the same list.  A lot of differences in
regression tests output have gone.

I agree that reducing the number of changes in regression tests looks
better. But to achieve this, you introduced a hack that increases the
complexity of the code. Is it worth it? Maybe it would be better to make
one-time changes in tests instead of getting this burden on board. Or
have you meant something more introducing the node type?


For me the reason is not just a regression test.  The current code
keeps the original order of quals as much as possible.  The OR
transformation code reorders quals even in cases when it doesn't
eventually apply any optimization.  I don't think that's acceptable.
However, less hackery ways for this is welcome for sure.
Why is it unacceptable? Can a user implement some order-dependent logic with 
clauses, and would it be correct?
Otherwise, it is a matter of taste, and generally, this decision is up 
to you.



We don't make array values unique.  That might make query execution
performance somewhat worse, and also makes selectivity estimation
worse.  I suggest Andrei and/or Alena should implement making array
values unique.

The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long
OR lists. Using the list_append_unique() comparator on such lists could
impact performance. I suggest sticking to the common rule and leaving
the responsibility on the user's shoulders.


I don't see why the optimizer doesn't care about duplicates for OR
lists.  As I showed before in an example, it successfully removes the
duplicate.  So, currently OR transformation clearly introduces a
regression in terms of selectivity estimation.  I think we should
evade that.
I think you are right. It is probably a better place than any other to 
remove duplicates in an array. I just think we should sort and remove 
duplicates from entry->consts in one pass. Thus, this optimisation 
should be applied to sortable constants.





At least, we should do this optimization later, in one pass, with
sorting elements before building the array. But what if we don't have a
sort operator for the type?


It was probably discussed before, but can we do our work later?  There
is a canonicalize_qual() which calls find_duplicate_ors().  This is
the place where currently duplicate OR clauses are removed.  Could our
OR-to-ANY transformation be just another call from
canonicalize_qual()?
Hmm, we already tried to do it at that point. I vaguely recall some 
issues caused by this approach. Anyway, it should be done as quickly as 
possible to increase the effect of the optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-10 Thread Andrei Lepikhov

On 7/3/2024 21:51, Alexander Korotkov wrote:

Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

 > On 5/3/2024 12:30, Andrei Lepikhov wrote:
 > > On 4/3/2024 09:26, jian he wrote:
 > ... and the new version of the patchset is attached.

I made some revisions for the patchset.

Great!

1) Use hash_combine() to combine hash values.

Looks better

2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.


I'm not convinced about this limit. The initial reason was to combine 
long lists of ORs into the array because such a transformation made at 
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition 
routine but not in the creation one.


3) Better save the original order of clauses by putting hash entries and 
untransformable clauses to the same list.  A lot of differences in 
regression tests output have gone.
I agree that reducing the number of changes in regression tests looks 
better. But to achieve this, you introduced a hack that increases the 
complexity of the code. Is it worth it? Maybe it would be better to make 
one-time changes in tests instead of getting this burden on board. Or 
have you meant something more introducing the node type?


We don't make array values unique.  That might make query execution 
performance somewhat worse, and also makes selectivity estimation 
worse.  I suggest Andrei and/or Alena should implement making array 
values unique.

The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long 
OR lists. Using the list_append_unique() comparator on such lists could 
impact performance. I suggest sticking to the common rule and leaving 
the responsibility on the user's shoulders.
At least, we should do this optimization later, in one pass, with 
sorting elements before building the array. But what if we don't have a 
sort operator for the type?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: a wrong index choose when statistics is out of date

2024-03-08 Thread Andrei Lepikhov

On 7/3/2024 17:32, David Rowley wrote:

On Thu, 7 Mar 2024 at 21:17, Andrei Lepikhov  wrote:

I would like to ask David why the var_eq_const estimator doesn't have an
option for estimation with a histogram. Having that would relieve a
problem with skewed data. Detecting the situation with incoming const
that is out of the covered area would allow us to fall back to ndistinct
estimation or something else. At least, histogram usage can be
restricted by the reltuples value and ratio between the total number of
MCV values and the total number of distinct values in the table.


If you can think of a way how to calculate it, you should propose a patch.

IIRC, we try to make the histogram buckets evenly sized based on the
number of occurrences. I've not followed the code in default, I'd
guess that doing that allows us to just subtract off the MCV
frequencies and assume the remainder is evenly split over each
histogram bucket, so unless we had an n_distinct per histogram bucket,
or at the very least n_distinct_for_histogram_values, then how would
the calculation look for what we currently record?
Yeah, it was my mistake; I see nothing special here with such a kind of 
histogram: with a coarse histogram grid, the level of uncertainty in one bin 
is too high to make a better estimation. I am just pondering how to detect 
situations where the estimated constant is simply out of the statistics' 
scope, so that we could apply alternative, more expensive logic involving the 
number of index pages beyond the boundary, the index tuple width, and the 
number of distinct values. The left and right boundaries of the histogram 
are suitable detectors for such a situation.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: a wrong index choose when statistics is out of date

2024-03-07 Thread Andrei Lepikhov

On 5/3/2024 19:56, Andy Fan wrote:

I think it is OK for a design review, for the implementaion side, the
known issue includes:

1. Support grap such infromation from its parent for partitioned table
if the child doesn't have such information.
2. builtin document and testing.

Any feedback is welcome.

Thanks for your efforts.
I was confused when you showed the problem connected to clauses like 
"Var op Const" and "Var op Param".
As far as I know, the estimation logic for such clauses uses MCV and 
ndistinct statistics. So, once a value falls outside the MCV list, the 
estimate becomes totally insensitive to any internal skew in the data and to 
any data outside the statistics' boundaries.
Having studied the example you provided with the patch, I think it is 
not a correct example:

The difference between var_eq_const and var_eq_non_const is quite obvious:
In the second routine, you don't have information about the const value 
and can't use MCV for estimation. Also, you can't exclude MCV values 
from the estimation. And it is just luck that you've got the right 
answer. I think if you increased the weight of the unknown part, you 
would get a bad result, too.
I would like to ask David why the var_eq_const estimator doesn't have an 
option for estimation with a histogram. Having that would relieve a 
problem with skewed data. Detecting the situation with incoming const 
that is out of the covered area would allow us to fall back to ndistinct 
estimation or something else. At least, histogram usage can be 
restricted by the reltuples value and ratio between the total number of 
MCV values and the total number of distinct values in the table.


Just a demonstration of the data skew issue:

CREATE EXTENSION tablefunc;
CREATE TABLE norm_test AS
  SELECT abs(r::integer) AS val
  FROM normal_rand(1E7::integer, 5.::float8, 300.::float8) AS r;
ANALYZE norm_test;

-- First query is estimated with MCV quite precisely:
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 100;
-- result: planned rows=25669, actual rows=25139

-- Here we have numdistinct estimation, mostly arbitrary:
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 200;
-- result: planned rows=8604, actual rows=21239
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 500;
-- result: planned rows=8604, actual rows=6748
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 600;
-- result: planned rows=8604, actual rows=3501
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 700;
-- result: planned rows=8604, actual rows=1705
EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 1000;
-- result: planned rows=8604, actual rows=91

--
regards,
Andrei Lepikhov
Postgres Professional





Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-03-05 Thread Andrei Lepikhov

On 6/3/2024 10:10, Tender Wang wrote:



Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote on Tue, Mar 5, 2024 at 17:36:


On 1/3/2024 14:18, Tender Wang wrote:
 > During debugging, I learned that numeric_add doesn't have a type check like
 > rangetype, so the above query will not report "type with xxx does not exist".
 >
 > And I realize that  the test case added by Andrei Lepikhov  in v3 is
 > right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks
a lot.
 >
 > Now I think the v6 version patch seems to be complete now.
I've gone through the patch, and it looks okay. Although I am afraid of the
same problems that future changes may cause, and of how we would detect
them, it works correctly.


Thanks for reviewing it, and I add it to commitfest 2024-07.

I think it is a bug. Should it be fixed (and back-patched) sooner?

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Hooking into ExplainOneQuery() complicated by missing standard_ExplainOneQuery

2024-03-05 Thread Andrei Lepikhov

On 6/3/2024 06:25, Michael Paquier wrote:

Just to elaborate: the intention was to allow a section to be added to
every node in the plan containing information from further down and also
allow this information to propagate upwards. We happen to have buffer
information right now, but allowing something similar to be added
dynamically by extending ExplainNode and passing down a callback to
standard_ExplainOneQuery.


Or an extra hook at the end of ExplainNode() to be able to append more
information at node level?  Not sure if others would agree with that,
though.


We already discussed EXPLAIN hooks, at least in [1]. IMO, extensions 
should have a chance to add something to the per-node EXPLAIN output and to 
the summary, if only because they can significantly influence the planner's 
and executor's behaviour.


[1] 
https://www.postgresql.org/message-id/flat/6cd5caa7-06e1-4460-bf35-00a59da3f677%40garret.ru


--
regards,
Andrei Lepikhov
Postgres Professional





Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-03-05 Thread Andrei Lepikhov

On 1/3/2024 14:18, Tender Wang wrote:
During debugging, I learned that numeric_add doesn't have a type check like 
rangetype, so the above query will not report "type with xxx does not exist".


And I realize that  the test case added by Andrei Lepikhov  in v3 is 
right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks a lot.


Now I think the v6 version patch seems to be complete now.
I've gone through the patch, and it looks okay. Although I am afraid of the 
same problems that future changes may cause, and of how we would detect 
them, it works correctly.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-04 Thread Andrei Lepikhov

On 5/3/2024 12:30, Andrei Lepikhov wrote:

On 4/3/2024 09:26, jian he wrote:

... and the new version of the patchset is attached.

--
regards,
Andrei Lepikhov
Postgres Professional
From 1c3ac3e006cd66ff40f1ddaaa09e3fc0f3a75ca5 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 339 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 22 files changed, 906 insertions(+), 69 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by (t2.c1)%3 
order by 1;
@@ -3123,7 +3123,7 @@ select array_agg(distinct (t1.c1)%5 order by (t1.c1)%5) 
from ft4 t1 full join ft
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5) ORDER BY (t1.c1 % 5))), ((t2.c1 % 
3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5) ORDER BY ((r1.c

Re: POC, WIP: OR-clause support for indexes

2024-03-04 Thread Andrei Lepikhov

On 4/3/2024 09:26, jian he wrote:

On Thu, Feb 29, 2024 at 4:59 PM Andrei Lepikhov

Feel free to add, change or totally rewrite these changes.

On the replacement of the static ScalarArrayOpExpr dest with a dynamically 
allocated one: after discussion [1], I agree with that replacement.

Some style (and language) changes in the comments I haven't applied because 
they look debatable to me.



I think it should be something like:
+ gettext_noop("Transform a sequence of OR expressions to an array
expression."),
+ gettext_noop("The planner will replace expression like 'x=c1 OR x=c2 "
+ "to the expression 'x = ANY( ARRAY[c1,c2])''"

Fixed


queryId may not be a good variable name here?
Not sure. QueryId is a concept, part of the query jumbling technique, and can 
be used by other tools. It just tells the developer that this is the same 
thing as query jumbling, but for a separate expression.
At least you don't insist on removing the JumbleState return pointer, which 
looks strange for a simple hash ...


comment `/* Compute query ID */`
seems not correct, here we are just hashing the expression?

The same as above.

+/*
+ * Dynahash match function to use in guc_hashtab
the above comments seem not correct?

Yes, fixed.


` It applies to equality expressions only.` seems not correct?
`select * from tenk1 where unique1 < 1 or unique1 < 2; ` can also do
the transformation.

Yes, I forgot it.

`similarity of variable sides.` seems not correct,
should  it be 'sameness of the variable sides`?

The term 'equivalence' looks better *).


in [2], we can get:
SOME is a synonym for ANY. IN is equivalent to = ANY.

but still transforming OR to ANY is not intuitive.
a normal user may not know what is "transforming OR to ANY".
so maybe adding a simple example at

would be great. which, I did at previous thread.
Not sure. Examples are unusual for that section. What's more, should a user 
who doesn't understand what it means change this setting at all? 
Let's wait for other opinions.


[1] https://www.postgresql.org/message-id/2157387.1709068...@sss.pgh.pa.us

--
regards,
Andrei Lepikhov
Postgres Professional





Re: a wrong index choose when statistics is out of date

2024-03-03 Thread Andrei Lepikhov

On 4/3/2024 12:33, David Rowley wrote:

[1] 
https://www.postgresql.org/message-id/CAApHDvo2sMPF9m%3Di%2BYPPUssfTV1GB%3DZ8nMVa%2B9Uq4RZJ8sULeQ%40mail.gmail.com

Thanks for the link!
Could we use the trick with get_actual_variable_range() to extrapolate 
histogram data beyond its boundaries when an index shows us that the actual 
min/max lie outside the known statistics?
Because it would be used only for values outside the histogram, it should 
add overhead only when there is a reason for it.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: a wrong index choose when statistics is out of date

2024-03-03 Thread Andrei Lepikhov

On 3/3/2024 14:01, Andy Fan wrote:

1. We can let the user define the column as the value is increased day by
day. the syntax may be:

ALTER TABLE x_events ALTER COLUMN created_at ALWAYS_INCREASED.

then when a query like 'create_at op const', the statistics module can
treat it as 'created_at = $1'. so the missing statistics doesn't make
difference. Then I think the above issue can be avoided.

Let me write some words to support your efforts in that direction.
I also have some use cases where users periodically insert data in large 
chunks. These chunks contain 'always increasing' values, and it causes 
trouble each time an analytic query is started over this new data before 
the ANALYZE command runs.
I have thought about that issue before but invented nothing special 
except a more aggressive analysis of such tables.
Your trick can work, but it needs a new parameter in pg_type and a lot 
of additional code for such a rare case.

I'm looking forward to the demo patch.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Andrei Lepikhov

On 1/3/2024 22:33, Alexander Korotkov wrote:

I'm going to review and revise the patch.

Nice!


One question I have yet.

 >        /*
 >         * Transformation only works with both side type is not
 >         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't seem 
the current code restricts anything except composite/record.

The answer can be a bit long. Let's first look at a4424c5.

We forbid record types although they can have a typarray. It is because of 
the specifics of RowExpr comparison. Although we have the record_eq() 
routine, all base types in the compared records must be strictly the same. 
Let me show:


explain analyze
SELECT * FROM
(SELECT ROW(relpages,relnatts) AS x FROM pg_class LIMIT 10) AS q1,
(SELECT ROW(relpages,relallvisible) AS x FROM pg_class LIMIT 10) AS q2
WHERE q1.x=q2.x;
ERROR:  cannot compare dissimilar column types smallint and integer at 
record column 2


As you can see, we have smallint and integer in the second position of the 
RowExpr, and it causes the ERROR. That is the reason why PostgreSQL 
transforms ROW expressions into a series of ORs. Look:


explain (costs off)
SELECT oid,relname FROM pg_class
WHERE (oid,relname) IN ((1, 'a'), (2,'b'));

 Bitmap Heap Scan on pg_class
   Recheck Cond: ((relname = 'a'::name) OR (relname = 'b'::name))
   Filter: (((oid = '1'::oid) AND (relname = 'a'::name)) OR ((oid = 
'2'::oid) AND (relname = 'b'::name)))

   ->  BitmapOr
...

So, transforming composite types to the ScalarArrayOpExpr expression 
doesn't make sense. Am I wrong?


The same goes for domains. If a domain has a composite base type, we reject 
the transformation according to the logic above.


What about arrays? As I see it, arrays don't have a typarray, so we can 
avoid spending more cycles after detecting TYPCATEGORY_ARRAY. I haven't 
done it yet because I have second thoughts: what if we combined the arrays 
into a larger one? I'm unsure about that, so we can forbid it too.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-29 Thread Andrei Lepikhov

On 28/2/2024 17:27, Alena Rybakina wrote:

Maybe like that:

It also considers a way to generate a path using BitmapScan over indexes, 
converting the transformed expression back into expressions separated by 
"OR" operations, and finally selects the best one if it turns out to be 
the cheapest.

Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and 
generate_saop_pathlist routines.

Also, I've made tiny changes in the code to make it look smoother.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.

--
regards,
Andrei Lepikhov
Postgres Professional
From 015a564cc784139c806a7004f25bf5f7a4b4a29d Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 doc/src/sgml/config.sgml  |  18 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 340 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 22 files changed, 908 insertions(+), 69 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_

Re: POC, WIP: OR-clause support for indexes

2024-02-28 Thread Andrei Lepikhov

On 28/2/2024 17:07, jian he wrote:

doc/src/sgml/array.sgml corresponds to
https://www.postgresql.org/docs/current/arrays.html.
  this GUC is related to parser|optimzier.
adding a GUC to array.sgml seems strange. (I think).
Maybe. In that case, I suggest adding extended comments to functions 
transformBoolExprOr and generate_saop_pathlist (including 
cross-referencing each other). These are starting points to understand 
the transformation and, therefore, a good place for a detailed explanation.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-27 Thread Andrei Lepikhov

On 28/2/2024 13:53, Tender Wang wrote:
The attached patch is a new version based on v3 (not including Andrei's 
test case). There is no need to call datumCopy when isnull is true.

I have not added a new runtime memoryContext so far. I continue to use 
mstate->tableContext; I'm not sure whether the memory used by the probeslot 
will affect mstate->mem_limit.
Maybe adding a new memoryContext is better. I think I should spend a 
little time learning nodeMemoize.c more deeply.
I am curious about your reasons to stay with tableContext. In terms of 
memory allocation, Richard's approach looks better.
Also, you don't need to initialize tts_values[i] at all if tts_isnull[i] is 
set to true.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-27 Thread Andrei Lepikhov

On 26/2/2024 11:10, Alena Rybakina wrote:

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration 
(runtime-config-query.html)

but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.


Thank you for your work. I found a few spelling mistakes - I fixed this 
and added some information about generating a partial index plan. I 
attached it.

Thanks Jian and Alena,
It is a good start for the documentation. But I think the runtime-config 
section needs only a condensed description of a feature underlying the 
GUC. The explanations in this section look a bit awkward.
Having looked through the documentation for a better place for a 
detailed explanation, I found array.sgml as a candidate. Also, we have 
the parser's short overview section. I'm unsure about the best place, but 
either is better than the server config section.

What's more, there are some weak points in the documentation:
1. We choose constant and variable parts of an expression and don't 
require the constant to be on the right side.
2. We should describe the second part of the feature, where the 
optimiser can split an array to fit the optimal BitmapOr scan path.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: The const expression evaluation routine should always return a copy

2024-02-27 Thread Andrei Lepikhov

On 28/2/2024 04:19, Tom Lane wrote:

Andrei Lepikhov  writes:

IMO, the routine eval_const_expressions_mutator contains some stale code:


I'd like to push back against the idea that eval_const_expressions
is expected to return a freshly-copied tree.  Its API specification
contains no such claim.  It's true that it appears to do that
everywhere but here, but I think that's an implementation detail
that callers had better not depend on.  It's not hard to imagine
someone trying to improve its performance by avoiding unnecessary
copying.
Thanks for the explanation. I was exactly such an extension developer: I 
looked into the internals of eval_const_expressions, found a call to a 
'_mutator' function, and made that mistake :).
I agree that freeing the returned node doesn't provide a sensible 
benefit because the underlying bushy tree was copied during mutation. 
What's more, it makes even less sense with the selectivity context 
coming shortly (I hope).


--
regards,
Andrei Lepikhov
Postgres Professional





The const expression evaluation routine should always return a copy

2024-02-26 Thread Andrei Lepikhov

IMO, the routine eval_const_expressions_mutator contains some stale code:

case T_SubPlan:
case T_AlternativeSubPlan:
  /*
   * Return a SubPlan unchanged --- too late to do anything with it.
   *
   * XXX should we ereport() here instead?  Probably this routine
   * should never be invoked after SubPlan creation.
   */
   return node;

At the very least, this code path can be reached via 
estimate_expression_value(), so we should fix the comment. But maybe we 
need to do a bit more. According to the mutator convention, only the 
Query node is returned unchanged. If a SubPlan node is at the top of the 
expression, the call above returns the same node, which is unconventional.
I'm not totally sure that constant-folding a SubPlan is impossible: we 
already have InitPlans for uncorrelated subplans. Maybe parameters that 
can be estimated as constants at this level would, as a result, allow 
returning a Const instead of a SubPlan?
But at least we can return a flat copy of the SubPlan node, just to 
follow the convention; the same goes for the AlternativeSubPlan. See the 
patch in the attachment.
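Not an authoritative PostgreSQL snippet, just a minimal standalone sketch of the convention in question: a mutator returns a fresh (flat) copy of the input node, so the caller may free the result without touching the original tree. The names Node, makeFlatCopy, and mutate are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy stand-in for a PostgreSQL Node: a tag plus one payload field. */
typedef struct Node
{
	int			tag;
	int			value;
} Node;

/* Return a flat (shallow) copy, mirroring makeNode() + memcpy() in the patch. */
static Node *
makeFlatCopy(const Node *src)
{
	Node	   *dst = malloc(sizeof(Node));

	memcpy(dst, src, sizeof(Node));
	return dst;
}

/*
 * A mutator following the convention discussed above: it never returns the
 * input pointer, so callers may free the result unconditionally.
 */
static Node *
mutate(const Node *node)
{
	return makeFlatCopy(node);
}
```

The point of the sketch is only the ownership rule: the caller can always free what the mutator returns, because it is never the original node.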


--
regards,
Andrei Lepikhov
Postgres Professional

From a227e7e72d917726085001027c350d2fda2ad3c4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 27 Feb 2024 11:00:23 +0700
Subject: [PATCH] Force eval_const_expressions_mutator to return a copy of the
 node.

In some situations, when SubPlan or AlternativeSubPlan nodes are on the top of
the expression tree, this routine returns the same node, which could baffle
users who, remembering the mutator convention, wanted to free the evaluation
result immediately.

Author: a.rybakina.
---
 src/backend/optimizer/util/clauses.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index edc25d712e..f09c6c6420 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2907,15 +2907,28 @@ eval_const_expressions_mutator(Node *node,
}
 
case T_SubPlan:
+   {
+   SubPlan *subplan = makeNode(SubPlan);
+
+   memcpy(subplan, node, sizeof(SubPlan));
+
+   /*
+* Return a SubPlan unchanged. If the subplan had been uncorrelated,
+* it would already have been converted to an InitPlan.
+*/
+   return (Node *) subplan;
+   }
case T_AlternativeSubPlan:
+   {
+   AlternativeSubPlan *subplan = makeNode(AlternativeSubPlan);
+
+   memcpy(subplan, node, sizeof(AlternativeSubPlan));
 
/*
 * Return a SubPlan unchanged --- too late to do anything with it.
-*
-* XXX should we ereport() here instead?  Probably this routine
-* should never be invoked after SubPlan creation.
 */
-   return node;
+   return (Node *) subplan;
+   }
case T_RelabelType:
{
RelabelType *relabel = (RelabelType *) node;
-- 
2.43.2



Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-26 Thread Andrei Lepikhov

On 26/2/2024 18:34, Richard Guo wrote:


On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote:


On 26/2/2024 12:44, Tender Wang wrote:
 > Make sense. I found MemoizeState already has a MemoryContext, so
I used it.
 > I update the patch.
This approach is better for me. In the next version of this patch, I
included a test case. I am still unsure about the context chosen and
the
stability of the test case. Richard, you recently fixed some Memoize
issues, could you look at this problem and patch?


I looked at this issue a bit.  It seems to me what happens is that at
first the memory areas referenced by probeslot->tts_values[] are
allocated in the per tuple context (see prepare_probe_slot).  And then
in MemoizeHash_hash, after we've calculated the hashkey, we will reset
the per tuple context.  However, later in MemoizeHash_equal, we still
need to reference the values in probeslot->tts_values[], which have been
cleared.

Agree


Actually the context would always be reset in MemoizeHash_equal, for
both binary and logical mode.  So I kind of wonder if it's necessary to
reset the context in MemoizeHash_hash.
I can offer one thought against this solution: what if we have a lot of 
unique hash values, maybe all of them? In that case, we still have the 
kind of 'leak' David fixed in commit 0b053e78b5.
Also, I have a segfault report from one client. As I see it, it was 
caused by a too-long text column in the table slot: the key value stored 
in the Memoize hash table was corrupted, and the most plausible reason 
is this bug. Should we add a test for this bug, and what do you think 
about the one proposed in v3?
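For readers following along, here is a standalone sketch of the failure mode (not PostgreSQL code; the toy Arena stands in for ecxt_per_tuple_memory, and copy_out for datumCopy): a value computed in a resettable context must be copied into longer-lived storage before the context is reset, or the later equality comparison reads clobbered memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy resettable arena, standing in for a per-tuple memory context. */
typedef struct
{
	char		buf[256];
	size_t		used;
} Arena;

static void *
arena_alloc(Arena *a, size_t n)
{
	void	   *p = a->buf + a->used;

	a->used += n;
	return p;
}

/* Resetting the arena clobbers everything that was allocated in it. */
static void
arena_reset(Arena *a)
{
	memset(a->buf, 0x7f, sizeof(a->buf));
	a->used = 0;
}

/* The fix: copy the value into stable storage before the reset happens. */
static char *
copy_out(const char *val, size_t n)
{
	char	   *p = malloc(n);

	memcpy(p, val, n);
	return p;
}

/* Compute a key inside the arena, then return a stable copy of it. */
static char *
probe_key(Arena *a)
{
	char	   *key = arena_alloc(a, 4);

	memcpy(key, "abc", 4);		/* the value built per "tuple" */
	return copy_out(key, 4);	/* survives a later arena_reset() */
}
```

Without the copy_out() call, the pointer returned by probe_key() would point into the arena and would read garbage after the reset, which is exactly the corrupted-key symptom described above.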


--
regards,
Andrei Lepikhov
Postgres Professional





Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-25 Thread Andrei Lepikhov

On 26/2/2024 12:44, Tender Wang wrote:



Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote on Monday, February 26, 2024 at 10:57:


On 25/2/2024 20:32, Tender Wang wrote:
 > I think in prepare_probe_slot(), should called datumCopy as the
attached
 > patch does.
 >
 > Any thoughts? Thanks.
Thanks for the report.
I think it is better to invent a Runtime Memory Context; likewise,
it is
already designed in IndexScan and derivatives. Here, you just allocate
the value in some upper memory context.
Also, I'm curious why such a trivial error hasn't been found for a
long time


Make sense. I found MemoizeState already has a MemoryContext, so I used it.
I update the patch.
This approach is better for me. In the next version of this patch, I 
included a test case. I am still unsure about the context chosen and the 
stability of the test case. Richard, you recently fixed some Memoize 
issues, could you look at this problem and patch?


--
regards,
Andrei Lepikhov
Postgres Professional
From e32900e50730bccfde26355609d6b0b3e970f0a8 Mon Sep 17 00:00:00 2001
From: "tender.wang" 
Date: Mon, 26 Feb 2024 13:38:30 +0800
Subject: [PATCH] Store Memoize probeslot values in the hash table memory
 context.

The probeslot's values are evaluated in the expression memory context, which
can be reset after the hash computation while we still need the values for the
equality comparison. So we copy the result of ExecEvalExpr into the probeslot.
---
 src/backend/executor/nodeMemoize.c| 23 ---
 src/test/regress/expected/memoize.out | 26 ++
 src/test/regress/sql/memoize.sql  | 14 ++
 3 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 18870f10e1..6402943772 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -312,17 +312,26 @@ prepare_probe_slot(MemoizeState *mstate, MemoizeKey *key)
if (key == NULL)
{
ExprContext *econtext = mstate->ss.ps.ps_ExprContext;
+   Datum value;
+   bool isnull;
+   TupleDesc tup = pslot->tts_tupleDescriptor;
+   Form_pg_attribute att;
MemoryContext oldcontext;
 
-   oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
/* Set the probeslot's values based on the current parameter values */
for (int i = 0; i < numKeys; i++)
-   pslot->tts_values[i] = ExecEvalExpr(mstate->param_exprs[i],
-   econtext,
-   &pslot->tts_isnull[i]);
-
-   MemoryContextSwitchTo(oldcontext);
+   {
+   att = TupleDescAttr(tup, i);
+   value = ExecEvalExprSwitchContext(mstate->param_exprs[i],
+ econtext,
+ &isnull);
+   /* Copy the value so it is not freed when the ExprContext is reset */
+   oldcontext = MemoryContextSwitchTo(mstate->tableContext);
+   pslot->tts_values[i] = datumCopy(value, att->attbyval, att->attlen);
+   MemoryContextSwitchTo(oldcontext);
+
+   pslot->tts_isnull[i] = isnull;
+   }
}
else
{
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index cf6886a288..9842b238c6 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -196,6 +196,32 @@ SELECT * FROM flt f1 INNER JOIN flt f2 ON f1.f >= f2.f;', false);
 (10 rows)
 
 DROP TABLE flt;
+CREATE TABLE expr_key (x numeric, t text);
+INSERT INTO expr_key (x,t) SELECT d1::numeric, d1::text FROM (
+  SELECT round((d/pi())::numeric,7) AS d1 FROM generate_series(1,20) AS d);
+-- Having duplicates we must see hits in the Memoize node
+INSERT INTO expr_key SELECT * FROM expr_key;
+CREATE INDEX expr_key_idx_x_t ON expr_key (x,t);
+VACUUM ANALYZE expr_key;
+SELECT explain_memoize('
+SELECT * FROM expr_key t1 INNER JOIN expr_key t2
+ON t1.x=t2.t::numeric AND t1.t::numeric=t2.x;', true);
+  explain_memoize  

+---
+ Nested Loop (actual rows=80 loops=N)
+   ->  Index Only Scan using expr_key_idx_x_t on expr_key t1 (actual rows=40 loops=N)
+ Heap Fetches: N
+   ->  Memoize (actual rows=2 loops=N)
+ Cache Key: t1.x, (t1.t)::numeric
+  

Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-25 Thread Andrei Lepikhov

On 26/2/2024 09:52, Andrei Lepikhov wrote:

On 25/2/2024 20:32, Tender Wang wrote:
I think in prepare_probe_slot(), should called datumCopy as the 
attached patch does.


Any thoughts? Thanks.

Thanks for the report.
I think it is better to invent a Runtime Memory Context; likewise, it is 
already designed in IndexScan and derivatives. Here, you just allocate 
the value in some upper memory context.
Also, I'm curious why such a trivial error hasn't been found for a long 
time
Hmmm. I see the problem (test.sql in attachment for reproduction and 
results). We only detect it by the number of Hits:

  Cache Key: t1.x, (t1.t)::numeric
  Cache Mode: logical
  Hits: 0  Misses: 30  Evictions: 0  Overflows: 0  Memory Usage: 8kB

We see no hits in logical mode and 100 hits in binary mode. We see 15 
hits for both logical and binary modes if the parameters are plain 
integers - no problems with resetting the expression context there.


Your patch resolves the issue for logical mode - I see 15 hits for 
integer and complex keys. But I still see 100 hits in binary mode. Maybe 
we still have a problem?


What's more, why doesn't the Memoize node see the problem at all?

--
regards,
Andrei Lepikhov
Postgres Professional


Attachment: test.sql (application/sql)


Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-02-25 Thread Andrei Lepikhov

On 25/2/2024 20:32, Tender Wang wrote:
I think in prepare_probe_slot(), should called datumCopy as the attached 
patch does.


Any thoughts? Thanks.

Thanks for the report.
I think it is better to invent a runtime memory context, like the one 
already designed for IndexScan and its derivatives. Here, you would just 
allocate the value in some upper memory context.

Also, I'm curious why such a trivial error hasn't been found for such a long time.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-02-22 Thread Andrei Lepikhov

On 21/2/2024 14:26, Richard Guo wrote:

I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.

This code looks good. No idea how we lost it before.


While writing the fix, I noticed some outdated comments.  Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed.  So
this patch also fixes some outdated comments.

Thanks, looks good.


While writing the test cases, I found that the test cases for SJE are
quite messy.  Below are what I have noticed:

* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal.  I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.

I see only one unusual query, with pg_class involved.


* In many test cases, a mix of uppercase and lowercase keywords is used
in one query.  I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.

I see an uppercase -> lowercase change:
select t1.*, t2.a as ax from sj t1 join sj t2
and lowercase -> uppercase in many other cases:
explain (costs off)
I guess it is a matter of taste, so I leave it to the committer's decision. 
Technically, it's OK.


* In most situations, we verify the plan and the output of a query like:

explain (costs off)
select ...;
select ...;

The two select queries are supposed to be the same.  But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.

This patch also includes some cosmetic tweaks for SJE test cases.  It
does not change the test cases using catalog tables though.  I think
that should be a seperate patch.
I can't assess the necessity of changing these dozens of lines of code 
because I follow a different commenting style, but technically, it's still OK.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov

On 22/2/2024 13:35, Richard Guo wrote:

The avg() function on integer argument is commonly used in
aggregates.sql.  I don't think this is an issue.  See the first test
query in aggregates.sql.

Make sense

 > it should be parallel to the test cases for utilize the ordering of
 > index scan and subquery scan.

Also, I'm unsure about removing the disabling of the
max_parallel_workers_per_gather parameter. Have you discovered the
domination of the current plan over the partial one? Do the cost
fluctuations across platforms not trigger a parallel plan?


The table used for testing contains only 100 tuples, which is the size
of only one page.  I don't believe it would trigger any parallel plans,
unless we manually change min_parallel_table_scan_size.
I don't intend to argue it, but just for information: I frequently 
reduce it to zero, allowing PostgreSQL to make a decision based on 
costs. It sometimes works much better, because one small table in a 
multi-join can rule out an effective parallel plan.


What's more, I suggest to address here the complaint from [1]. As I
see,
cost difference between Sort and IncrementalSort strategies in that
case
is around 0.5. To make the test more stable I propose to change it a
bit
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes the efficacy of IncrementalSort more obvious, with a difference
of around 10 cost points.


I don't think that's necessary.  With Incremental Sort the final cost
is:

     GroupAggregate  (cost=1.66..19.00 rows=100 width=25)

while with full Sort it is:

     GroupAggregate  (cost=16.96..19.46 rows=100 width=25)

With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
is cheaper on total cost.  Not to say that even if somehow we decide the
two paths are fuzzily the same on total cost, the first path still
dominates because its startup cost is much cheaper.
As before, I won't protest here - it needs some computation of how much 
cost can be added by bulk extension of the relation blocks. If Maxim 
answers that it's enough to resolve his issue, why not?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Unlinking Parallel Hash Join inner batch files sooner

2024-02-21 Thread Andrei Lepikhov

On 22/2/2024 06:42, Thomas Munro wrote:

On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov
 wrote:

I see in [1] that the reporter mentioned a delay between the error
message in parallel HashJoin and the return control back from PSQL. Your
patch might reduce this delay.
Also, I have the same complaint from users who processed gigabytes of
data in parallel HashJoin. Presumably, they also stuck into the unlink
of tons of temporary files. So, are you going to do something with this
code?


Yeah, right.  I will aim to get this into the tree next week.  First,
there are a couple of minor issues to resolve around freeing that
Heikki mentioned.  Then there is the question of whether we think this
might be a candidate for back-patching, given the complaints you
mention.  Opinions?
The code is related to performance, not a bug fix. Also, it adds one 
external function to 'sharedtuplestore.h'. IMO, it isn't worth 
back-patching.


I would add that the problems you reach when you get to very large
number of partitions are hard (see several very long threads about
extreme skew for one version of the problem, but even with zero/normal
skewness and perfect estimation of the number of partitions, if you
ask a computer to partition 42TB of data into partitions that fit in a
work_mem suitable for a Commodore 64, it's gonna hurt on several
levels) and this would only slightly improve one symptom.  One idea
that might improve just the directory entry and file descriptor
aspect, would be to scatter the partitions into (say) 1MB chunks
within the file, and hope that the file system supports holes (a bit
like logtape.c's multiplexing but I wouldn't do it quite like that).

Thanks, I found a good entry point in [1] to dive into this issue.

[1] 
https://www.postgresql.org/message-id/CA+hUKGKDbv+5uiJZDdB1wttkMPFs9CDb6=02qxitq4am-kb...@mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-02-21 Thread Andrei Lepikhov

On 22/2/2024 09:09, Richard Guo wrote:


On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorot...@gmail.com> wrote:


Hi, Richard!

 > What do you think about the revisions for the test cases?

I've rebased your patch upthread.  Did some minor beautifications.

 > * The table 'btg' is inserted with 1 tuples, which seems a bit
 > expensive for a test.  I don't think we need such a big table to test
 > what we want.

Your patch reduces the number of rows to 1000 tuples.  I found it
possible to further reduce it to 100 tuples.  That also allowed me to
save the plan in the test case introduced by e1b7fde418.

Please check if you're OK with the patch attached.


I looked through the v2 patch and have two comments.

* The test case under "Check we don't pick aggregate path key instead of
grouping path key" does not have EXPLAIN to show the plan.  So how can
we ensure we do not mistakenly select the aggregate pathkeys instead of
the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the 
number of zeros in the output across platforms here:

   avg

 4.
 5.
That is why I used the format() function before. So, may we elaborate 
on the test and restrict the output?


* I don't think the test case introduced by e1b7fde418 is still needed,
because we already have one under "Utilize the ordering of merge join to
avoid a full Sort operation".  This kind of test case is just to ensure
that we are able to utilize the ordering of the subplans underneath.  So
it should be parallel to the test cases for utilize the ordering of
index scan and subquery scan.
I confirm this version also checks ec_sortref initialization in the 
case when ECs are constructed from the WHERE clause. Generally, I prefer 
one test per issue instead of one test for everything at once. But it 
works, and I don't have any reason to dispute it.


Also, I'm unsure about removing the disabling of the 
max_parallel_workers_per_gather parameter. Have you discovered the 
domination of the current plan over the partial one? Do the cost 
fluctuations across platforms not trigger a parallel plan?


What's more, I suggest to address here the complaint from [1]. As I see, 
cost difference between Sort and IncrementalSort strategies in that case 
is around 0.5. To make the test more stable I propose to change it a bit 
and add a limit:

SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes the efficacy of IncrementalSort more obvious, with a difference 
of around 10 cost points.


[1] 
https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rbkxeotwgzmclad5mphfw...@mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-02-21 Thread Andrei Lepikhov

On 21/2/2024 14:26, Richard Guo wrote:

This patch also includes some cosmetic tweaks for SJE test cases.  It
does not change the test cases using catalog tables though.  I think
that should be a seperate patch.
Thanks for this catch; it is a really messy thing, setting aside the 
question of why we need two different subtrees for the same query.

I will look into your fix.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Unlinking Parallel Hash Join inner batch files sooner

2024-02-20 Thread Andrei Lepikhov

Hi,

I see in [1] that the reporter mentioned a delay between the error 
message in a parallel HashJoin and the return of control back to psql. 
Your patch might reduce this delay.
Also, I have the same complaint from users who processed gigabytes of 
data in a parallel HashJoin. Presumably, they were also stuck in the 
unlinking of tons of temporary files. So, are you going to do something 
with this code?


[1] 
https://www.postgresql.org/message-id/18349-83d33dd3d0c855c3%40postgresql.org


--
regards,
Andrei Lepikhov
Postgres Professional





Re: [POC] Allow flattening of subquery with a link to upper query

2024-02-20 Thread Andrei Lepikhov

On 20/2/2024 17:43, David Rowley wrote:
On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov  wrote: 
I agree that it would be nice to teach the planner how to do this, but

I think it just has to be a cost-based decision.  Imagine how the
transformed query would perform of pg_am had a billion rows and
pg_class had 1 row. That's quite a costly hash table build to be
probing it just once.
True, the origins of this work lie in foreign tables where such a query 
generates an even worse situation.



I didn't follow the patch, but there was a patch to push aggregate
function evaluation down [1].  I imagine this has the same problem as
if you just blindly pushed and aggregate function evaluation as deep
as you could evaluate all the aggregate's parameters and group by vars
then you may end up aggregating far more than you need to as some join
could eliminate the majority of the groups.  I think we'd need to come
up with some way to have the planner consider these types of
optimisations as alternatives to what happens today and only apply
them when we estimate that they're cheaper.  Right now a Path has no
ability to describe that it's performed GROUP BY.
Thanks for the link. We also ended up with the idea of an alternative 
subtree (inspired by the approach of AlternativeSubplan). Here, we just 
explain the current state of the pull-up sublink technique.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: [POC] Allow flattening of subquery with a link to upper query

2024-02-20 Thread Andrei Lepikhov

On 1/9/2022 19:24, Richard Guo wrote:

Even if we ignore these assertion checks, in the final plan we would
have to access the RHS of the B/C semi join, i.e. C, to evaluate qual
'c.j = a.j' at the join level of A/BC join, which is wrong.
Having committed 9f13376396 recently, we have done a lot of work in this 
area. By applying the regression tests from my last patch [1] to master, 
I compared these two implementations.
As I see it, using the LATERAL trick allowed us to simplify the code 
drastically. But because we know only the fact of the lateral link, not 
its location, on master we do less than in the patch proposed in that 
thread. For example, given the query:


explain (costs off)
SELECT relname FROM pg_class c1
WHERE relname = ANY (
  SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
);

We see on master:
 Nested Loop
   ->  Seq Scan on pg_class c1
   ->  Subquery Scan on "ANY_subquery"
 Filter: (c1.relname = "ANY_subquery".amname)
 ->  Group
   Group Key: a.amname
   ->  Sort
 Sort Key: a.amname
 ->  Seq Scan on pg_am a
   Filter: (oid = c1.oid)

And with this patch:
 Hash Join
   Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
   ->  Seq Scan on pg_class c1
   ->  Hash
 ->  HashAggregate
   Group Key: a.amname
   ->  Seq Scan on pg_am a

Also, we attempted to fix links from a non-parent query block.
So, in my opinion, the reason for this patch still exists, and we can 
continue this work further, maybe elaborating on flattening LATERAL 
references - this needs some research.


[1] 
https://www.postgresql.org/message-id/35c8a3e8-d080-dfa8-2be3-cf5fe702010a%40postgrespro.ru


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Optimize planner memory consumption for huge arrays

2024-02-19 Thread Andrei Lepikhov

On 20/2/2024 04:51, Tom Lane wrote:

Tomas Vondra  writes:

On 2/19/24 16:45, Tom Lane wrote:

Tomas Vondra  writes:

For example, I don't think we expect selectivity functions to allocate
long-lived objects, right? So maybe we could run them in a dedicated
memory context, and reset it aggressively (after each call).

Here's a quick and probably-incomplete implementation of that idea.
I've not tried to study its effects on memory consumption, just made
sure it passes check-world.
Thanks for the sketch. The trick with planner_tmp_cxt_depth looks 
especially interesting.
I think we should design small memory contexts - one per scalable 
direction of memory utilization, like selectivity or partitioning 
(appending?).
My coding experience shows that the short-lived GEQO memory context 
forces people to learn Postgres internals more precisely :).


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 20/2/2024 11:03, jian he wrote:

Neither the code comments nor the commit message really explain the
design idea here. That's unfortunate, principally because it makes
review difficult.

I'm very skeptical about the idea of using JumbleExpr for any part of
this. It seems fairly expensive, and it might produce false matches.
If expensive is OK, then why not just use equal()? If it's not, then
this probably isn't really OK either. But in any case there should be
comments explaining why this strategy was chosen.


The above message
(https://postgr.es/m/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com)
seems still not answered.
How can we evaluate whether JumbleExpr is expensive or not?
I used this naive script to test, but didn't find a big difference
when enable_or_transformation is ON or OFF.
First, I am open to discussion here. But IMO, the equal() operation is 
quite expensive by itself. We should use the hash-table approach to 
avoid quadratic behaviour when looking for similar clauses in the 'OR' list.
Moreover, we use equal() in many places: selectivity estimation, 
proving that an index fits, predtest, etc. So, by reducing the clause 
list, we eliminate many calls of the equal() routine, too.
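To make the argument concrete, here is a minimal standalone sketch, not the patch's code: the string comparison stands in for equal(), the FNV-1a hash for expression jumbling, and the fixed-size table for the hash table. Deduplicating n clauses by pairwise equal() costs O(n^2) comparisons, whereas hashing each clause once gives near-linear behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABSIZE 64				/* toy fixed-size open-addressing table */

/* FNV-1a hash over the clause text, standing in for expression jumbling. */
static unsigned
clause_hash(const char *s)
{
	unsigned	h = 2166136261u;

	for (; *s; s++)
		h = (h ^ (unsigned char) *s) * 16777619u;
	return h;
}

/*
 * Count distinct "clauses" using a hash set: one hash per element plus a
 * short probe chain, instead of comparing every pair with equal().
 */
static int
count_distinct(const char **clauses, int n)
{
	const char *slots[TABSIZE] = {0};
	int			distinct = 0;

	for (int i = 0; i < n; i++)
	{
		unsigned	pos = clause_hash(clauses[i]) % TABSIZE;
		bool		found = false;

		while (slots[pos] != NULL)
		{
			if (strcmp(slots[pos], clauses[i]) == 0)	/* the equal() analogue */
			{
				found = true;
				break;
			}
			pos = (pos + 1) % TABSIZE;
		}
		if (!found)
		{
			slots[pos] = clauses[i];
			distinct++;
		}
	}
	return distinct;
}
```

The full comparison function still runs, but only on hash collisions, which is the point being made above about reducing the number of equal() calls.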



`leftop operator rightop`
the operator can also be volatile.
Do we need to check (op_volatile(opno) == PROVOLATILE_VOLATILE) within
transformBoolExprOr?

As usual, could you provide a test case to discuss it more objectively?

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Optimize planner memory consumption for huge arrays

2024-02-19 Thread Andrei Lepikhov

On 19/2/2024 20:47, Tomas Vondra wrote:

On 9/8/23 07:11, Lepikhov Andrei wrote:

Just for comparison, without partitioning:
elems   1   1E1 1E2 1E3 1E4 
master: 12kB14kB37kB266kB   2.5MB
patched:12kB11.5kB  13kB24kB141kB



These improvements look pretty nice, considering how simple the patch
seems to be. I can't even imagine how much memory we'd need with even
more partitions (say, 1000) if 100 partitions means 274MB.

BTW when releasing memory in scalararraysel, wouldn't it be good to also
free the elem_values/elem_nulls? I haven't tried and maybe it's not that
significant amount.

Agree. Added to the next version of the patch.
Moreover, I see a slight planning speedup. Looking into the reason, I 
discovered that it happens because the planner sometimes reuses the same 
memory piece for the next array element and finds this piece more 
quickly than before this optimization.
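A standalone sketch of the pattern in the patch (not PostgreSQL code; the counters below stand in for a memory context's accounting): freeing each per-element allocation inside the loop keeps peak memory proportional to a single element rather than to the whole array.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy accounting of live bytes, standing in for a memory context's total. */
static size_t live = 0, peak = 0;

static void *
acct_alloc(size_t n)
{
	size_t	   *p = malloc(sizeof(size_t) + n);

	*p = n;
	live += n;
	if (live > peak)
		peak = live;
	return p + 1;
}

static void
acct_free(void *ptr)
{
	size_t	   *p = (size_t *) ptr - 1;

	live -= *p;
	free(p);
}

/*
 * Process "array elements": with free_each set (the patched behaviour),
 * each temporary is released before the next iteration, so peak memory
 * stays at one element's worth instead of n elements' worth.
 */
static size_t
process_elements(int n, size_t elem_size, int free_each)
{
	void	  **kept = calloc(n, sizeof(void *));

	live = peak = 0;
	for (int i = 0; i < n; i++)
	{
		void	   *tmp = acct_alloc(elem_size);	/* per-element Const, etc. */

		if (free_each)
			acct_free(tmp);		/* pfree() right away, as in the patch */
		else
			kept[i] = tmp;		/* retained until after the loop */
	}
	for (int i = 0; i < n; i++)
		if (kept[i])
			acct_free(kept[i]);
	free(kept);
	return peak;
}
```

With 100 elements of 64 bytes, the per-iteration free keeps the peak at 64 bytes while the retain-until-the-end variant peaks at 6400, which mirrors the measurements quoted earlier in the thread.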


--
regards,
Andrei Lepikhov
Postgres Professional
From 2e89dc8b743953068174c777d7a014e1ea71f659 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 20 Feb 2024 11:05:51 +0700
Subject: [PATCH] Utilize memory in scalararraysel in more optimal manner.

When estimating selectivity of an array operation, free the memory right
after working out each single element. This reduces peak memory consumption
on massive arrays.
---
 src/backend/utils/adt/selfuncs.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..e5bad75ec1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -281,6 +281,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
selec = var_eq_non_const(&vardata, operator, collation, other,
 varonleft, negate);
 
+   pfree(other);
ReleaseVariableStats(vardata);
 
return selec;
@@ -1961,15 +1962,15 @@ scalararraysel(PlannerInfo *root,
{
List   *args;
Selectivity s2;
-
-   args = list_make2(leftop,
- makeConst(nominal_element_type,
-   -1,
-   nominal_element_collation,
-   elmlen,
-   elem_values[i],
-   elem_nulls[i],
-   elmbyval));
+   Const  *c = makeConst(nominal_element_type,
+ -1,
+ nominal_element_collation,
+ elmlen,
+ elem_values[i],
+ elem_nulls[i],
+ elmbyval);
+
+   args = list_make2(leftop, c);
if (is_join_clause)
s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
  clause->inputcollid,
@@ -1985,7 +1986,8 @@ scalararraysel(PlannerInfo *root,

  ObjectIdGetDatum(operator),
  PointerGetDatum(args),
  Int32GetDatum(varRelid)));
-
+   list_free(args);
+   pfree(c);
if (useOr)
{
s1 = s1 + s2 - s1 * s2;
@@ -2004,6 +2006,9 @@ scalararraysel(PlannerInfo *root,
if ((useOr ? isEquality : isInequality) &&
s1disjoint >= 0.0 && s1disjoint <= 1.0)
s1 = s1disjoint;
+
+   pfree(elem_values);
+   pfree(elem_nulls);
}
else if (rightop && IsA(rightop, ArrayExpr) &&
 !((ArrayExpr *) rightop)->multidims)
@@ -2052,6 +2057,7 @@ scalararraysel(PlannerInfo *root,
  

Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 19/2/2024 19:53, Ranier Vilela wrote:

v17-0002
1) move the vars *arrayconst and *dest, to after if, to avoid makeNode 
(palloc).

+ Const   *arrayconst;
+ ScalarArrayOpExpr  *dest;
+
+ pd = (PredicatesData *) lfirst(lc);
+ if (pd->elems == NIL)
+ /* The index doesn't participate in this operation */
+ continue;

+ arrayconst = lsecond_node(Const, saop->args);
+ dest = makeNode(ScalarArrayOpExpr);

Thanks for the review!
I'm not sure I understand you clearly. Does the attached patch fix the 
issue you raised?


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 56b04541db..1545876e2f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1284,7 +1284,7 @@ build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
ArrayType  *arrayval = NULL;
ArrayExpr  *arr = NULL;
Const  *arrayconst = lsecond_node(Const, saop->args);
-   ScalarArrayOpExpr  *dest = makeNode(ScalarArrayOpExpr);
+   ScalarArrayOpExpr   dest;
 
pd = (PredicatesData *) lfirst(lc);
if (pd->elems == NIL)
@@ -1302,10 +1302,10 @@ build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
arr->multidims = false;
 
/* Compose new SAOP, partially covering the source one */
-   memcpy(dest, saop, sizeof(ScalarArrayOpExpr));
-   dest->args = list_make2(linitial(saop->args), arr);
+   memcpy(&dest, saop, sizeof(ScalarArrayOpExpr));
+   dest.args = list_make2(linitial(saop->args), arr);
 
-   clause = (Expr *) estimate_expression_value(root, (Node *) dest);
+   clause = (Expr *) estimate_expression_value(root, (Node *) &dest);
 
/*
 * Create new RestrictInfo. It may be heavier than just copying the node,


Re: Memory consumed by paths during partitionwise join planning

2024-02-19 Thread Andrei Lepikhov

On 19/2/2024 19:25, Ashutosh Bapat wrote:

On Fri, Feb 16, 2024 at 8:42 AM Andrei Lepikhov
 wrote:

Live example: right now, I am working on the code like MSSQL has - a
combination of NestLoop and HashJoin paths and switching between them in
real-time. It requires both paths in the path list at the moment when
extensions are coming. Even if one of them isn't referenced from the
upper pathlist, it may still be helpful for the extension.


There is no guarantee that every path presented to add_path will be
preserved. Suboptimal paths are freed as and when add_path discovers
that they are suboptimal. So I don't think an extension can rely on
existence of a path. But having a refcount makes it easy to preserve
the required paths by referencing them.
I don't insist; I'm just describing my use case. It would be ideal if 
you provided some external routines for extensions that allow keeping a 
path in the pathlist even when its cost estimate is terrible.





About partitioning: as I discovered while investigating planning issues
connected to partitions, the painful problem is the rule by which we
try the whole nomenclature of possible paths for each partition. With
indexes, this quickly inflates the optimization work. IMO, a
'symmetrical' approach could help here, restricting the scope of
possible paths for upcoming partitions once we have filtered some paths
in a set of previously planned partitions.


filter or free?

Filter.
I meant that Postgres tries to apply IndexScan, BitmapScan,
IndexOnlyScan, and other strategies, passing through all the partition
indexes. The optimizer spends a lot of time doing that. So why not
introduce a symmetrical strategy and exclude some scan types from the
search based on the path-building experience with previous partitions
of the same table: if you have dozens of partitions, is it beneficial
for the system to find a slightly more optimal IndexScan on one
partition while having SeqScans on the 999 others?


IIUC, you are suggesting that instead of planning each
partition/partitionwise join, we only create paths with the strategies
which were found to be optimal with previous partitions. That's a good
heuristic but it won't work if partition properties - statistics,
indexes etc. differ between groups of partitions.
Sure, but the 'symmetry' strategy assumes that at the scale of a 
thousand partitions, especially with Parallel Append involved, finding 
a slightly suboptimal path for a small subset of partitions doesn't 
cause noticeable performance degradation. Does that make sense?
From what I've seen, when people use 10-100 partitions for a table, they 
usually strive to keep the indexes symmetrical across all partitions.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 16/2/2024 19:54, jian he wrote:

After setting these parameters, enable_or_transformation ON performs
better overall.
Sorry for the noise.

Don't worry, at least we know a weak point of partial paths estimation.

So now I haven't found any corner case where enable_or_transformation
ON performs worse than when it's OFF.

+typedef struct OrClauseGroupEntry
+{
+ OrClauseGroupKey key;
+
+ Node   *node;
+ List   *consts;
+ Oid scalar_type;
+ List   *exprs;
+} OrClauseGroupEntry;

I found that the field `scalar_type` was never used.

Thanks, fixed.
Attached is v17 of both patches. As I see it, the only thing not yet 
addressed is a general explanation of the idea. I'm not sure how deeply 
we should explain it.


--
regards,
Andrei Lepikhov
Postgres Professional
From 3a3b6aa36320a06b64f2f608e3526255e53ed655 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) at the
preliminary stage of optimization, while we are still working with the
expression tree.
Here C is a constant expression, 'expr' is a non-constant expression, and 'op'
is an operator which returns a boolean result and has a commutator (for the
case of reversed order of the constant and non-constant parts of the
expression, like 'CX op expr').
Sometimes this can lead to a suboptimal plan, but we think it is better to
have an array of elements instead of a long list of OR clauses. There is room
for further optimization in decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 326 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 ++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 21 files changed, 875 insertions(+), 70 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 WHERE
  Foreign Scan
    Output: t1.c1, t1.c2, ft5.c1, ft5.c2
    Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10)))) WHERE (((r4.c1 < 10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10)))) WHERE (((r4.c1 IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = t2
  Foreign Scan
    Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
    Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 < 20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5)))) GROUP BY 2 ORDER BY array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE ((((r1.c1 IS NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
 (4 rows)

Re: Removing unneeded self joins

2024-02-18 Thread Andrei Lepikhov

On 18/2/2024 23:18, Alexander Korotkov wrote:

On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov  wrote:

On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin  wrote:

09.01.2024 01:09, Alexander Korotkov wrote:

Fixed in 30b4955a46.


Please look at the following query which fails with an error since
d3d55ce57:

create table t (i int primary key);

select t3.i from t t1
  join t t2 on t1.i = t2.i,
  lateral (select t1.i limit 1) t3;

ERROR:  non-LATERAL parameter required by subquery


Thank you for spotting.  I'm looking at this.


Attached is a draft patch fixing this query.  Could you, please, recheck?
I reviewed this patch. Why do you check only the target list? I guess 
these links can be anywhere. See the patch in the attachment with an 
elaborated test and slightly changed code.


--
regards,
Andrei Lepikhov
Postgres Professional
From 7f94a3c96fd410522b87e570240cdb96b300dd31 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 19 Feb 2024 12:17:55 +0700
Subject: [PATCH] Replace relids in lateral subquery target list during SJE

---
 src/backend/optimizer/plan/analyzejoins.c | 29 ++-
 src/test/regress/expected/join.out| 44 +++
 src/test/regress/sql/join.sql | 12 +++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index e494acd51a..072298f66c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -395,7 +395,34 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/* Update lateral references. */
-	replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+	if (root->hasLateralRTEs)
+	{
+		RangeTblEntry *rte = root->simple_rte_array[rti];
+		ReplaceVarnoContext ctx = {.from = relid, .to = subst};
+
+		if (rte->lateral)
+		{
+			replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+
+			/*
+			 * Although we pass root->parse through the cleanup procedure,
+			 * parse->rtable and rte contain refs to different copies of
+			 * the subquery.
+			 */
+			if (otherrel->rtekind == RTE_SUBQUERY)
+				query_tree_walker(rte->subquery, replace_varno_walker, &ctx,
+								  QTW_EXAMINE_SORTGROUP);
+#ifdef USE_ASSERT_CHECKING
+			/* Just check possibly hidden non-replaced relids */
+			Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablesample)));
+			Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->functions)));
+			Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablefunc)));
+			Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->values_lists)));
+#endif
+		}
+	}
 }
 
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c2cba8921..d560a4a6b9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6349,6 +6349,50 @@ on true;
->  Seq Scan on int8_tbl y
 (7 rows)
 
+-- Test processing target lists in lateral subqueries
+explain (verbose, costs off)
+SELECT t3.a FROM sj t1, sj t2,
+LATERAL (SELECT t1.a WHERE t1.a <> 1
+GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
+LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
+GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
+LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
+REPEATABLE (t1.a+t2.a)) t5,
+LATERAL generate_series(1, t1.a + t2.a) AS t6
+WHERE t1.a = t2.a;
+  QUERY PLAN   

+---
+ Nested Loop
+   Output: (t2.a)
+   ->  Nested Loop
+ Output: t2.a, (t2.a)
+ ->  Nested Loop
+   Output: t2.a, (t2.a)
+   ->  Nested Loop
+ Output: t2.a, (t2.a)
+ ->  Seq Scan on public.sj t2
+   Output: t2.a, t2.b, t2.c
+   Filter: (t2.a IS NOT NULL)
+ ->  Limit
+   Output: (t2.

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-02-18 Thread Andrei Lepikhov

On 19/2/2024 06:05, Tomas Vondra wrote:

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?
The planner's principle of cleaning up all allocated structures only 
after the optimization stage simplifies development and the code. But if 
we want to achieve horizontal scalability over many partitions, we 
should introduce a per-partition memory context and reset it in between. 
GEQO already has such a short-lived memory context; it makes designing 
extensions a bit more challenging, but nothing too painful.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-15 Thread Andrei Lepikhov

On 16/2/2024 07:00, jian he wrote:

On Wed, Feb 14, 2024 at 11:21 AM Andrei Lepikhov
 wrote:
My OS: Ubuntu 22.04.3 LTS
I already set the max_parallel_workers_per_gather to 10.
So for all cases, it should use parallelism first?

a better question would be:
how to make the number of OR less than 29 still faster when
enable_or_transformation is ON by only set parameters?
In my test environment, this example gives ORs only a subtle advantage 
over ANY, and only with 3 ORs or fewer.
Please provide the following EXPLAIN ANALYZE results for the case you 
want to discuss here:

1. with enable_or_transformation enabled
2. with enable_or_transformation disabled
3. with enable_or_transformation disabled, but with the OR -> ANY 
transformation done manually, to check the overhead of this optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Memory consumed by paths during partitionwise join planning

2024-02-15 Thread Andrei Lepikhov

On 15/2/2024 19:06, Ashutosh Bapat wrote:

On Thu, Feb 15, 2024 at 9:41 AM Andrei Lepikhov

But I'm not sure about freeing unreferenced paths. I would have to see
alternatives in the pathlist.


I didn't understand this. Can you please elaborate? A path in any
pathlist is referenced. An unreferenced path should not be in any
pathlist.
I mean that at some point an extension can reconsider the path tree 
after building the top node of this path. I vaguely recall that we 
already have (or had) this kind of optimization in the core, where part 
of the plan changes after it has been built.
Live example: right now I am working on code like MSSQL has - a 
combination of NestLoop and HashJoin paths with switching between them 
at run time. It requires both paths to be in the pathlist at the moment 
the extensions come in. Even if one of them isn't referenced from the 
upper pathlist, it may still be helpful for the extension.



About partitioning: as I discovered while investigating planning issues
connected to partitions, the painful problem is the rule by which we
try the whole nomenclature of possible paths for each partition. With
indexes, this quickly inflates the optimization work. IMO, a
'symmetrical' approach could help here, restricting the scope of
possible paths for upcoming partitions once we have filtered some paths
in a set of previously planned partitions.


filter or free?

Filter.
I meant that Postgres tries to apply IndexScan, BitmapScan, 
IndexOnlyScan, and other strategies, passing through all the partition 
indexes. The optimizer spends a lot of time doing that. So why not 
introduce a symmetrical strategy and exclude some scan types from the 
search based on the path-building experience with previous partitions 
of the same table: if you have dozens of partitions, is it beneficial 
for the system to find a slightly more optimal IndexScan on one 
partition while having SeqScans on the 999 others?
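To make the idea concrete, here is an entirely hypothetical sketch of that heuristic (the function names, the warm-up threshold, and the toy cost model are all invented for illustration, not taken from any patch):

```python
def plan_partitions(partitions, all_strategies, plan_one, warmup=3):
    """plan_one(part, strategies) -> (path, winning_strategy).
    Search the full strategy space for the first few partitions only;
    afterwards, try just the strategies that have already won somewhere,
    plus SeqScan as the safe fallback."""
    winners, paths = set(), []
    for i, part in enumerate(partitions):
        strategies = all_strategies if i < warmup else winners | {"SeqScan"}
        path, strategy = plan_one(part, strategies)
        winners.add(strategy)
        paths.append(path)
    return paths

# Toy cost model: an IndexScan always wins when it is allowed to compete.
def plan_one(part, strategies):
    strategy = "IndexScan" if "IndexScan" in strategies else "SeqScan"
    return (part, strategy), strategy

paths = plan_partitions(range(1000),
                        {"SeqScan", "IndexScan", "BitmapScan"}, plan_one)
```

With symmetrical partitions the restricted search keeps finding the same winners while never re-examining strategies that kept losing; the trade-off is exactly the one discussed above when partition properties differ between groups.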


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2024-02-15 Thread Andrei Lepikhov

On 15/2/2024 18:10, Tomas Vondra wrote:



On 2/15/24 07:50, Andrei Lepikhov wrote:

On 18/12/2023 19:53, Tomas Vondra wrote:

On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple very rought ideas of what might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

  ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

  1 * ndistinct(b,c)

Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?


Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Thanks, I got it. I'm just thinking about how to implement such 
techniques in an extension, to test the idea in action. In the case of 
GROUP BY we can use the path hook, of course. But what about inventing a 
hook on clause-list estimation?
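As a toy numeric illustration of the ratio discussed above (the statistics values are invented for the example, not taken from any real table):

```python
# Invented multi-column ndistinct statistics, shaped like what
# CREATE STATISTICS ... (ndistinct) collects.
ndistinct = {
    frozenset({"b"}): 10,
    frozenset({"b", "c"}): 150,
    frozenset({"b", "c", "d"}): 1200,
}

def fanout(base_cols, extra_cols):
    """Estimated multiplier on the number of distinct groups when
    extra_cols are added to a grouping over base_cols."""
    base = frozenset(base_cols)
    return ndistinct[base | frozenset(extra_cols)] / ndistinct[base]

# "WHERE b = 3" pins b to a single distinct value, so grouping by
# (b, c) is estimated as 1 distinct b-value times the per-b fan-out:
groups_bc = 1 * fanout({"b"}, {"c"})                # 150 / 10
# Adding d scales the estimate by ndistinct(b,c,d) / ndistinct(b,c):
groups_bcd = groups_bc * fanout({"b", "c"}, {"d"})  # * 1200 / 150
```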

Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?



I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if 
group-by references a subquery targetlist, into subqueries too.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2024-02-14 Thread Andrei Lepikhov

On 15/12/2023 15:58, Richard Guo wrote:

With the patch the estimate for the number of distinct 'b' values is
more accurate.

+1 for committing this patch.
It looks good and resolves what is arguably a bug in the code.


BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?
The test that was changed refers to different features. Its behaviour 
can change in the future and mask the testing of this code. IMO, you 
should add a test that directly checks the appendrel->tuples correction.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: planner chooses incremental but not the best one

2024-02-14 Thread Andrei Lepikhov

On 18/12/2023 19:53, Tomas Vondra wrote:

On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple very rought ideas of what might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from

 ndistinct(b,c,d) / ndistinct(b,c)

If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as

 1 * ndistinct(b,c)

Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?

Do you implicitly bear in mind here the necessity of tracking clauses 
that were applied to the data up to the moment of grouping?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Memory consumed by paths during partitionwise join planning

2024-02-14 Thread Andrei Lepikhov

On 6/2/2024 19:51, Ashutosh Bapat wrote:

On Fri, Dec 15, 2023 at 5:22 AM Ashutosh Bapat
The patches are raw. make check has some crashes that I need to fix. I
am waiting to hear whether this is useful and whether the design is on
the right track.

Let me share a few words of opinion on this feature.
I generally like the idea of a refcount field. We had problems 
generating alternative paths in extensions when designing the Asymmetric 
Join feature [1]: we proposed an alternative path one level below and 
called the add_path() routine, and we lost the path in the pathlist 
referenced from the upper RelOptInfo. This approach allows for more 
convenient solutions instead of hacking around with copying/restoring 
the pathlist.
But I'm not sure about freeing unreferenced paths. I would still want to 
be able to see the alternatives in the pathlist.
About partitioning: as I discovered while investigating planning issues 
connected to partitions, the painful problem is the rule by which we 
try the whole nomenclature of possible paths for each partition. With 
indexes, this quickly inflates the optimization work. IMO, a 
'symmetrical' approach could help here, restricting the scope of 
possible paths for upcoming partitions once we have filtered some paths 
in a set of previously planned partitions.
Also, I am glad to see a positive opinion about the path_walker() 
routine. Somewhere else, for example, in [2], it seems we need it too.


[1] 
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAMbWs496%2BN%3DUAjOc%3DrcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw%40mail.gmail.com

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-02-13 Thread Andrei Lepikhov

On 14/2/2024 13:32, Ashutosh Bapat wrote:

On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov
 wrote:


On 30/1/2024 12:44, Ashutosh Bapat wrote:

Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
addressing Amit's comments is still a separate patch for him to
review.

Thanks for this improvement. Working with partitions, I frequently see
peaks of memory consumption during planning. So, maybe one more case can
be resolved here.
Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do
we really need it as a separate routine? It might be better to inline
this code.


try_partitionwise_join() is already 200 lines long. A separate
function is better than adding more lines to try_partitionwise_join().
Also, someone who wants to call build_child_join_sjinfo() outside
try_partitionwise_join() may find free_child_sjinfo_members() handy.

Makes sense, thanks.

Patch 0002 adds valuable comments, and I'm OK with that.

Also, as I remember, some extensions, such as pg_hint_plan, call
build_child_join_sjinfo. It is OK to break the interface with a major
version. But what if they need child_sjinfo a bit longer and collect
links to this structure? I don't think it is a real stopper, but it is
worth additional analysis.


If these extensions call build_child_join_sjinfo() and do not call
free_child_sjinfo_members, they can keep their child sjinfo as long as
they want. I didn't understand your concern.
Nothing special. This patch makes external code responsible for 
allocating the structure, which adds more ways to do something wrong. 
The current code looks good.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Propagate pathkeys from CTEs up to the outer query

2024-02-13 Thread Andrei Lepikhov

On 29/1/2024 10:18, Richard Guo wrote:

In [1] we've reached a conclusion that for a MATERIALIZED CTE it's okay
to 'allow our statistics or guesses for the sub-query to subsequently
influence what the upper planner does'.  Commit f7816aec23 exposes
column statistics to the upper planner.  In the light of that, here is a
patch that exposes info about the ordering of the CTE result to the
upper planner.

This patch was initially posted in that same thread and has received
some comments from Tom in [2].  Due to the presence of multiple patches
in that thread, it has led to confusion.  So fork a new thread here
specifically dedicated to discussing the patch about exposing pathkeys
from CTEs to the upper planner.
I like this approach. It looks good initially, but such features need 
more opinions/views/time to analyse corner cases.
It goes alongside my current back-burner project: pulling 
parameterisation through the GROUP BY and the query-block fence up to 
the join-search code of the parent query.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-02-13 Thread Andrei Lepikhov

On 30/1/2024 12:44, Ashutosh Bapat wrote:

Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
addressing Amit's comments is still a separate patch for him to
review.
Thanks for this improvement. Working with partitions, I frequently see 
peaks of memory consumption during planning. So, maybe one more case can 
be resolved here.
Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do 
we really need it as a separate routine? It might be better to inline 
this code.

Patch 0002 adds valuable comments, and I'm OK with that.

Also, as I remember, some extensions, such as pg_hint_plan, call 
build_child_join_sjinfo. It is OK to break the interface with a major 
version. But what if they need child_sjinfo a bit longer and collect 
links to this structure? I don't think it is a real stopper, but it is 
worth additional analysis.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 17:03, Andrei Lepikhov wrote:

On 13/2/2024 07:00, jian he wrote:

The time is the last result of the 10 iterations.
I'm not sure about the origins of such behavior, but it seems to be an 
issue of parallel workers, not this specific optimization.
Having written that, it stayed on my back burner. To close that issue, I 
explored get_restriction_qual_cost(). A close look shows that "x IN 
(..)" is cheaper than its equivalent "x = N1 OR ...". Just the numbers:

ANY: startup_cost = 0.0225; total_cost = 0.005
OR:  startup_cost = 0;      total_cost = 0.0225

The expression's total_cost is charged per tuple. In your example we 
have many tuples, so the expression's low per-tuple cost dominates the 
significant startup cost.

According to the above, the SAOP adds 6250 to the cost of the SeqScan, 
while the OR chain adds 13541. So the total cost of the query with the 
SAOP is lower than with ORs, and the optimizer doesn't resort to 
heavyweight parallel workers. That is the answer.


So, this example is more about the subtle balance between 
parallel/sequential execution, which can vary from one platform to another.
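A back-of-the-envelope sketch of that balance, using the startup and per-tuple figures quoted above (the one-million row count is an assumption chosen for illustration, not a measurement):

```python
# Planner-style qual cost: a one-time startup charge plus a per-tuple
# charge multiplied by the number of tuples the filter is applied to.
def qual_cost(startup, per_tuple, ntuples):
    return startup + per_tuple * ntuples

rows = 1_000_000
saop_cost = qual_cost(0.0225, 0.005, rows)   # x = ANY('{...}')
or_cost = qual_cost(0.0, 0.0225, rows)       # x = N1 OR x = N2 OR ...

# On a large scan the per-tuple term dominates, so the SAOP form is
# markedly cheaper despite its higher startup cost.
```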


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 12/2/2024 17:51, Alena Rybakina wrote:

On 12.02.2024 12:01, Andrei Lepikhov wrote:

On 12/2/2024 15:55, Alena Rybakina wrote:
As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictinfo procedure; otherwise calling it, I think, is useless.
I think you must explain your note in more detail. Before this call, 
we already called make_restrictinfo() and built rinfo1, haven't we?


I got it, I think, I was confused by the “else“ block when we can 
process the index, otherwise we move on to the next element.


I think maybe “else“ block of creating restrictinfo with the indexpaths 
list creation should be moved to a separate function or just remove "else"?
IMO, it is a matter of taste. But if it really confused you, the change 
may make understanding simpler for someone else too. So, changed.
I think we need to check that rinfo->clause is not empty, because if it 
is we can miss calling build_paths_for_OR function. We should add it there:


restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
     if (IsA(restrictinfo->clause, ScalarArrayOpExpr))
I wonder if we should add an assertion here rather than a NULL check. In 
what case could we get a NULL clause here? But I added it to be safe.


By the way, I think we need to add a check that the clauseset is not 
empty (if (!clauseset.nonempty)) otherwise we could get an error. The 
same check I noticed in build_paths_for_OR function.

I don't think so. Feel free to provide a counterexample.

--
regards,
Andrei Lepikhov
Postgres Professional
From 2b088a1bd662c614ee491a797d2fcb67e2fb2391 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 24 Jan 2024 14:07:17 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 src/backend/optimizer/path/indxpath.c | 301 ++
 src/backend/optimizer/util/predtest.c |  46 
 src/backend/optimizer/util/restrictinfo.c |  13 +
 src/include/optimizer/optimizer.h |  16 ++
 src/include/optimizer/restrictinfo.h  |   1 +
 src/test/regress/expected/select.out  | 282 
 src/test/regress/sql/select.sql   |  82 ++
 src/tools/pgindent/typedefs.list  |   1 +
 8 files changed, 689 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 32c6a8bbdc..56b04541db 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
@@ -1220,11 +1221,174 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
return result;
 }
 
+/*
+ * Building index paths over SAOP clause differs from the logic of OR clauses.
+ * Here we iterate across all the array elements and split them to SAOPs,
+ * corresponding to different indexes. We must match each element to an index.
+ */
+static List *
+build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
+					 List *other_clauses)
+{
+	List	   *result = NIL;
+	List	   *predicate_lists = NIL;
+	ListCell   *lc;
+	PredicatesData *pd;
+	ScalarArrayOpExpr  *saop = (ScalarArrayOpExpr *) rinfo->clause;
+
+	Assert(IsA(saop, ScalarArrayOpExpr) && saop->useOr);
+
+	if (!IsA(lsecond(saop->args), Const))
+		/*
+		 * Is there a practical benefit to merging arrays which couldn't be
+		 * constantified before this step?
+		 */
+		return NIL;
+
+	/* Collect predicates */
+	foreach(lc, rel->indexlist)
+	{
+		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+		/* Take into consideration partial indexes supporting bitmap scans */
+		if (!index->amhasgetbitmap || index->indpred == NIL || index->predOK)
+			continue;
+
+		pd = palloc0(sizeof(PredicatesData));
+		pd->id = foreach_current_index(lc);
+		/* The trick with reducing recursion is stolen from predicate_implied_by */
+		pd->predicate = list_length(index->indpred) == 1 ?
+										(Node *) linitial(index->indpred) :
+										(Node *) index->indpred;
+		predicate_lists = lappend(predicate_lis

Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 07:00, jian he wrote:

+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comments `array_collid will be set by
parse_collate.c`, can you further explain it?
I wonder whether the second paragraph of the comment on commit b310b6e 
is enough to dive into the details.



if OR expression right arm is not plain Const, but with collation
specification, eg.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not Const, it will be CollateExpr, it will not be
used in transformation.
Yes, it is done for simplicity right now. I'm not sure about corner 
cases of merging such expressions.
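To make the merging rule concrete, here is a minimal, hypothetical sketch of the grouping step (the tuple shape and names are invented; the real patch works on parse-tree nodes). Arms are merged only when the non-constant side and the operator match and the right arm is a plain Const, which is why the COLLATE arms above are left untouched:

```python
from collections import defaultdict

def transform_or(arms):
    """arms: one (expr, op, const, collation) tuple per OR arm.
    Returns (merged ANY groups, arms kept as-is)."""
    groups, kept = defaultdict(list), []
    for expr, op, const, collation in arms:
        if collation is None:       # plain Const only, as in the patch
            groups[(expr, op)].append(const)
        else:                       # e.g. 'b' COLLATE "C": not merged
            kept.append((expr, op, const, collation))
    merged = [(expr, op, consts) for (expr, op), consts in groups.items()]
    return merged, kept

merged, kept = transform_or([
    ("x", "=", 1, None),
    ("x", "=", 2, None),
    ("a", "=", "b", "C"),
])
```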




set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time is the last result of the 10 iterations.

The reason here is parallel workers.
If you look into the plans, you will find parallel workers without the 
optimization and none of them with the optimization:

Gather  (cost=1000.00..28685.37 rows=87037 width=12)
(actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) 
OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))


Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
  (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

With 90363 tuples actually returned, we estimate 87037 (less precisely) 
without the transformation and 90600 (more precisely) with it.
But if you play with parallel_tuple_cost and parallel_setup_cost, you 
will end up getting these parallel workers back:


 Gather  (cost=0.12..11691.03 rows=90600 width=12)
 (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
 Rows Removed by Filter: 303212

And we get a gain of about 25% on my laptop.
I'm not sure about the origins of such behavior, but it seems to be an 
issue with parallel workers, not with this specific optimization.
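A rough sketch of why the cheaper qual flips the parallel decision, using PostgreSQL's default parallel_setup_cost/parallel_tuple_cost values and the qual-cost difference quoted earlier in the thread (back-of-the-envelope only, not the planner's exact arithmetic):

```python
# Gather's extra charges: a fixed setup cost plus a per-returned-row
# transfer cost (PostgreSQL defaults below).
parallel_setup_cost = 1000.0
parallel_tuple_cost = 0.1
rows_returned = 90_363

gather_overhead = parallel_setup_cost + parallel_tuple_cost * rows_returned
qual_saving = 13541 - 6250    # OR-chain qual cost minus SAOP qual cost

# Gathering ~90k rows costs more than the SAOP rewrite saved on qual
# evaluation, so the serial plan wins after the transformation;
# lowering the two GUCs brings the workers back, as shown above.
```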


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Andrei Lepikhov

On 12/2/2024 15:55, Alena Rybakina wrote:

Hi!

I can't understand this part of the code:

/* Time to generate index paths */

MemSet(&clauseset, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, &clauseset);

As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictinfo procedure; otherwise calling it, I think, is useless.
I think you should explain your note in more detail. Before this call, we 
have already called make_restrictinfo() and built rinfo1, haven't we?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-11 Thread Andrei Lepikhov

Thanks for the review!
It was a first version for discussion. Of course, refactoring and 
polishing cycles will be needed; even so, we can discuss the general idea 
earlier.


On 10/2/2024 12:00, jian he wrote:

On Thu, Feb 8, 2024 at 1:34 PM Andrei Lepikhov
  1235 | PredicatesData *pd;

Thanks


+ if (!predicate_implied_by(index->indpred, list_make1(rinfo1), true))
+ elog(ERROR, "Logical mistake in OR <-> ANY transformation code");
the error message seems not clear?

Yeah, have rewritten


static List *
build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
List *other_clauses)
I am not sure what `other_clauses` and `rinfo` refer to; adding
some comments would be great.

struct PredicatesData needs some comments, I think.

Added, not so much though


+bool
+saop_covered_by_predicates(ScalarArrayOpExpr *saop, List *predicate_lists)
+{
+ ListCell   *lc;
+ PredIterInfoData clause_info;
+ bool result = false;
+ bool isConstArray;
+
+ Assert(IsA(saop, ScalarArrayOpExpr));
is this Assert necessary?

Not sure. Moved it into another routine.


For the function build_paths_for_SAOP, I think I understand the first
part of the code.
But I am not 100% sure of the second part of the `foreach(lc,
predicate_lists)` code.
more comments in `foreach(lc, predicate_lists)` would be helpful.

Done


do you need to add `PredicatesData` to src/tools/pgindent/typedefs.list?

Done


I also did some minor refactoring of generate_saop_pathlist.

Partially agree


instead of let it go to `foreach (lc, entries)`,
we can reject the Const array at `foreach(lc, expr->args)`
Yeah, I just think we can go further and transform two const arrays into 
a new one if we have the same clause and operator. In that case, we 
should allow it to pass through this cycle down to the classification stage.


also, in `foreach(lc, expr->args)`, do we need to reject cases like
`contain_subplans((Node *) nconst_expr)`?
Maybe requiring nconst_expr to be a Var node would be far easier.
It's a trade-off. On the one hand, we simplify the comparison 
procedure and can avoid expression jumbling at all. On the other hand, we 
restrict the feature. IMO, it would be better to unite clauses such as

complex_clause1 IN (..) OR complex_clause1 IN (..)
into
complex_clause1 IN (.., ..)

and avoid doing the duplicated work of computing complex clauses.
In the attachment - the next version of the second patch.
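A minimal sketch of such a merge, in illustrative Python rather than the patch's C; the textual expression keys are an assumption standing in for jumbled expression trees:

```python
# Unite OR'ed IN-lists that share the same left-hand expression and
# operator into a single list, so the complex clause is computed once.

def merge_in_lists(arms):
    """arms: iterable of (expr_key, op, consts) triples, one per OR arm."""
    merged = {}
    order = []                      # keep the first-seen arm order stable
    for expr_key, op, consts in arms:
        key = (expr_key, op)
        if key not in merged:
            merged[key] = []
            order.append(key)
        for c in consts:
            if c not in merged[key]:    # drop duplicated constants
                merged[key].append(c)
    return [(expr_key, op, merged[(expr_key, op)]) for expr_key, op in order]

# complex_clause1 IN (1, 2) OR complex_clause1 IN (2, 3)
#   -> [('complex_clause1', '=', [1, 2, 3])]
print(merge_in_lists([("complex_clause1", "=", [1, 2]),
                      ("complex_clause1", "=", [2, 3])]))
```

Arms with different expressions or operators stay separate, so the transformation never changes query semantics, only the representation.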

--
regards,
Andrei Lepikhov
Postgres Professional
From c1e5fc35778acd01cca67e63fdf80c5605af03f2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 24 Jan 2024 14:07:17 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 src/backend/optimizer/path/indxpath.c | 304 ++
 src/backend/optimizer/util/predtest.c |  46 
 src/backend/optimizer/util/restrictinfo.c |  13 +
 src/include/optimizer/optimizer.h |  16 ++
 src/include/optimizer/restrictinfo.h  |   1 +
 src/test/regress/expected/select.out  | 282 
 src/test/regress/sql/select.sql   |  82 ++
 src/tools/pgindent/typedefs.list  |   1 +
 8 files changed, 692 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 32c6a8bbdc..5383cb76dc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
@@ -1220,11 +1221,177 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
return result;
 }
 
+/*
+ * Building index paths over SAOP clause differs from the logic of OR clauses.
+ * Here we iterate across all the array elements and split them to SAOPs,
+ * corresponding to different indexes. We must match each element to an index.
+ */
+static List *
+build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
+List *other_clauses)
+{
+   List   *result = NIL;
+   List   *predicate_lists = NIL;
+   ListCell   *lc;
+   PredicatesData *pd;
+   ScalarArrayOpExpr  *saop = (ScalarArrayOpExpr *) rinfo->clause;
+
+   Assert(IsA(saop, ScalarArrayOpExpr) && saop->useOr);
+
+   if (!IsA(lsecond(saop->args), Const))
+   /*
+* Has it practical outcome to merge arrays which couldn't 
constantified
+* before that step?
+*/
+   return NIL;
+
+   /* Collect predicates */
+   foreach(lc, rel-&g

Re: pg_stat_advisor extension

2024-02-07 Thread Andrei Lepikhov

On 6/2/2024 22:27, Ilia Evdokimov wrote:


I welcome your insights, feedback, and evaluations regarding the 
necessity of integrating this new extension into PostgreSQL.
Besides the other issues that were raised immediately during the review 
of the extension, let me emphasize two:
1. In the case of parallel workers the plan_rows value has a different 
semantics than the number of rows predicted. Just explore 
get_parallel_divisor().
2. The extension recommends new statistics immediately upon finding an 
error. But what if the reason for the error is stale statistics? Or the 
error may arise for only one specific set of constants, while estimation 
is done well in the other 99% of cases for the same expression.


Regarding No. 2, it might make sense to collect and track the clause 
combinations and cardinality errors found, and let the DBA make the 
decision on their own.
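The idea could be sketched like this (a hypothetical accumulator, not part of the extension; thresholds are arbitrary illustrative defaults):

```python
# Track estimation-error ratios per clause combination and surface only
# combinations that misestimate consistently, leaving the decision
# about new statistics to the DBA.
from collections import defaultdict

class EstimationErrorTracker:
    def __init__(self, threshold=10.0, min_samples=2):
        self.samples = defaultdict(list)    # clause combo -> error ratios
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, clause_combo, estimated_rows, actual_rows):
        hi = max(estimated_rows, actual_rows)
        lo = max(1.0, min(estimated_rows, actual_rows))
        self.samples[frozenset(clause_combo)].append(hi / lo)

    def candidates(self):
        """Clause combos whose median error exceeds the threshold."""
        out = []
        for combo, ratios in self.samples.items():
            if len(ratios) >= self.min_samples:
                median = sorted(ratios)[len(ratios) // 2]
                if median >= self.threshold:
                    out.append((sorted(combo), median))
        return out

t = EstimationErrorTracker()
t.record(["a = ?", "b = ?"], estimated_rows=10, actual_rows=5000)
t.record(["a = ?", "b = ?"], estimated_rows=20, actual_rows=8000)
print(t.candidates())    # [(['a = ?', 'b = ?'], 500.0)]
```

Using the median over several samples filters out one-off misestimates caused by a single unlucky set of constants or by temporarily stale statistics.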


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-07 Thread Andrei Lepikhov

On 3/2/2024 02:06, Alena Rybakina wrote:

On 01.02.2024 08:00, jian he wrote:
I added your code to the patch.

Thanks Alena and Jian for the detailed scrutiny!

A couple of questions:
1. As I see, transformAExprIn uses the same logic as we invented but 
allows composite and domain types. Could you add a comment explaining 
why we forbid row types in general, in contrast to the transformAExprIn 
routine?
2. Could you provide the tests to check issues covered by the recent (in 
v.15) changes?


Patch 0001-* in the attachment incorporates changes induced by Jian's 
notes from [1].
Patch 0002-* contains a transformation of the SAOP clause, which allows 
the optimizer to utilize partial indexes if they cover all values in 
this array. Also, it is an answer to Alexander's note [2] on performance 
degradation. This first version may be a bit raw, but I need your 
opinion: Does it resolve the issue?


Skimming through the thread, I see that, in general, all issues have 
been covered for now. Only Robert's note on the lack of documentation 
still needs to be resolved.


[1] 
https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional
From 0ac511ab94959e87d1525d5ea8c4855643a6ccbe Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 327 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 ++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 21 files changed, 876 insertions(+), 70 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index b5a38aeb21..a07aefc9e5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5

Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov

On 2/2/2024 11:06, Richard Guo wrote:


On Fri, Feb 2, 2024 at 11:32 AM Richard Guo  wrote:


On Fri, Feb 2, 2024 at 10:02 AM Tom Lane  wrote:

One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.


Yeah, it's not a good practice to use pg_class in this place.  While
looking through the test cases added by this commit, I noticed some
other minor issues that are not great.  Such as

* The table 'btg' is inserted with 1 tuples, which seems a bit
expensive for a test.  I don't think we need such a big table to test
what we want.

* I don't see why we need to manipulate GUC max_parallel_workers and
max_parallel_workers_per_gather.

* I think we'd better write the tests with the keywords being all upper
or all lower.  A mixed use of upper and lower is not great. Such as in

     explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;

* Some comments for the test queries are not easy to read.

* For this statement

     CREATE INDEX idx_y_x_z ON btg(y,x,w);

I think the index name would cause confusion.  It creates an index on
columns y, x and w, but the name indicates an index on y, x and z.

I'd like to write a draft patch for the fixes.


Here is the draft patch that fixes the issues I complained about in
upthread.
I went through the patch. It looks like it doesn't break anything. Why do 
you prefer to use count(*) in EXPLAIN instead of a plain targetlist, like 
"SELECT x,y,..."?

Also, regarding the test mentioned by Tom:
1. I see that PG uses an IndexScan on (x,y), so column x will already be 
sorted before the MergeJoin. Why not use an Incremental Sort on (x,z,w) 
instead of a full sort?
2. For the record, IMO, this test shows a near-future prospect of this 
feature: it is cheaper to use the grouping order (w,z) instead of (z,w).
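The intuition behind point 1 can be shown with a simplified model of what group_keys_reorder_by_pathkeys() does (real pathkeys are equivalence-class objects, not strings):

```python
# Reorder GROUP BY keys so that the keys matching a prefix of the input
# path's sort order come first; only the tail then needs an incremental
# sort instead of a full sort of the whole input.

def reorder_group_keys(path_pathkeys, group_keys):
    prefix = []
    remaining = list(group_keys)
    for pk in path_pathkeys:
        if pk not in remaining:
            break               # the presorted prefix must be contiguous
        prefix.append(pk)
        remaining.remove(pk)
    return prefix + remaining, len(prefix)

# A MergeJoin leaves its output sorted by (x, y); GROUP BY x, z, w can
# reuse the leading x and incrementally sort only (z, w).
keys, n_presorted = reorder_group_keys(["x", "y"], ["x", "z", "w"])
print(keys, n_presorted)    # ['x', 'z', 'w'] 1
```

The returned count of presorted keys is what the planner compares against the full group-by key list when incremental sort is disabled.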


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-02-01 Thread Andrei Lepikhov

On 2/2/2024 09:02, Tom Lane wrote:

Alexander Korotkov  writes:

I'm going to push this if there are no objections.


One of the test cases added by this commit has not been very
stable in the buildfarm.  Latest example is here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04

and I've seen similar failures intermittently on other machines.

I'd suggest building this test atop a table that is more stable
than pg_class.  You're just waving a red flag in front of a bull
if you expect stable statistics from that during a regression run.
Nor do I see any particular reason for pg_class to be especially
suited to the test.

Yeah, it is my fault. Please see the patch fixing that in the attachment.
--
regards,
Andrei Lepikhov
Postgres Professional
From 11a049d95ee48e38ad569aab7663d8de91f946ad Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 2 Feb 2024 10:39:55 +0700
Subject: [PATCH] Replace the GROUP-BY optimization test with the same based on
 something less volatile than the pg_class relation.

---
 src/test/regress/expected/aggregates.out | 32 +++-
 src/test/regress/sql/aggregates.sql  |  9 +++
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 7a73c19314..c2e1b8c9ed 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2873,7 +2873,6 @@ SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 
GROUP BY x,y;
 (6 rows)
 
 RESET enable_incremental_sort;
-DROP TABLE btg;
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -2901,32 +2900,31 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
- QUERY PLAN
  
--
+SELECT b1.x,b1.w FROM btg b1 JOIN btg b2 ON (b1.z=b2.z AND b1.w=b2.w)
+GROUP BY b1.x,b1.z,b1.w ORDER BY b1.z, b1.w, b1.x*b1.x;
+QUERY PLAN 
+---
  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, ((b1.x * b1.x))
+   Presorted Key: b1.z, b1.w
->  Group
- Group Key: c1.relpages, c1.relname, c1.reltuples
+ Group Key: b1.z, b1.w, b1.x
  ->  Incremental Sort
-   Sort Key: c1.relpages, c1.relname, c1.reltuples
-   Presorted Key: c1.relpages, c1.relname
+   Sort Key: b1.z, b1.w, b1.x
+   Presorted Key: b1.z, b1.w
->  Merge Join
- Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ Merge Cond: ((b1.z = b2.z) AND (b1.w = b2.w))
  ->  Sort
-   Sort Key: c1.relpages, c1.relname
-   ->  Seq Scan on pg_class c1
+   Sort Key: b1.z, b1.w
+   ->  Seq Scan on btg b1
  ->  Sort
-   Sort Key: c2.relpages, c2.relname
-   ->  Seq Scan on pg_class c2
+   Sort Key: b2.z, b2.w
+   ->  Seq Scan on btg b2
 (16 rows)
 
 RESET enable_hashjoin;
 RESET enable_nestloop;
+DROP TABLE btg;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index 916dbf908f..3548fbb8db 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1229,8 +1229,6 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT y,x,array_agg(distinct w) FROM btg WHERE y < 0 GROUP BY x,y;
 RESET enable_incremental_sort;
 
-DROP TABLE btg;
-
 -- The case, when scanning sort order correspond to aggregate sort order but
 -- can not be found in the group-by list
 CREATE TABLE agg_sort_order (c1 int PRIMARY KEY, c2 int);
@@ -1245,13 +1243,12 @@ DROP TABLE agg_sort_order CASCADE;
 SET enable_hashjoin = off;
 SET enable_nestloop = off;
 explain (COSTS OFF)
-SELECT c1.relname,c1.relpages
-FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
-GROUP BY c1.reltuples,c1.relpages,c1.relname
-ORDER BY c1.relpages, c1.relname, c1.relpages*c1.r

Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov

Just the forgotten second attachment.

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 1095b73dac..b612420547 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -432,6 +432,21 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List 
**group_pathkeys,
return n;
 }
 
+static bool
+duplicated_pathkey_combination(List *infos, List *pathkeys)
+{
+   ListCell *lc;
+
+   foreach (lc, infos)
+   {
+   PathKeyInfo *info = lfirst_node(PathKeyInfo, lc);
+
+   if (compare_pathkeys(pathkeys, info->pathkeys) == 
PATHKEYS_EQUAL)
+   return true;
+   }
+   return false;
+}
+
 /*
  * get_useful_group_keys_orderings
  * Determine which orderings of GROUP BY keys are potentially 
interesting.
@@ -491,7 +506,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
 
if (n > 0 &&
(enable_incremental_sort || n == 
root->num_groupby_pathkeys) &&
-   compare_pathkeys(pathkeys, root->group_pathkeys) != 
PATHKEYS_EQUAL)
+   !duplicated_pathkey_combination(infos, pathkeys))
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;
@@ -514,8 +529,9 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)

   ,

   root->num_groupby_pathkeys);
 
-   if (n > 0 && compare_pathkeys(pathkeys, root->group_pathkeys) 
!= PATHKEYS_EQUAL &&
-   (enable_incremental_sort || n == 
list_length(root->sort_pathkeys)))
+   if (n > 0 &&
+   (enable_incremental_sort || n == 
list_length(root->sort_pathkeys)) &&
+   !duplicated_pathkey_combination(infos, pathkeys))
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;


Re: POC: GROUP BY optimization

2024-01-18 Thread Andrei Lepikhov

On 16/1/2024 22:05, Alexander Korotkov wrote:

On Tue, Jan 16, 2024 at 4:48 AM Richard Guo  wrote:

* When trying to match the ordering of GROUP BY to that of ORDER BY in
get_useful_group_keys_orderings, this patch checks against the length of
path->pathkeys.  This does not make sense.  I guess this is a typo and
the real intention is to check against root->sort_pathkeys.

Thanks! It is really my blunder - fresh look works.


--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
root->num_groupby_pathkeys);

 if (n > 0 &&
-   (enable_incremental_sort || n == list_length(path->pathkeys)))
+   (enable_incremental_sort || n == list_length(root->sort_pathkeys)))

However, I think this is also incorrect.  When incremental sort is
disabled, it is reasonable to consider reordering the GROUP BY keys only
if the number of matching pathkeys is equal to the length of
root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.


Hmm... Why should this be list_length(root->group_pathkeys) while
we're targeting root->sort_pathkeys.  I yet changed that to
list_length(root->sort_pathkeys).
I think that in the first case, when we are trying to arrange group-by 
keys according to the underlying pathkeys with incremental sort = off, it 
makes sense to do so only if all group-by keys are matched, regardless of 
whether the incoming path contains an equal or greater number of 
pathkeys. The code and test case are in step1.txt.



* When the original ordering of GROUP BY keys matches the path's
pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
generate duplicate PathKeyInfos.  Consequently, this duplication would
lead to the creation and addition of identical paths multiple times.
This is not great.  Consider the query below:

create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;

explain select count(*) from t group by a, b;
 QUERY PLAN
--
  GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
Group Key: a, b
->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 
width=8)
(3 rows)

The same path with group keys {a, b} is created and added twice.


I tried to address that.


* Part of the work performed in this patch overlaps with that of
preprocess_groupclause.  They are both trying to adjust the ordering of
the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
perform this work only once.


Andrei, could you take a look.
As I see it, the PathKeyInfo list often contains duplicated pathkeys, 
coming from either the sort_pathkeys or the path->pathkeys orderings. So 
I propose to check for duplicates each time (see step2.txt in the 
attachment).



* When reordering the GROUP BY keys, I think the following checks need
some improvements.

+   /*
+* Since 1349d27 pathkey coming from underlying node can be in the
+* root->group_pathkeys but not in the processed_groupClause. So, we
+* should be careful here.
+*/
+   sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
+   *group_clauses);
+   if (!sgc)
+   /* The grouping clause does not cover this pathkey */
+   break;

I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
valid before calling get_sortgroupref_clause_noerr with it.  Also, how
can we guarantee that the returned GROUP BY item is sortable?  Should we
check that with OidIsValid(sgc->sortop)?


Added.

Reviewed it, looks good.

--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 919d54dd79..1095b73dac 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -489,8 +489,9 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path 
*path)
n = group_keys_reorder_by_pathkeys(path->pathkeys, , 
,

   root->num_groupby_pathkeys);
 
-   if (n > 0 && compare_pathkeys(pathkeys, root->group_pathkeys) 
!= PATHKEYS_EQUAL &&
-   (enable_incremental_sort || n == 
list_length(path->pathkeys)))
+   if (n > 0 &&
+   (enable_incremental_sort || n == 
root->num_groupby_pathkeys) &&
+   compare_pathkeys(pathkeys, root->group_pathkeys) != 
PATHKEYS_EQUAL)
{
info = makeNode(PathKeyInfo);
info->pathkeys = pathkeys;
diff --git a/src/test/regress/expected/ag

Re: POC: GROUP BY optimization

2024-01-15 Thread Andrei Lepikhov

On 15/1/2024 13:42, Richard Guo wrote:


On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov  wrote:


Thank you for providing the test case relevant for this code change.
The revised patch incorporating this change is attached.  Now the
patchset looks good to me.  I'm going to push it if there are no
objections.


Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.
Get on board! It looks like this feature needs as much review as 
possible (likewise SJE).


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-01-14 Thread Andrei Lepikhov

On 13/1/2024 22:00, Alexander Korotkov wrote:

On Sat, Jan 13, 2024 at 11:09 AM Andrei Lepikhov
 wrote:

On 11/1/2024 18:30, Alexander Korotkov wrote:

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:

Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.



AFAIK, cfbot does not seek old versions of patchset parts in previous messages. 
So for it to run correctly, a new version of the whole patchset should be sent 
even if most patches are unchanged.


Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a
difference between PostgreSQL v.15 and v.16, which causes changes in the
code of this feature.


Makes sense.  I've incorporated your changes into the patchset.

One more improvement. To underpin the code change:

-   return cur_ec;  /* Match! */
+   {
+   /*
+* Match!
+*
+* Copy the sortref if it wasn't set yet. That may happen if
+* the ec was constructed from WHERE clause, i.e. it doesn't
+* have a target reference at all.
+*/
+   if (cur_ec->ec_sortref == 0 && sortref > 0)
+   cur_ec->ec_sortref = sortref;
+   return cur_ec;
+   }

I propose the test (see attachment). It shows why we introduce this 
change: GROUP BY should juggle not only pathkeys generated by explicit 
sort operators but also planner-made ones, as produced in this example by 
MergeJoin.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 0d46e096e5..ca38e78f21 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2879,6 +2879,37 @@ FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 (10 rows)
 
 DROP TABLE t1 CASCADE;
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+ QUERY PLAN
  
+-
+ Incremental Sort
+   Sort Key: c1.relpages, c1.relname, ((c1.relpages * c1.relpages))
+   Presorted Key: c1.relpages, c1.relname
+   ->  Group
+ Group Key: c1.relpages, c1.relname, c1.reltuples
+ ->  Incremental Sort
+   Sort Key: c1.relpages, c1.relname, c1.reltuples
+   Presorted Key: c1.relpages, c1.relname
+   ->  Merge Join
+ Merge Cond: ((c1.relpages = c2.relpages) AND (c1.relname 
= c2.relname))
+ ->  Sort
+   Sort Key: c1.relpages, c1.relname
+   ->  Seq Scan on pg_class c1
+ ->  Sort
+   Sort Key: c2.relpages, c2.relname
+   ->  Seq Scan on pg_class c2
+(16 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index f99167ac9e..cf87b5d5dd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1233,6 +1233,18 @@ SELECT array_agg(c1 ORDER BY c2),c2
 FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
 DROP TABLE t1 CASCADE;
 
+-- Check, that GROUP-BY reordering optimization can operate with pathkeys, 
built
+-- by planner itself. For example, by MergeJoin.
+SET enable_hashjoin = off;
+SET enable_nestloop = off;
+explain (COSTS OFF)
+SELECT c1.relname,c1.relpages
+FROM pg_class c1 JOIN pg_class c2 ON (c1.relname=c2.relname AND 
c1.relpages=c2.relpages)
+GROUP BY c1.reltuples,c1.relpages,c1.relname
+ORDER BY c1.relpages, c1.relname, c1.relpages*c1.relpages;
+RESET enable_hashjoin;
+RESET enable_nestloop;
+
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;


Re: POC: GROUP BY optimization

2024-01-13 Thread Andrei Lepikhov

On 11/1/2024 18:30, Alexander Korotkov wrote:

Hi!

On Tue, Jan 9, 2024 at 1:14 PM Pavel Borisov  wrote:

Hmm, I don't see this old code in these patches. Resend 0002-* because
of trailing spaces.



AFAIK, cfbot does not seek old versions of patchset parts in previous messages. 
So for it to run correctly, a new version of the whole patchset should be sent 
even if most patches are unchanged.


Please, find the revised patchset with some refactoring and comments
improvement from me.  I'll continue looking into this.

The patch looks better, thanks to your refactoring.
I propose additional comments and tests to make the code more 
understandable (see attachment).
I intended to look into this part of the code more, but the tests show a 
difference between PostgreSQL v.15 and v.16, which causes changes in the 
code of this feature.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index f4b7dcac21..5aac6d6677 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -397,6 +397,11 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List 
**group_pathkeys,
!list_member_ptr(*group_pathkeys, pathkey))
break;
 
+   /*
+* Since 1349d27 pathkey coming from underlying node can be in 
the
+* root->group_pathkeys but not in the processed_groupClause. 
So, we
+* should be careful here.
+*/
sgc = 
get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,

*group_clauses);
if (!sgc)
diff --git a/src/test/regress/expected/aggregates.out 
b/src/test/regress/expected/aggregates.out
index 423c8ec3b6..0d46e096e5 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2857,6 +2857,28 @@ explain (COSTS OFF) SELECT x,w FROM btg GROUP BY w,x,y,z 
ORDER BY x*x,z;
 (8 rows)
 
 DROP TABLE btg;
+-- The case, when scanning sort order correspond to aggregate sort order but
+-- can not be found in the group-by list
+CREATE TABLE t1 (c1 int PRIMARY KEY, c2 int);
+CREATE UNIQUE INDEX ON t1(c2);
+explain (costs off)
+SELECT array_agg(c1 ORDER BY c2),c2
+FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
+   QUERY PLAN   
+
+ Sort
+   Sort Key: c2
+   ->  GroupAggregate
+ Group Key: c1
+ ->  Sort
+   Sort Key: c1, c2
+   ->  Bitmap Heap Scan on t1
+ Recheck Cond: (c2 < 100)
+ ->  Bitmap Index Scan on t1_c2_idx
+   Index Cond: (c2 < 100)
+(10 rows)
+
+DROP TABLE t1 CASCADE;
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;
diff --git a/src/test/regress/sql/aggregates.sql 
b/src/test/regress/sql/aggregates.sql
index b9fcceedd7..f99167ac9e 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1224,6 +1224,15 @@ explain (COSTS OFF) SELECT x,w FROM btg GROUP BY w,x,y,z 
ORDER BY x*x,z;
 
 DROP TABLE btg;
 
+-- The case, when scanning sort order correspond to aggregate sort order but
+-- can not be found in the group-by list
+CREATE TABLE t1 (c1 int PRIMARY KEY, c2 int);
+CREATE UNIQUE INDEX ON t1(c2);
+explain (costs off)
+SELECT array_agg(c1 ORDER BY c2),c2
+FROM t1 WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
+DROP TABLE t1 CASCADE;
+
 RESET enable_hashagg;
 RESET max_parallel_workers;
 RESET max_parallel_workers_per_gather;


Re: introduce dynamic shared memory registry

2024-01-10 Thread Andrei Lepikhov

On 9/1/2024 00:16, Nathan Bossart wrote:

On Mon, Jan 08, 2024 at 10:53:17AM +0530, Bharath Rupireddy wrote:

1. I think we need to add some notes about this new way of getting
shared memory for external modules in the Shared Memory and
LWLocks section in xfunc.sgml? This will at least tell there's
another way for external modules to get shared memory, not just with
the shmem_request_hook and shmem_startup_hook. What do you think?
+1. Maybe even more than that - this approach to sharing data could also 
be mentioned in the section related to extensions.



2. FWIW, I'd like to call this whole feature "Support for named DSM
segments in Postgres". Do you see anything wrong with this?


Why do you feel it should be renamed?  I don't see anything wrong with it,
but I also don't see any particular advantage with that name compared to
"dynamic shared memory registry."
It is not a big issue, I suppose. But for me personally (as a non-native 
English speaker), the label "Named DSM segments" seems more 
straightforward to understand.



3. IIUC, this feature eventually makes both shmem_request_hook and
shmem_startup_hook pointless, no? Or put another way, what's the
significance of shmem request and startup hooks in lieu of this new
feature? I think it's quite possible to get rid of the shmem request
and startup hooks (of course, not now but at some point in future to
not break the external modules), because all the external modules can
allocate and initialize the same shared memory via
dsm_registry_init_or_attach and its init_callback. All the external
modules will then need to call dsm_registry_init_or_attach in their
_PG_init callbacks and/or in their bg worker's main functions in case
the modules intend to start up bg workers. Am I right?


Well, modules might need to do a number of other things (e.g., adding
hooks) that can presently only be done when preloaded, in which case I
doubt there's much benefit from switching to the DSM registry.  I don't
really intend for it to replace the existing request/startup hooks, but
you're probably right that most, if not all, could use the registry
instead.  IMHO this is well beyond the scope of this thread, though.

+1, there may be many reasons to keep using these hooks.

>> 3. Use NAMEDATALEN instead of 64?
>> +char		key[64];
> I kept this the same, as I didn't see any need to tie the key size to
> NAMEDATALEN.
IMO, we should avoid magic numbers whenever possible. Given that the 
first users of this feature will naturally be extensions, it makes sense 
to tie this size to the size of the 'name' type.


And one more point. I think the commit already deserves a more detailed 
commit message.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Custom explain options

2024-01-10 Thread Andrei Lepikhov

On 10/1/2024 20:27, Konstantin Knizhnik wrote:


On 10/01/2024 8:46 am, Michael Paquier wrote:

On Wed, Jan 10, 2024 at 01:29:30PM +0700, Andrei Lepikhov wrote:
What do you think about this really useful feature? Do you wish to 
develop

it further?

I am biased here.  This seems like a lot of code for something we've
been delegating to the explain hook for ages.  Even if I can see the
appeal of pushing that more into explain.c to get more data on a
per-node basis depending on the custom options given by the caller of
an EXPLAIN entry point, I cannot get really excited about the extra
maintenance this facility would involve compared to the potential
gains, knowing that there's a hook.
--
Michael



Well, I am not sure that the proposed patch is flexible enough to handle 
all possible scenarios.
I just wanted to make it as simple as possible to leave it some chance 
of being merged.
But it is easy to answer the question of why the existing explain hook 
is not enough:


1. It doesn't allow adding extra options to EXPLAIN. My intention was to 
be able to do something like "explain (analyze,buffers,prefetch) ...". 
That is simply not possible with the explain hook.
I agree. As someone who mostly designs planner-related extensions, I 
have also wanted to add extra information to the EXPLAIN output of 
nodes. For example, pg_query_state could report the status of a node at 
the moment execution was interrupted: started, stopped, or loop closed.
Maybe we should gather some statistics on how extension developers deal 
with that issue ...


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Multidimensional Histograms

2024-01-10 Thread Andrei Lepikhov

On 8/1/2024 16:21, Alexander Cheshev wrote:

Hi Andrei,


Maybe my wording needed to be more precise. I didn't implement
multidimensional histograms before, so I don't know how expensive they
are. I meant that for dependency statistics over about six columns, we
have a lot of combinations to compute.


Equi-Depth Histogram in a 6 dimensional case requires 6 times more
iterations. Postgres already uses Equi-Depth Histogram. Even if you
increase the number of buckets from 100 to 1000 then there will be no
overhead as the time complexity of Equi-Depth Histogram has no
dependence on the number of buckets. So, no overhead at all!


Maybe. For three columns, we have 9 combinations (passes) for building 
dependency statistics and 4 combinations for ndistinct; for six 
columns, we have 186 and 57 combinations, respectively.
Even bearing in mind that a dependency is just one number per 
combination, building the dependency statistics is still massive work. 
So, in the multicolumn case, having something like a histogram may be 
more effective.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Custom explain options

2024-01-09 Thread Andrei Lepikhov

On 30/11/2023 22:40, Konstantin Knizhnik wrote:
In all these cases we are using an array of `Instrumentation`, and if it 
contains a varying part, then it is not clear where to place it.
Yes, there is also code which serializes and sends instrumentation 
between worker processes, and I have updated this code in my PR to send 
the actual amount of custom instrumentation data. But it cannot help 
with the cases above.
What do you think about this really useful feature? Do you wish to 
develop it further?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2024-01-09 Thread Andrei Lepikhov

On 9/1/2024 16:45, vignesh C wrote:

On Tue, 9 Jan 2024 at 14:31, Andrei Lepikhov  wrote:


Here is a new version of GROUP-BY optimization without sort model.

On 21/12/2023 17:53, Alexander Korotkov wrote:

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.

Done. See patch 0001-*. Unfortunately, extracting the whole loop isn't
practical, because it would bloat the interface of the routine.


2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.

In the current implementation I don't anticipate any significant overhead.
The GUC is needed here to let users keep their own ordering and to
disable the feature in case of problems.


4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

Hm, does it really make sense in the current implementation?


CFBot shows the following errors at [1] with:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: In function
‘estimate_num_groups’:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: warning:
implicit declaration of function ‘estimate_num_groups_incremental’
[-Wimplicit-function-declaration]
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c: At top level:
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: warning: no
previous prototype for ‘estimate_num_groups_incremental’
[-Wmissing-prototypes]
[08:33:28.813] 3400 | estimate_num_groups_incremental(PlannerInfo
*root, List *groupExprs,
[08:33:28.813] | ^~~
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3400:1: error:
conflicting types for ‘estimate_num_groups_incremental’
[08:33:28.813] ../src/backend/utils/adt/selfuncs.c:3389:9: note:
previous implicit declaration of ‘estimate_num_groups_incremental’ was
here
[08:33:28.813] 3389 | return estimate_num_groups_incremental(root, groupExprs,
Hmm, I don't see this old code in these patches. Resending 0002-* 
because of trailing spaces.


--
regards,
Andrei Lepikhov
Postgres Professional
From 45cfff5731b81c0df2af00f5e4212fc598f6a231 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 9 Jan 2024 12:32:15 +0700
Subject: [PATCH 2/2] Explore alternative orderings of group-by pathkeys during
 optimization.

When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we previously compared keys simply in the order specified in the query. This
commit explores alternative orderings of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
---
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 222 ++
 src/backend/optimizer/plan/planner.c  | 214 +++--
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/paths.h |   3 +
 src/test/regress/expected/aggregates.out  | 132 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/aggregates.sql   |  47 
 10 files changed, 586 insertions(+), 69 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c 
b/src/backend/optimizer/path/equivclass.c
index e86dfeaecd..7dd14d0a43 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -652
