Re: Vacuum statistics

2024-05-30 Thread Alena Rybakina

On 30.05.2024 10:33, Alena Rybakina wrote:


I suggest gathering information about vacuum resource consumption for 
processing indexes and tables and storing it in the table and index 
relations (for example, in the PgStat_StatTabEntry structure, as is 
already done for regular statistics). This will allow us to determine 
how well the vacuum is configured and to evaluate its overhead on the 
system at a strategic level. The vacuum already gathers this 
information, but this valuable data is not stored anywhere.


My colleagues and I have prepared a patch that can help to solve this 
problem.


We are open to feedback.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From dfda656b35be2a73c076cb723fdeb917630e61e3 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 30 May 2024 11:02:10 -0700
Subject: [PATCH] Machinery for gathering extended vacuum statistics on
 relations. Note that statistics on heap and index relations differ slightly.

The values of total_blks_hit, total_blks_read and total_blks_dirtied are the
numbers of pages hit, missed and dirtied in shared buffers during a vacuum
operation, respectively.

total_blks_dirtied counts only pages dirtied by this action: if a page was
already dirty before the vacuum operation, it is not counted as 'dirtied'.

The tuples_deleted parameter is the number of tuples cleaned up by the vacuum
operation.

The delay_time value is the total time the vacuum slept at vacuum delay
points.
The pages_removed value is the number of pages by which the physical data
storage of the relation was reduced.
The pages_deleted value is the number of pages freed in the table (the file
size may not have changed).

Interruptions is the number of times the (auto)vacuum process was interrupted
during vacuuming of a relation. We report it from the vacuum_error_callback
routine, so we can log all ERROR reports; in the case of autovacuum we can
report SIGINT signals too.
It may be dangerous to perform such a complex task (sending a report) in an
error callback - we could hit an ERROR-within-ERROR problem - but the chance
of getting stuck in it looks small, so let's try it.
This parameter relates to a problem covered by b19e4250.

Tracking of IO during an (auto)vacuum operation.
The introduced variables blk_read_time and blk_write_time track only access to
buffer pages and flushing them to disk. Measuring reads is trivial, but the
technique for measuring writes is not obvious.
As a result, during a vacuum the write time may stay at zero simply because no
flush operations were performed.

System time and user time are parameters that describe how much time a vacuum
operation spent executing code in user space and kernel space, respectively.
We also accumulate the total time of a vacuum, which is the difference between
the timestamps at the start and finish points in the vacuum code.
Remember the idle time, when the vacuum waited for IO and locks: the total time
is not equal to the sum of user and system time, but it is no less than that.

pages_frozen - number of pages that are marked as frozen in the vm during
vacuum. This parameter is incremented when a page is marked all-frozen.
pages_all_visible - number of pages that are marked as all-visible in the vm
during vacuum.

Authors: Alena Rybakina , Andrei Lepikhov , Andrei Zubkov 
---
 src/backend/access/heap/vacuumlazy.c  | 245 +-
 src/backend/access/heap/visibilitymap.c   |  13 +
 src/backend/catalog/system_views.sql  | 123 +
 src/backend/commands/vacuum.c |   4 +
 src/backend/commands/vacuumparallel.c |   1 +
 src/backend/utils/activity/pgstat.c   | 117 +++--
 src/backend/utils/activity/pgstat_database.c  |   1 +
 src/backend/utils/activity/pgstat_relation.c  |  52 ++-
 src/backend/utils/adt/pgstatfuncs.c   | 290 
 src/backend/utils/error/elog.c|  13 +
 src/include/catalog/pg_proc.dat   |  28 +-
 src/include/commands/vacuum.h |   1 +
 src/include/pgstat.h  | 118 -
 src/include/utils/elog.h  |   2 +-
 src/include/utils/pgstat_internal.h   |  36 +-
 .../expected/vacuum-extended-statistic.out| 419 ++
 .../isolation/expected/vacuum-extending.out   |  68 +++
 src/test/isolation/isolation_schedule |   2 +
 .../specs/vacuum-extended-statistic.spec  | 179 
 .../isolation/specs/vacuum-extending.spec |  58 +++
 src/test/regress/expected/opr_sanity.out  |   9 +-
 src/test/regress/expected/rules.out   |  79 
 22 files changed, 1812 insertions(+), 46 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-extended-statistic.out
 create mode 100644 src/test/isolation/expected/vacuum-extending.out
 create mode 100644 src/test/isolation/specs/vacuum-extended-statistic.spec
 create mode 100644 src/test/isolation/specs/vacuum-extending.spec

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap

Vacuum statistics

2024-05-30 Thread Alena Rybakina

Hello, everyone!

I think we don't have enough information to analyze vacuum functionality.

Needless to say, vacuum is one of the most important processes for a 
database system. It prevents problems like table and index bloat and 
performs emergency freezing if we face a wraparound problem. Furthermore, 
it keeps the visibility map up to date. On the other hand, incorrectly 
adjusted aggressive autovacuum settings can consume a lot of computing 
resources, causing all queries in the system to run longer.


Nowadays the vacuum gathers statistical information about tables, but it 
is important not only for the optimizer.


Because the vacuum is an automated process, there are a lot of settings 
that determine how aggressively it operates on objects of the 
database. Besides, it is sometimes important to set a correct parameter 
for a specific table because of its dynamic changes.


An administrator of a database needs to tune the autovacuum settings 
to strike a balance between the vacuum's useful action in the database 
system on the one hand, and the overhead of its workload on the other. 
However, it is not enough to judge vacuum behavior from statistical 
information about the number of vacuum passes over tables and the 
operational data from progress_vacuum, because the latter is available 
only during a vacuum operation and does not provide a strategic 
overview over the considered period.
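
For reference, a sketch against the existing views (the table name 
'test_table' is invented) shows why the current picture is not 
strategic: the cumulative counters only say how often vacuum ran, and 
the progress view is empty unless a vacuum is running at that moment:

-- cumulative counters: how often vacuum ran, not what it cost
SELECT relname, vacuum_count, autovacuum_count, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'test_table';

-- operational data: visible only while a vacuum is in progress
SELECT relid::regclass, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;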


To sum up, autovacuum behavior is strategic in nature, because the 
frequency of its runs and its resource consumption depend on the 
workload of the database. Its load on the database is minimal for an 
append-only table and maximal for a table with high-frequency updates. 
Furthermore, the vacuum load depends heavily on the number and size of 
indexes. Because there is no visibility map for indexes, the vacuum 
scans each index completely, and the worst case is when it has to do 
so for a bloated index on a small table.



I suggest gathering information about vacuum resource consumption for 
processing indexes and tables and storing it in the table and index 
relations (for example, in the PgStat_StatTabEntry structure, as is 
already done for regular statistics). This will allow us to determine 
how well the vacuum is configured and to evaluate its overhead on the 
system at a strategic level. The vacuum already gathers this 
information, but this valuable data is not stored anywhere.
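
To make the idea concrete, a hypothetical usage sketch (the view name 
pg_stat_vacuum_tables and its columns are placeholders, not the actual 
interface of the patch):

-- hypothetical view: rank tables by the resources vacuum spent on them
SELECT relname, total_blks_read, total_blks_dirtied,
       tuples_deleted, delay_time
FROM pg_stat_vacuum_tables  -- placeholder name
ORDER BY total_blks_read DESC
LIMIT 10;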


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Sort operation displays more tuples than it contains its subnode

2024-05-23 Thread Alena Rybakina

Yes, I got it. Thank you very much for the explanation.

On 23.05.2024 00:17, Tom Lane wrote:

"a.rybakina"  writes:

I faced the issue, when the sorting node in the actual information
shows a larger number of tuples than it actually is. And I can not
understand why?

If I'm reading this correctly, the sort node you're worrying about
feeds the inner side of a merge join.  Merge join will rewind its
inner side to the start of the current group of equal-keyed tuples
whenever it sees that the next outer tuple must also be joined to
that group.  Since what EXPLAIN is counting is the number of tuples
returned from the node, that causes it to double-count those tuples.
The more duplicate-keyed tuples on the outer side, the bigger the
effect.

You can see the same thing happening at the Materialize a little
further up, which is feeding the inside of the other merge join.
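
For illustration, a minimal sketch that reproduces this effect (schema 
and row counts are invented): the inner table holds only 3 rows, but 
each of the 100 duplicate outer keys rewinds the inner group, so the 
Sort node reports roughly 300 returned rows.

CREATE TABLE outer_t AS SELECT 1 AS k FROM generate_series(1, 100);
CREATE TABLE inner_t AS SELECT 1 AS k FROM generate_series(1, 3);
ANALYZE outer_t;
ANALYZE inner_t;
SET enable_hashjoin = off;  -- force a merge join
SET enable_nestloop = off;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM outer_t JOIN inner_t USING (k);
-- the Sort over inner_t shows actual rows near 300, not 3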


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Alena Rybakina
Hi! I could try to check it with the test, but I want to ask you about the 
details, because I'm not sure that I completely understand the test case.


You mean that we need two backends, and on one of them we delete the 
tuples before vacuum is called on the other, don't you?


On 10.05.2024 13:25, Nazir Bilal Yavuz wrote:

Hi,

Thank you for working on this!

On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:

Thank you for further testing! I've pushed the patch.

I realized a behaviour change while looking at 'Use pgBufferUsage for
block reporting in analyze' thread [1]. Since that change applies here
as well, I thought it is better to mention it here.

Before this commit, VacuumPageMiss did not count the blocks if its
read was already completed by other backends [2]. Now,
'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
counts every block attempted to be read; possibly double counting if
someone else has already completed the read. I do not know which
behaviour is correct but I wanted to mention this.

[1] 
https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com

[2] In the WaitReadBuffers() function, see comment:
 /*
  * Skip this block if someone else has already completed it.  If an
  * I/O is already in progress in another backend, this will wait for
  * the outcome: either done, or something went wrong and we will
  * retry.
  */


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Fix parallel vacuum buffer usage reporting

2024-04-30 Thread Alena Rybakina

Hi!

On 30.04.2024 05:18, Masahiko Sawada wrote:

On Fri, Apr 26, 2024 at 9:12 PM Alena Rybakina  wrote:

Hi!

The same script was run, but using vacuum verbose analyze, and I saw the 
difference again in the fifth step:
with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
master: buffer usage: 32346 hits, 573 misses, 1360 dirtied

Isn't there a chance for the checkpointer to run during this time? That could 
make the conditions between the two runs slightly different and explain the 
change in buffer report.

[0]https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831

Looking at the script, you won't trigger the problem.
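
One way to keep the checkpointer out of such comparisons is sketched 
below (vestat is the table from the test script; in a real run the 
setting would be reset afterwards):

ALTER SYSTEM SET checkpoint_timeout = '1h';  -- push checkpoints outside the test window
SELECT pg_reload_conf();
CHECKPOINT;  -- start every run from the same clean state
VACUUM (VERBOSE) vestat;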

Thank you for the link; I accounted for it in my next experiments.

I repeated the test without processing checkpoints with a single index, and the 
number of pages in the buffer used almost matched:

master branch: buffer usage: 32315 hits, 606 misses, 4486 dirtied

with applied patch v4 version: buffer usage: 32315 hits, 606 misses, 4489 
dirtied

I think you are right - the problem was interference from the checkpoint 
process; by the way, I also checked the first version of the patch. To cut a 
long story short, everything is fine now with one index.

Just in case, I'll explain: I considered this case because your patch could 
have influenced it too.

On 25.04.2024 10:17, Anthonin Bonnefoy wrote:


On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina  wrote:

I tested the main postgres branch with and without your fix using a script that 
was written by me. It consists of five scenarios and I made a comparison in the 
logs between the original version of the master branch and the master branch 
with your patch:

  Hi! Thanks for the tests.


I have attached a test file (vacuum_check_logs.sql)

The reporting issue will only happen if there's a parallel index vacuum and it 
will only happen if there's at least 2 indexes [0]. You will need to create an 
additional index.

Speaking of the problem, I added another index and repeated the test and found 
a significant difference:

I found it when I committed the transaction (3):

master: 2964 hits, 0 misses, 0 dirtied

with applied patch v4 version: buffer usage: 33013 hits, 0 misses, 3 dirtied

When I deleted all the data from the table and later started vacuum verbose 
again (4):

master: buffer usage: 51486 hits, 0 misses, 0 dirtied

with applied patch v4 version: buffer usage: 77924 hits, 0 misses, 0 dirtied

when I inserted 1 million rows into the table and updated them (5):

master: buffer usage: 27904 hits, 5021 misses, 1777 dirtied

with applied patch v4 version: buffer usage: 41051 hits, 9973 misses, 2564 
dirtied

As I see, the number of pages is significantly larger than it was in the master 
branch and, frankly, I couldn't fully figure out whether it was a mistake or not.

I think that the patch fixes the problem correctly.

I've run pgindent and updated the commit message. I realized that
parallel vacuum was introduced in pg13 but buffer usage reporting in
VACUUM command was implemented in pg15. Therefore, in pg13 and pg14,
VACUUM (PARALLEL) is available but VACUUM (PARALLEL, VERBOSE) doesn't
show the buffer usage report. Autovacuum does show the buffer usage
report but parallel autovacuum is not supported. Therefore, we should
backpatch it down to 15, not 13.

I'm going to push the patch down to pg15, barring any objections.

Regards,
I agree with you about the backpatching, and I saw that the patch is working 
correctly.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Fix parallel vacuum buffer usage reporting

2024-04-26 Thread Alena Rybakina

Hi!


The same script was run, but using vacuum verbose analyze, and I
saw the difference again in the fifth step:
with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
master: buffer usage: 32346 hits, 573 misses, 1360 dirtied

Isn't there a chance for the checkpointer to run during this time? 
That could make the conditions between the two runs slightly different 
and explain the change in buffer report.


[0] 
https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831


Looking at the script, you won't trigger the problem.


Thank you for the link; I accounted for it in my next experiments.

I repeated the test without processing checkpoints with a single index, 
and the number of pages in the buffer used almost matched:


master branch: buffer usage: 32315 hits, 606 misses, 4486 dirtied

with applied patch v4 version: buffer usage: 32315 hits, 606 misses, 
4489 dirtied


I think you are right - the problem was interference from the checkpoint 
process; by the way, I also checked the first version of the patch. To cut a 
long story short, everything is fine now with one index.


Just in case, I'll explain: I considered this case because your patch 
could have influenced it too.


On 25.04.2024 10:17, Anthonin Bonnefoy wrote:


On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina 
 wrote:


I tested the main postgres branch with and without your fix using
a script that was written by me. It consists of five scenarios and
I made a comparison in the logs between the original version of
the master branch and the master branch with your patch:

 Hi! Thanks for the tests.

I have attached a test file (vacuum_check_logs.sql)

The reporting issue will only happen if there's a parallel index 
vacuum and it will only happen if there's at least 2 indexes [0]. You 
will need to create an additional index.


Speaking of the problem, I added another index and repeated the test and 
found a significant difference:


 * I found it when I committed the transaction (3):

master: 2964 hits, 0 misses, 0 dirtied

with applied patch v4 version: buffer usage: 33013 hits, 0 misses, 3 dirtied

 * When I deleted all the data from the table and later started vacuum
   verbose again (4):

master: buffer usage: 51486 hits, 0 misses, 0 dirtied

with applied patch v4 version: buffer usage: 77924 hits, 0 misses, 0 dirtied

 * when I inserted 1 million rows into the table and updated them (5):

master: buffer usage: 27904 hits, 5021 misses, 1777 dirtied

with applied patch v4 version: buffer usage: 41051 hits, 9973 misses, 
2564 dirtied


As I see, the number of pages is significantly larger than it was in the 
master branch and, frankly, I couldn't fully figure out whether it was a 
mistake or not.


I attached a test script (vacuum_checks_logs.sql) with two indexes and 
no checkpoints, I also attached log files: the first one (vacuum_test) 
is the result of testing on the master branch, the second file with your 
applied patch (vacuum_test_v4).


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


vacuum_check_logs.sql
Description: application/sql
postgres=# \i ~/vacuum_check_logs.sql 
psql:/home/alena/vacuum_check_logs.sql:2: ERROR:  extension "dblink" does not 
exist
CREATE EXTENSION
 dblink_connect 
----------------
 OK
(1 row)

psql:/home/alena/vacuum_check_logs.sql:5: ERROR:  table "vestat" does not exist
SET
SET
CREATE TABLE
CREATE INDEX
CREATE INDEX
psql:/home/alena/vacuum_check_logs.sql:16: INFO:  vacuuming 
"postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:16: INFO:  finished vacuuming 
"postgres.public.vestat": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 742, which was 0 XIDs old when operation ended
new relfrozenxid: 742, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item 
identifiers removed
I/O timings: read: 0.049 ms, write: 0.000 ms
avg read rate: 40.584 MB/s, avg write rate: 0.000 MB/s
buffer usage: 23 hits, 2 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 237 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
 pg_sleep 
--
 
(1 row)

INSERT 0 100
DELETE 10
psql:/home/alena/vacuum_check_logs.sql:25: INFO:  vacuuming 
"postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:25: INFO:  launched 1 parallel vacuum 
worker for index vacuuming (planned: 1)
psql:/home/alena/vacuum_check_logs.sql:25: INFO:  table "vestat": truncated 
3922 to 3530 pages
psql:/home/alena/vacuum_check_logs.sql:25: INFO:  finished vacuuming 
"postgres.public.vestat": index scans: 1
pages: 392 removed, 3530 remain, 3922 scanned (100.00% o

Re: Fix parallel vacuum buffer usage reporting

2024-04-24 Thread Alena Rybakina

On 22.04.2024 11:07, Anthonin Bonnefoy wrote:
On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina 
 wrote:


Hi, thank you for your work with this subject.

While I was reviewing your code, I noticed that your patch
conflicts with another patch [0] that has been committed.

[0]

https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com


I've rebased the patch and also split the changes:
1: Use pgBufferUsage in Vacuum and Analyze block reporting
2: Stop tracking buffer usage in the now unused Vacuum{Hit,Miss,Dirty} 
variables

3: Remove declarations of Vacuum{Hit,Miss,Dirty}


Hi! Thank you for your work, and I have reviewed your corrections.

I tested the main postgres branch with and without your fix using a 
script that was written by me. It consists of five scenarios and I made 
a comparison in the logs between the original version of the master 
branch and the master branch with your patch:


1. I added 1 million rows to the table and deleted 10% of them. I ran 
vacuum verbose and didn't see any differences.

buffer usage: 12585 hits, 0 misses, 4 dirtied
2. I opened another connection with a repeatable read transaction 
through the dblink extension and launched a query updating the records 
in the table under test. Later, I ran vacuum verbose again and also 
didn't see any differences.

buffer usage: 19424 hits, 0 misses, 6 dirtied
3. I committed the transaction and ran vacuum verbose again. Everything is 
fine in the logs.

buffer usage: 22960 hits, 0 misses, 11456 dirtied
4. I deleted all the data from the table and later started vacuum 
verbose again. The number of pages in the buffer matched with your patch 
too.
5. I inserted 1 million rows into the table and updated them, and I found 
a difference between the original master version and the version with 
your patch:

with your patch: buffer usage: 32315 hits, 606 misses, 1547 dirtied
original version: buffer usage: 32348 hits, 573 misses, 1456 dirtied
I suppose something wasn't counted.

The same script was run, but using vacuum verbose analyze, and I saw the 
difference again in the fifth step:

with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
master: buffer usage: 32346 hits, 573 misses, 1360 dirtied

I have attached a test file (vacuum_check_logs.sql) and four log files: 
two with the vacuum verbose command (vacuum_log and 
vacuum_log_with_patch files) and two others with the vacuum verbose 
analyze command (vacuum_analyze_test_with_patch and vacuum_analyze_test 
files). Both test approaches show logs with and without your changes.

1: Use pgBufferUsage in Vacuum and Analyze block reporting

I think that if the analyze command doesn't have the same issue, we
don't need to change it. Making the vacuum and the analyze consistent
is a good point but I'd like to avoid doing unnecessary changes in
back branches. I think the patch set would contain:

(a) make lazy vacuum use BufferUsage instead of
VacuumPage{Hit,Miss,Dirty}. (backpatched down to pg13).
(b) make analyze use BufferUsage and remove VacuumPage{Hit,Miss,Dirty}
variables for consistency and simplicity (only for HEAD, if we agree).

BTW I realized that VACUUM VERBOSE running on a temp table always
shows the number of dirtied buffers being 0, which seems to be a bug.
The patch (a) will resolve it as well.
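
A quick way to observe the temp-table symptom (a sketch; exact counts 
will vary):

CREATE TEMP TABLE tmp_t AS SELECT i FROM generate_series(1, 100000) i;
DELETE FROM tmp_t WHERE i % 2 = 0;
VACUUM (VERBOSE) tmp_t;
-- before fix (a), the "buffer usage: ... dirtied" number stays at 0
-- for temp tables even though pages were dirtied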

I agree with that. I think we can leave these changes to the analyze 
command for master, but I also doubt the need to backport these changes 
to older versions.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
postgres=# \i ~/vacuum_check_logs.sql 
psql:/home/alena/vacuum_check_logs.sql:2: ERROR:  extension "dblink" does not 
exist
CREATE EXTENSION
 dblink_connect 
----------------
 OK
(1 row)

psql:/home/alena/vacuum_check_logs.sql:5: ERROR:  table "vestat" does not exist
SET
SET
CREATE TABLE
CREATE INDEX
psql:/home/alena/vacuum_check_logs.sql:15: INFO:  vacuuming 
"postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:15: INFO:  finished vacuuming 
"postgres.public.vestat": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 741, which was 0 XIDs old when operation ended
new relfrozenxid: 741, which is 2 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item 
identifiers removed
I/O timings: read: 0.100 ms, write: 0.000 ms
avg read rate: 29.260 MB/s, avg write rate: 0.000 MB/s
buffer usage: 13 hits, 1 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 237 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
 pg_sleep 
--
 
(1 row)

INSERT 0 100
DELETE 10
psql:/home/alena/vacuum_check_logs.sql:24: INFO:  vacuuming 
&

Re: Fix parallel vacuum buffer usage reporting

2024-04-20 Thread Alena Rybakina

Hi, thank you for your work with this subject.

While I was reviewing your code, I noticed that your patch conflicts 
with another patch [0] that has been committed.


[0] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com


On 28.03.2024 13:12, Anthonin Bonnefoy wrote:

Hi,

Thanks for the review.
On Thu, Mar 28, 2024 at 4:07 AM Masahiko Sawada 
 wrote:


As for the proposed patch, the following part should handle the
temp tables too:


True, I've missed the local blocks. I've updated the patch:
- read_rate and write_rate now include local block usage
- I've added a specific output for reporting local blocks instead of 
combining them in the same output.


Regards,
Anthonin


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: post-freeze damage control

2024-04-11 Thread Alena Rybakina

Hi!

I also worked with this patch, and until your explanation I didn't fully 
understand why it was wrong to have this implementation for removing 
duplicate OR expressions.


Thank you, now I understand it!

I agree with explanation of Andrei Lepikhov regarding the fact that 
there were difficulties in moving the patch to another place and
the explanation of Alexander Korotkov and Peter Geoghegan regarding the 
need to apply this transformation.


Let me just add that this patch initially tried to solve a problem where 
50,000 OR expressions were generated and there was a problem running that 
query using a plan with BitmapScan. I wrote more about this here [0].
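
For concreteness, a sketch of how such a pathological query can be 
generated in psql (table and column names invented); with the 
transformation, the planner collapses the 50,000 conditions into a 
single id = ANY('{1,...,50000}') clause:

-- builds and runs: SELECT count(*) FROM t WHERE id = 1 OR id = 2 OR ...
SELECT 'SELECT count(*) FROM t WHERE '
       || string_agg('id = ' || i, ' OR ')
FROM generate_series(1, 50000) AS i
\gexec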


If the patch can be improved, and you can tell me how (because, to be 
honest, I don't quite understand how to do it yet), then I'll be happy to 
work on it too.


[0] 
https://www.postgresql.org/message-id/052172e4-6d75-8069-3179-26de339dca03%40postgrespro.ru


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2024-04-08 Thread Alena Rybakina

Hi! Thank you for your work on this issue!

Your patch required a little revision. I did this and attached the patch.

Also, I think you should add some clarification to the comments about 
printing 'exact' and 'lossy' pages in the show_tidbitmap_info function, 
which you now get from planstate->stats, whereas previously they were 
output directly from planstate. Perhaps it is enough to mention this in 
the commit message.


I mean this place:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 926d70afaf..02251994c6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3467,26 +3467,57 @@ show_hashagg_info(AggState *aggstate, 
ExplainState *es)

 static void
 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 {
+    Assert(es->analyze);
+
 if (es->format != EXPLAIN_FORMAT_TEXT)
 {
     ExplainPropertyInteger("Exact Heap Blocks", NULL,
-                               planstate->exact_pages, es);
+                               planstate->stats.exact_pages, es);
     ExplainPropertyInteger("Lossy Heap Blocks", NULL,
-                               planstate->lossy_pages, es);
+                               planstate->stats.lossy_pages, es);
 }
 else
 {
-        if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
+        if (planstate->stats.exact_pages > 0 || 
planstate->stats.lossy_pages > 0)

     {
         ExplainIndentText(es);
         appendStringInfoString(es->str, "Heap Blocks:");
-            if (planstate->exact_pages > 0)
-                appendStringInfo(es->str, " exact=%ld", 
planstate->exact_pages);

-            if (planstate->lossy_pages > 0)
-                appendStringInfo(es->str, " lossy=%ld", 
planstate->lossy_pages);

+            if (planstate->stats.exact_pages > 0)
+                appendStringInfo(es->str, " exact=%ld", 
planstate->stats.exact_pages);

+            if (planstate->stats.lossy_pages > 0)
+                appendStringInfo(es->str, " lossy=%ld", 
planstate->stats.lossy_pages);

         appendStringInfoChar(es->str, '\n');
     }
 }
+
+    if (planstate->pstate != NULL)
+    {
+        for (int n = 0; n < planstate->sinstrument->num_workers; n++)
+        {
+            BitmapHeapScanInstrumentation *si = 
&planstate->sinstrument->sinstrument[n];

+
+            if (si->exact_pages == 0 && si->lossy_pages == 0)
+                continue;
+
+            if (es->workers_state)
+                ExplainOpenWorker(n, es);
+
+            if (es->format == EXPLAIN_FORMAT_TEXT)
+            {
+                ExplainIndentText(es);
+                appendStringInfo(es->str, "Heap Blocks: exact=%ld 
lossy=%ld\n",

+                         si->exact_pages, si->lossy_pages);
+            }
+            else
+            {
+                ExplainPropertyInteger("Exact Heap Blocks", NULL, 
si->exact_pages, es);
+                ExplainPropertyInteger("Lossy Heap Blocks", NULL, 
si->lossy_pages, es);

+            }
+
+            if (es->workers_state)
+                ExplainCloseWorker(n, es);
+        }
+    }
 }

I suggest some code refactoring (diff.diff.no-cfbot file) that allows 
you to improve your code.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/executor/nodeBitmapHeapscan.c 
b/src/backend/executor/nodeBitmapHeapscan.c
index 973bf56022d..ca8d94ba09a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -634,8 +634,9 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 */
if (node->sinstrument != NULL && IsParallelWorker())
{
+   BitmapHeapScanInstrumentation *si;
Assert(ParallelWorkerNumber <= node->sinstrument->num_workers);
-   BitmapHeapScanInstrumentation *si = 
&node->sinstrument->sinstrument[ParallelWorkerNumber];
+   si = &node->sinstrument->sinstrument[ParallelWorkerNumber];
si->exact_pages += node->stats.exact_pages;
si->lossy_pages += node->stats.lossy_pages;
}
@@ -864,7 +865,7 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 
ptr = shm_toc_allocate(pcxt->toc, size);
pstate = (ParallelBitmapHeapState *) ptr;
-   ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+   ptr += size;
if (node->ss.ps.instrument && pcxt->nworkers > 0)
sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
 
From 8dc939749a3f0cea9ba337c020b9ccd474e7c8e3 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 8 Apr 2024 23:41:41 +0300
Subject: [P

Re: Support "Right Semi Join" plan shapes

2024-03-09 Thread Alena Rybakina


On 06.03.2024 05:23, wenhui qiu wrote:



  Hi Alena Rybakina
  For merge join
  + /*
  + * For now we do not support RIGHT_SEMI join in mergejoin.
  + */
  + if (jointype == JOIN_RIGHT_SEMI)
  + {
  + *mergejoin_allowed = false;
  + return NIL;
  + }
  +
  Thanks



Yes, I see it, thank you. Sorry for the noise.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-03-07 Thread Alena Rybakina

Hi!

On 07.03.2024 17:51, Alexander Korotkov wrote:

Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov 
 wrote:

> On 5/3/2024 12:30, Andrei Lepikhov wrote:
> > On 4/3/2024 09:26, jian he wrote:
> ... and the new version of the patchset is attached.

I made some revisions for the patchset.
1) Use hash_combine() to combine hash values.
2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
3) Better save the original order of clauses by putting hash entries 
and untransformable clauses to the same list.  A lot of differences in 
regression tests output have gone.

Thank you for your changes. I agree with them.


One important issue I found.

# create table t as (select i::int%100 i from generate_series(1,10000) i);
# analyze t;
# explain select * from t where i = 1 or i = 1;
                     QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=200 width=4)
   Filter: (i = ANY ('{1,1}'::integer[]))
(2 rows)

# set enable_or_transformation = false;
SET
# explain select * from t where i = 1 or i = 1;
                     QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=100 width=4)
   Filter: (i = 1)
(2 rows)

We don't make array values unique.  That might make query execution 
performance somewhat worse, and also makes selectivity estimation 
worse.  I suggest Andrei and/or Alena should implement making array 
values unique.



I have corrected this and some spelling mistakes. The 
unique_any_elements_change.no-cfbot file contains changes.


While I was correcting the test results caused by these changes, I 
noticed the same behavior when converting the IN expression,

and this can be seen in the result of the regression test:

 EXPLAIN (COSTS OFF)
 SELECT unique2 FROM onek2
 WHERE stringu1 IN ('A', 'A') AND (stringu1 = 'A' OR stringu1 = 'A');
    QUERY PLAN
---
  Bitmap Heap Scan on onek2
    Recheck Cond: (stringu1 < 'B'::name)
   Filter: ((stringu1 = ANY ('{A,A}'::name[])) AND (stringu1 = 'A'::name))
    ->  Bitmap Index Scan on onek2_u2_prtl
 (4 rows)

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index baeb83e58b..e9813ef6ab 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -312,8 +312,8 @@ TransformOrExprToANY(ParseState *pstate, List *args, int 
location)
 
if (unlikely(found))
{
-   entry->consts = lappend(entry->consts, const_expr);
-   entry->exprs = lappend(entry->exprs, orqual);
+   entry->consts = list_append_unique(entry->consts, 
const_expr);
+   entry->exprs = list_append_unique(entry->exprs, orqual);
}
else
{
@@ -352,7 +352,6 @@ TransformOrExprToANY(ParseState *pstate, List *args, int 
location)
entry = (OrClauseGroupEntry *) lfirst(lc);
 
Assert(list_length(entry->consts) > 0);
-   Assert(list_length(entry->exprs) == list_length(entry->consts));
 
if (list_length(entry->consts) == 1 ||
list_length(entry->consts) > MAX_SAOP_ARRAY_SIZE)
diff --git a/src/test/regress/expected/create_index.out 
b/src/test/regress/expected/create_index.out
index 7b721bac71..606a4399f9 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1915,16 +1915,16 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand < 42 OR thousand < 99 OR 43 > thousand OR 
42 > thousand);
-                                QUERY PLAN
--------------------------------------------------------------------------------
+                              QUERY PLAN
+---------------------------------------------------------------------------
  Aggregate
->  Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND (thousand < ANY 
('{42,99,43,42}'::integer[])))
+ Recheck Cond: ((hundred = 42) AND (thousand < ANY 
('{42,99,43}'::integer[])))
  ->  BitmapAnd
->  Bitmap Index Scan on tenk1_hundred
  Index Cond: (hundred = 42)
->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand < ANY ('{42,99,43,42}'::integer[]))
+ Index Cond: (thousand < ANY ('{42,99,43}'::integer[]))
 (8 rows)
 
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/expected/se

Re: Support "Right Semi Join" plan shapes

2024-03-05 Thread Alena Rybakina
To be honest, I didn't see it in the code. Could you tell me where they 
are, please?


On 05.03.2024 05:44, Richard Guo wrote:


On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina 
 wrote:


I have reviewed your patch and I think it is better to add an
Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections
(NestedLoop
and MergeJoin).


Hmm, I don't see why this is necessary.  The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.

Thanks
Richard


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Alena Rybakina
Sorry, I found the explanation - 
https://www.postgresql.org/message-id/CACJufxFS-xcjaWq2Du2OyJUjRAyqCk12Q_zGOPxv61sgrdpw9w%40mail.gmail.com


On 03.03.2024 12:26, Alena Rybakina wrote:


I found that it was mentioned here - 
https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com.


To be honest, I couldn't find any explanation for that.

On 01.03.2024 18:33, Alexander Korotkov wrote:

Hi, Andrei,
Hi, Alena!

On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov 
 wrote:


On 28/2/2024 17:27, Alena Rybakina wrote:
> Maybe like that:
>
> It also considers the way to generate a path using BitmapScan
indexes,
> converting the transformed expression into expressions
separated by "OR"
> operations, and if it turns out to be the best and finally
selects the
> best one.
Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and
generate_saop_pathlist routines.
Also, I've made tiny changes in the code to look more smoothly.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.


I'm going to review and revise the patch.

One question I have yet.

>        /*
>         * Transformation only works with both side type is not
>         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't 
seem the current code restricts anything except composite/record.


--
Regards,
Alexander Korotkov

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Alena Rybakina
I found that it was mentioned here - 
https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com.


To be honest, I couldn't find any explanation for that.

On 01.03.2024 18:33, Alexander Korotkov wrote:

Hi, Andrei,
Hi, Alena!

On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov 
 wrote:


On 28/2/2024 17:27, Alena Rybakina wrote:
> Maybe like that:
>
> It also considers the way to generate a path using BitmapScan
indexes,
> converting the transformed expression into expressions separated
by "OR"
> operations, and if it turns out to be the best and finally
selects the
> best one.
Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and
generate_saop_pathlist routines.
Also, I've made tiny changes in the code to look more smoothly.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.


I'm going to review and revise the patch.

One question I have yet.

>        /*
>         * Transformation only works with both side type is not
>         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't seem 
the current code restricts anything except composite/record.


--
Regards,
Alexander Korotkov


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-02-28 Thread Alena Rybakina



On 28.02.2024 13:07, jian he wrote:

On Wed, Feb 28, 2024 at 12:19 PM Andrei Lepikhov
 wrote:

On 26/2/2024 11:10, Alena Rybakina wrote:

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration
(runtime-config-query.html)
but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.

Thank you for your work. I found a few spelling mistakes - I fixed them
and added some information about generating a partial index plan. I
attached it.

Thanks Jian and Alena,
It is a good start for the documentation. But I think the runtime-config
section needs only a condensed description of a feature underlying the
GUC. The explanations in this section look a bit awkward.
Having looked through the documentation for a better place for a
detailed explanation, I found array.sgml as a candidate. Also, we have
the parser's short overview section. I'm unsure about the best place, but
it is better than the server config section.

doc/src/sgml/array.sgml corresponds to
https://www.postgresql.org/docs/current/arrays.html.
  this GUC is related to the parser|optimizer.
adding a GUC to array.sgml seems strange. (I think).


I suggest describing our feature in array.sgml and mentioning a GUC there.

We can describe a GUC in config.sgml.


2. We should describe the second part of the feature, where the
optimiser can split an array to fit the optimal BitmapOr scan path.

we can add a sentence explaining that:
it may not do the expression transformation when the original
expression can be utilized by the index mechanism.
I am not sure how to rephrase it.


Maybe like that:

It also considers the way to generate a path using BitmapScan indexes, 
converting the transformed expression into expressions separated by "OR" 
operations, and if it turns out to be the best and finally selects the 
best one.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-02-25 Thread Alena Rybakina

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration (runtime-config-query.html)
but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.


Thank you for your work. I found a few spelling mistakes - I fixed them 
and added some information about generating a partial index plan. I 
attached it.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From e3a0e01c43a70099f6870a468d0cc3a8bdcb2775 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 26 Feb 2024 06:37:36 +0300
Subject: [PATCH] doc1

---
 doc/src/sgml/config.sgml | 74 
 1 file changed, 74 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 36a2a5ce431..47f82ca2dc9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5294,6 +5294,80 @@ ANY num_sync ( 
+  enable_or_transformation (boolean)
+   
+enable_or_transformation configuration 
parameter
+   
+  
+  
+   
+Enables or disables the query planner's ability to transform multiple 
OR expressions
+ to an ANY expression.
+This transformation only applies under the following conditions:
+
+  
+  
+Each expression should be formed as: 
expression operator 
constant.
+The right-hand side of the operator should be just a plain 
constant.
+The left-hand side of these expressions should remain unchanged.
+  
+  
+
+
+  
+  
+At the index path generation stage, a check is also made on whether 
the plan can be restructured using partial indexes or using expressions 
recombined with the "OR" operator.
+  
+  
+
+
+  
+
+  Each expression should return a Boolean (true/false) 
result.
+
+  
+
+
+  
+
+These expressions are logically combined by an OR condition.
+
+  
+
+  The default is on.
+  
+
+  For example, without setting 
enable_or_transformation the following query usually applies the three 
filtering conditions separately,
+  but if we set enable_or_transformation, the 
three expressions are combined into a single one: unique1 = ANY 
('{1,2,3}'::integer[]).
+  
+  EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1 or unique1 = 2 or 
unique1 = 3;
+
+  QUERY PLAN
+  -
+  Seq Scan on tenk1  (cost=0.00..482.50 rows=3 width=244)
+Filter: (unique1 = ANY ('{1,2,3}'::integer[]))
+  
+
+
+  Another example is the following query with 
enable_or_transformation enabled, where a plan 
with partial indexes is generated instead.
+  
+  EXPLAIN SELECT unique2, stringu1 FROM onek2 WHERE unique1 = 1 OR 
unique1 = PI()::integer;
+  QUERY PLAN
+  --
+  Bitmap Heap Scan on onek2
+Recheck Cond: ((unique1 = 3) OR (unique1 = 1))
+->  BitmapOr
+  ->  Bitmap Index Scan on onek2_u1_prtl
+Index Cond: (unique1 = 3)
+  ->  Bitmap Index Scan on onek2_u1_prtl
+Index Cond: (unique1 = 1)
+  (7 rows)
+  
+
+  
+ 
+
  
   enable_parallel_append (boolean)
   
-- 
2.34.1



Re: Optimize planner memory consumption for huge arrays

2024-02-22 Thread Alena Rybakina

Hi!

On 20.02.2024 07:41, Andrei Lepikhov wrote:

On 20/2/2024 04:51, Tom Lane wrote:

Tomas Vondra  writes:

On 2/19/24 16:45, Tom Lane wrote:

Tomas Vondra  writes:
For example, I don't think we expect selectivity functions to 
allocate

long-lived objects, right? So maybe we could run them in a dedicated
memory context, and reset it aggressively (after each call).

Here's a quick and probably-incomplete implementation of that idea.
I've not tried to study its effects on memory consumption, just made
sure it passes check-world.
Thanks for the sketch. The trick with the planner_tmp_cxt_depth 
especially looks interesting.
I think we should design small memory contexts - one per scalable 
direction of memory utilization, like selectivity or partitioning 
(appending ?).
My coding experience shows that short-lived GEQO memory context forces 
people to learn on Postgres internals more precisely :).


I think there was a problem in your patch where you freed the memory 
of a variable in the eqsel_internal function, because we have a case 
where that variable is returned by reference from the 
eval_const_expressions_mutator function (this happens only for the 
T_SubPlan and T_AlternativeSubPlan node types).


This query just causes an error in your case:

create table a (id bigint, c1 bigint, primary key(id));
create table b (a_id bigint, b_id bigint, b2 bigint, primary key(a_id, 
b_id));

explain select id
    from a, b
    where id = a_id
  and b2 = (select  min(b2)
    from    b
    where   id = a_id);
drop table a;
drop table b;

We can return a copy of the variable or not release the memory of this 
variable.


I attached two patches: the first one removes your memory cleanup and 
the other one returns a copy of the variable.


The corrections were authored not only by me but also by Daniil Anisimov.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/optimizer/util/clauses.c 
b/src/backend/optimizer/util/clauses.c
index edc25d712e9..ac560b1605e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2915,7 +2915,7 @@ eval_const_expressions_mutator(Node *node,
 * XXX should we ereport() here instead?  Probably this 
routine
 * should never be invoked after SubPlan creation.
 */
-   return node;
+   return copyObject(node);
case T_RelabelType:
{
RelabelType *relabel = (RelabelType *) node;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d40..e5bad75ec1c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -281,6 +281,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
	selec = var_eq_non_const(&vardata, operator, collation, other,
							 varonleft, negate);
 
+   pfree(other);
ReleaseVariableStats(vardata);
 
return selec;
@@ -1961,15 +1962,15 @@ scalararraysel(PlannerInfo *root,
{
List   *args;
Selectivity s2;
-
-			args = list_make2(leftop,
-							  makeConst(nominal_element_type,
-										-1,
-										nominal_element_collation,
-										elmlen,
-										elem_values[i],
-										elem_nulls[i],
-										elmbyval));
+			Const	   *c = makeConst(nominal_element_type,
+									  -1,
+									  nominal_element_collation,
+									  elmlen,
+									  elem_values[i],
+									  elem_nulls[i],
+									  elmbyval);
+
+			args = list_make2(leftop, c);
	if (is_join_clause)
		s2 = DatumGetFloat8(FunctionCall5Coll(&oprselproc,
											  clause->inputcollid,
@@ -1985,7 +1986,8 @@ scalararraysel(PlannerInfo *r

Running the fdw test from the terminal crashes into the core-dump

2024-02-18 Thread Alena Rybakina

Hi, hackers!

After starting the server (initdb + pg_ctl start) I ran a regression test, 
create_misc.sql ('\i src/test/regress/sql/create_misc.sql'), and after that 
I ran the fdw test ('\i contrib/postgres_fdw/sql/postgres_fdw.sql') in psql, 
and it crashed with a core dump due to a failed assertion.


To be honest, such a failure occurred only if the fdw extension had not 
been installed earlier.



script to reproduce the error:

./configure CFLAGS='-g -ggdb -O0' --enable-debug --enable-cassert 
--prefix=`pwd`/tmp_install && make -j8 -s install


export CDIR=$(pwd)
export PGDATA=$CDIR/postgres_data
rm -r $PGDATA
mkdir $PGDATA
${CDIR}/tmp_install/bin/initdb -D $PGDATA >> log.txt
${CDIR}/tmp_install/bin/pg_ctl -D $PGDATA -l logfile start

${CDIR}/tmp_install/bin/psql -d postgres -f 
src/test/regress/sql/create_misc.sql &&
${CDIR}/tmp_install/bin/psql -d postgres -f 
contrib/postgres_fdw/sql/postgres_fdw.sql



The query, where the problem is:

-- MERGE ought to fail cleanly
merge into itrtest using (select 1, 'foo') as source on (true)
  when matched then do nothing;

Coredump:

#5  0x555d1451f483 in ExceptionalCondition 
(conditionName=0x555d146bba13 "requiredPerms != 0", 
fileName=0x555d146bb7b0 "execMain.c",

    lineNumber=654) at assert.c:66
#6  0x555d14064ebe in ExecCheckOneRelPerms (perminfo=0x555d1565ef90) 
at execMain.c:654
#7  0x555d14064d94 in ExecCheckPermissions 
(rangeTable=0x555d1565eef0, rteperminfos=0x555d1565efe0, 
ereport_on_violation=true) at execMain.c:623
#8  0x555d140652ca in InitPlan (queryDesc=0x555d156cde50, eflags=0) 
at execMain.c:850
#9  0x555d140644a8 in standard_ExecutorStart 
(queryDesc=0x555d156cde50, eflags=0) at execMain.c:266
#10 0x555d140641ec in ExecutorStart (queryDesc=0x555d156cde50, 
eflags=0) at execMain.c:145

#11 0x555d1432c025 in ProcessQuery (plan=0x555d1565f3e0,
    sourceText=0x555d1551b020 "merge into itrtest using (select 1, 
'foo') as source on (true)\n  when matched then do nothing;", params=0x0,

    queryEnv=0x0, dest=0x555d1565f540, qc=0x7fffc9454080) at pquery.c:155
#12 0x555d1432dbd8 in PortalRunMulti (portal=0x555d15597ef0, 
isTopLevel=true, setHoldSnapshot=false, dest=0x555d1565f540, 
altdest=0x555d1565f540,

    qc=0x7fffc9454080) at pquery.c:1277
#13 0x555d1432d0cf in PortalRun (portal=0x555d15597ef0, 
count=9223372036854775807, isTopLevel=true, run_once=true, 
dest=0x555d1565f540,

    altdest=0x555d1565f540, qc=0x7fffc9454080) at pquery.c:791
#14 0x555d14325ec8 in exec_simple_query (
    query_string=0x555d1551b020 "merge into itrtest using (select 1, 
'foo') as source on (true)\n  when matched then do nothing;") at 
postgres.c:1273
#15 0x555d1432ae4c in PostgresMain (dbname=0x555d1ab0 "aaa", 
username=0x555d1a98 "alena") at postgres.c:4645
#16 0x555d14244a5d in BackendRun (port=0x555d1554b3b0) at 
postmaster.c:4440
#17 0x555d14244072 in BackendStartup (port=0x555d1554b3b0) at 
postmaster.c:4116

#18 0x555d142405c5 in ServerLoop () at postmaster.c:1768
#19 0x555d1423fec2 in PostmasterMain (argc=3, argv=0x555d1547fcf0) 
at postmaster.c:1467

#20 0x555d140f3122 in main (argc=3, argv=0x555d1547fcf0) at main.c:198

This error is consistently reproduced.

To be honest, I wasn't able to fully figure out the reason for this, but 
it seems that this operation should not be available on partitioned tables 
at all?


alena@postgres=# SELECT relname, relkind FROM pg_class where 
relname='itrtest';

 relname | relkind
-+-
 itrtest | p
(1 row)

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PoC] Reducing planning time when tables have many partitions

2024-02-12 Thread Alena Rybakina

Hi! Sorry for my delayed reply too.

On 17.01.2024 12:33, Yuya Watari wrote:

Hello Alena,

Thank you for your quick response, and I'm sorry for my delayed reply.

On Sun, Dec 17, 2023 at 12:41 AM Alena Rybakina
 wrote:

I thought about this earlier and was worried that the index links of the 
equivalence classes might not be referenced correctly for outer joins,
so I decided to just overwrite them and reset the previous ones.

Thank you for pointing this out. I have investigated this problem and
found a potential bug place. The code quoted below modifies
RestrictInfo's clause_relids. Here, our indexes, namely
eclass_source_indexes and eclass_derive_indexes, are based on
clause_relids, so they should be adjusted after the modification.
However, my patch didn't do that, so it may have missed some
references. The same problem occurs in places other than the quoted
one.

=
/*
  * Walker function for replace_varno()
  */
static bool
replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
 ...
 else if (IsA(node, RestrictInfo))
 {
 RestrictInfo *rinfo = (RestrictInfo *) node;
 ...

 if (bms_is_member(ctx->from, rinfo->clause_relids))
 {
 replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
 replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
 rinfo->clause_relids = replace_relid(rinfo->clause_relids,
ctx->from, ctx->to);
 ...
 }
 ...
 }
 ...
}
=

I have attached a new version of the patch, v23, to fix this problem.
v23-0006 adds a helper function called update_clause_relids(). This
function modifies RestrictInfo->clause_relids while adjusting its
related indexes. I have also attached a sanity check patch
(sanity-check.txt) to this email. This sanity check patch verifies
that there are no missing references between RestrictInfos and our
indexes. I don't intend to commit this patch, but it helps find
potential bugs. v23 passes this sanity check, but the v21 you
submitted before does not. This means that the adjustment by
update_clause_relids() is needed to prevent missing references after
modifying clause_relids. I'd appreciate your letting me know if v23
doesn't solve your concern.

One of the things I don't think is good about my approach is that it
adds some complexity to the code. In my approach, all modifications to
clause_relids must be done through the update_clause_relids()
function, but enforcing this rule is not so easy. In this sense, my
patch may need to be simplified more.


this is due to the fact that I explained before: we zeroed the values indicated 
by the indexes,
so this check is not correct either - since the zeroed value indicated by the 
index is correct.
That's why I removed this check.

Thank you for letting me know. I fixed this in v23-0005 to adjust the
indexes in update_eclasses(). With this change, the assertion check
will be correct.

Yes, it is working correctly now with the assertion check. I suppose 
it's better to add this code with an additional comment and a 
recommendation that other developers use it for checking in case of 
manipulations with the list of equivalence classes.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Alena Rybakina

On 12.02.2024 12:01, Andrei Lepikhov wrote:

On 12/2/2024 15:55, Alena Rybakina wrote:

Hi!

I can't understand this part of the code:

/* Time to generate index paths */

MemSet(, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, );

As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictinfo procedure; otherwise calling it, I think, is useless.
I think you must explain your note in more detail. Before this call, 
we already called make_restrictinfo() and built rinfo1, haven't we?


I got it, I think; I was confused by the "else" block where we can 
process the index, otherwise we move on to the next element.


I think maybe the "else" block that creates the restrictinfo along with 
the indexpaths list should be moved to a separate function, or the "else" 
should just be removed?



I think we need to check that rinfo->clause is not empty, because if it 
is, we can miss calling the build_paths_for_OR function. We should add it here:


restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
    if (IsA(restrictinfo->clause, ScalarArrayOpExpr))
...


By the way, I think we need to add a check that the clauseset is not 
empty (if (!clauseset.nonempty)), otherwise we could get an error. I 
noticed the same check in the build_paths_for_OR function.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Alena Rybakina

Hi!

I can't understand this part of the code:

/* Time to generate index paths */

MemSet(, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, );

As I understand it, match_clauses_to_index is only needed if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictinfo() call; otherwise calling it, I think, is useless.


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-02-02 Thread Alena Rybakina


On 01.02.2024 08:00, jian he wrote:

On Wed, Jan 31, 2024 at 7:10 PM Alena Rybakina wrote:

Hi, thank you for your review and interest in this subject.

On 31.01.2024 13:15, jian he wrote:

On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:

based on my understanding of
https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
I think you need to move the commutator check right after the `if
(get_op_rettype(opno) != BOOLOID)` branch

I was wrong about this part. sorry for the noise.


I have made some changes (attachment).
* if the operator expression left or right side type category is
{array | domain | composite}, then don't do the transformation.
(i am not 100% sure with composite)

To be honest, I'm not sure about this check, because we check the type of 
variable there:

if (!IsA(orqual, OpExpr))
 {
 or_list = lappend(or_list, orqual);
 continue;
 }
And below:
if (IsA(leftop, Const))
 {
 opno = get_commutator(opno);

 if (!OidIsValid(opno))
 {
 /* Commuter doesn't exist, we can't reverse the order */
 or_list = lappend(or_list, orqual);
 continue;
 }

 nconst_expr = get_rightop(orqual);
 const_expr = get_leftop(orqual);
 }
 else if (IsA(rightop, Const))
 {
 const_expr = get_rightop(orqual);
 nconst_expr = get_leftop(orqual);
 }
 else
 {
 or_list = lappend(or_list, orqual);
 continue;
 }

Isn't that enough?

alter table tenk1 add column arr int[];
set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE arr = '{1,2,3}' or arr = '{1,2}';

the above query will not do the OR transformation, because an array type
doesn't itself have an array type.
`
scalar_type = entry->key.exprtype;
if (scalar_type != RECORDOID && OidIsValid(scalar_type))
array_type = get_array_type(scalar_type);
else
array_type = InvalidOid;
`

If either side of the operator expression is an array or array-related type,
we can be sure the transformation cannot apply
(get_array_type will return InvalidOid for the anyarray type).
We can check it earlier, so the hash-related code will not be invoked for
array-related types.

Agree.

Besides, some examples (with ARRAY) work fine:

postgres=# CREATE TABLE sal_emp (
 pay_by_quarter  integer[],
 pay_by_quater1 integer[]
);
CREATE TABLE
postgres=# INSERT INTO sal_emp
 VALUES (
 '{1, 1, 1, 1}',
 '{1,2,3,4}');
INSERT 0 1
postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;
   pay_by_quarter   | pay_by_quater1
---+
  {1,1,1,1} | {1,2,3,4}
(1 row)

postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;
   QUERY PLAN
--
  Seq Scan on sal_emp  (cost=0.00..21.00 rows=9 width=64)
Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[]))
(2 rows)

* if the left side of the operator expression node contains volatile
functions, then don't do the transformation.

I'm also not sure about the volatility check function, because we perform such 
a conversion at the parsing stage, and at this stage we don't have a RelOptInfo 
variable, let alone a RestrictInfo or a PathTarget.


  see the example in here:
https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com

set enable_or_transformation to on;
create or replace function retint(int) returns int as
$func$
begin raise notice 'hello';
 return $1 + round(10 * random()); end
$func$ LANGUAGE plpgsql;

SELECT count(*) FROM tenk1 WHERE thousand = 42;
will return 10 rows.

SELECT count(*) FROM tenk1 WHERE thousand = 42 AND (retint(1) = 4 OR
retint(1) = 3);
this query should return 20 'hello' notices, but now only 10 appear.

EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (retint(1) = 4 OR  retint(1) = 3);
   QUERY PLAN
--
  Aggregate
->  Seq Scan on tenk1
  Filter: ((thousand = 42) AND (retint(1) = ANY ('{4,3}'::integer[])))
(3 rows)


Agree.

I added your code to the patch.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From 0fcad72153c9eaaf436e5b417c803a70fbeaf8ac Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is a non-constant expression.

Re: POC, WIP: OR-clause support for indexes

2024-01-31 Thread Alena Rybakina

Hi, thank you for your review and interest in this subject.

On 31.01.2024 13:15, jian he wrote:

On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:

based on my understanding of
https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
I think you need to move the commutator check right after the `if
(get_op_rettype(opno) != BOOLOID)` branch


I was wrong about this part. sorry for the noise.


I have made some changes (attachment).
* if the operator expression left or right side type category is
{array | domain | composite}, then don't do the transformation.
(i am not 100% sure with composite)


To be honest, I'm not sure about this check, because we check the type 
of variable there:


if (!IsA(orqual, OpExpr))
    {
    or_list = lappend(or_list, orqual);
    continue;
    }
And below:
if (IsA(leftop, Const))
    {
        opno = get_commutator(opno);

        if (!OidIsValid(opno))
        {
            /* Commuter doesn't exist, we can't reverse the order */
            or_list = lappend(or_list, orqual);
            continue;
        }

        nconst_expr = get_rightop(orqual);
        const_expr = get_leftop(orqual);
    }
    else if (IsA(rightop, Const))
    {
        const_expr = get_rightop(orqual);
        nconst_expr = get_leftop(orqual);
    }
    else
    {
        or_list = lappend(or_list, orqual);
        continue;
    }

Isn't that enough?

Besides, some examples (with ARRAY) work fine:

postgres=# CREATE TABLE sal_emp (
    pay_by_quarter  integer[],
    pay_by_quater1 integer[]
);
CREATE TABLE
postgres=# INSERT INTO sal_emp
    VALUES (
    '{1, 1, 1, 1}',
    '{1,2,3,4}');
INSERT 0 1
postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;

  pay_by_quarter   | pay_by_quater1
---+
 {1,1,1,1} | {1,2,3,4}
(1 row)

postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 
or pay_by_quarter[1]=2;

  QUERY PLAN
--
 Seq Scan on sal_emp  (cost=0.00..21.00 rows=9 width=64)
   Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[]))
(2 rows)


* if the left side of the operator expression node contains volatile
functions, then don't do the transformation.


I'm also not sure about the volatility check function, because we 
perform such a conversion at the parsing stage, and at this stage we 
don't have a RelOptInfo variable, let alone a RestrictInfo or a 
PathTarget.


Speaking of NextValueExpr, I couldn't find any examples where the 
current patch doesn't work; I show one of my attempts below:


postgres=# create table foo (f1 int, f2 int generated always as identity);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1

postgres=# explain verbose update foo set f1 = 2 where f1=1 or f1=2 ;
    QUERY PLAN
---
 Update on public.foo  (cost=0.00..38.25 rows=0 width=0)
   ->  Seq Scan on public.foo  (cost=0.00..38.25 rows=23 width=10)
 Output: 2, ctid
 Filter: (foo.f1 = ANY ('{1,2}'::integer[]))
(4 rows)

Maybe I missed something. Do you have any examples?


* some other minor  cosmetic changes.

Thank you, I agree with them.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: Support "Right Semi Join" plan shapes

2024-01-29 Thread Alena Rybakina

Hi! Thank you for your work on this subject.

I have reviewed your patch and I think it is better to add an Assert for 
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to 
prevent the use of RIGHT_SEMI for these join types (NestLoop 
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook call 
inside the add_paths_to_joinrel function, which allows extensions 
to create a custom node. What do you think?
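
Something along these lines, as a sketch (the exact placement inside
ExecMergeJoin and ExecNestLoop is of course up to you):

    /* JOIN_RIGHT_SEMI is not supported by this join method */
    Assert(node->js.jointype != JOIN_RIGHT_SEMI);

That way a custom path created through the hook cannot silently produce a
nest-loop or merge-join plan with an unsupported join type.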


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2024-01-28 Thread Alena Rybakina

On 26.01.2024 05:37, vignesh C wrote:

On Tue, 24 Oct 2023 at 01:47, Alena Rybakina  wrote:

Hi!

I looked through your patch and noticed that it no longer applied to the current 
version of master. I rebased it and attached the result. I didn't see any 
problems and, honestly, no big changes were needed; all regression tests 
passed.

I think it's better to add a test, but to be honest, I haven't been able to 
come up with something yet.

The patch does not apply anymore as in CFBot at [1]:

=== Applying patches on top of PostgreSQL commit ID
7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
=== applying patch
./v2-0001-WIP-Evaluate-arguments-of-correlated-SubPlans-in-the.patch

patching file src/include/executor/execExpr.h
Hunk #1 succeeded at 160 (offset 1 line).
Hunk #2 succeeded at 382 (offset 2 lines).
Hunk #3 FAILED at 778.
1 out of 3 hunks FAILED -- saving rejects to file
src/include/executor/execExpr.h.rej
patching file src/include/nodes/execnodes.h
Hunk #1 succeeded at 959 (offset 7 lines).

Please have a look and post an updated version.

[1] - http://cfbot.cputube.org/patch_46_4209.log

Regards,
Vignesh


Thank you!

I fixed it. The code remains the same.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From bf40b14c0cb63f47280299fd3f76a1711db6aada Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Sun, 28 Jan 2024 11:58:44 +0300
Subject: [PATCH] WIP: Evaluate arguments of correlated SubPlans in the
 referencing ExprState.

---
 src/backend/executor/execExpr.c   | 93 +--
 src/backend/executor/execExprInterp.c | 22 +++
 src/backend/executor/execProcnode.c   |  5 ++
 src/backend/executor/nodeSubplan.c| 30 -
 src/backend/jit/llvm/llvmjit_expr.c   |  6 ++
 src/backend/jit/llvm/llvmjit_types.c  |  1 +
 src/include/executor/execExpr.h   |  6 +-
 src/include/nodes/execnodes.h |  1 -
 8 files changed, 112 insertions(+), 52 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 3181b1136a2..d2e539e7b28 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -88,6 +88,9 @@ static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
   FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
   int transno, int setno, int setoff, bool ishash,
   bool nullcheck);
+static void ExecInitSubPlanExpr(SubPlan *subplan,
+ExprState *state,
+Datum *resv, bool *resnull);
 
 
 /*
@@ -1386,7 +1389,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
 		case T_SubPlan:
 			{
 SubPlan*subplan = (SubPlan *) node;
-SubPlanState *sstate;
 
 /*
  * Real execution of a MULTIEXPR SubPlan has already been
@@ -1403,19 +1405,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	break;
 }
 
-if (!state->parent)
-	elog(ERROR, "SubPlan found with no parent plan");
-
-sstate = ExecInitSubPlan(subplan, state->parent);
-
-/* add SubPlanState nodes to state->parent->subPlan */
-state->parent->subPlan = lappend(state->parent->subPlan,
- sstate);
-
-scratch.opcode = EEOP_SUBPLAN;
-scratch.d.subplan.sstate = sstate;
-
-ExprEvalPushStep(state, &scratch);
+ExecInitSubPlanExpr(subplan, state, resv, resnull);
 break;
 			}
 
@@ -2752,29 +2742,12 @@ ExecPushExprSetupSteps(ExprState *state, ExprSetupInfo *info)
 	foreach(lc, info->multiexpr_subplans)
 	{
 		SubPlan*subplan = (SubPlan *) lfirst(lc);
-		SubPlanState *sstate;
 
 		Assert(subplan->subLinkType == MULTIEXPR_SUBLINK);
 
-		/* This should match what ExecInitExprRec does for other SubPlans: */
-
-		if (!state->parent)
-			elog(ERROR, "SubPlan found with no parent plan");
-
-		sstate = ExecInitSubPlan(subplan, state->parent);
-
-		/* add SubPlanState nodes to state->parent->subPlan */
-		state->parent->subPlan = lappend(state->parent->subPlan,
-		 sstate);
-
-		scratch.opcode = EEOP_SUBPLAN;
-		scratch.d.subplan.sstate = sstate;
-
 		/* The result can be ignored, but we better put it somewhere */
-		scratch.resvalue = &state->resvalue;
-		scratch.resnull = &state->resnull;
-
-		ExprEvalPushStep(state, &scratch);
+		ExecInitSubPlanExpr(subplan, state,
+							&state->resvalue, &state->resnull);
 	}
 }
 
@@ -4181,3 +4154,57 @@ ExecBuildParamSetEqual(TupleDesc desc,
 
 	return state;
 }
+
+static void
+ExecInitSubPlanExpr(SubPlan *subplan,
+	ExprState *state,
+	Datum *resv, bool *resnull)
+{
+	ExprEvalStep scratch = {0};
+	SubPlanState *sstate;
+	ListCell   *pvar;
+	ListCell   *l;
+
+	if (!state->parent)
+		elog(ERROR, "SubPlan found with no parent plan");
+
+	/*
+	 * Generate steps to evaluate input arguments for the subplan.
+	 *
+	 * We evaluate the argument expressions into ExprState's resvalue/resnull,
+	 * and then use PARAM_SET to update the parameter. We do that, instead of
+	 * evaluating dire

Re: POC: GROUP BY optimization

2024-01-15 Thread Alena Rybakina

On 15.01.2024 12:46, Andrei Lepikhov wrote:

On 15/1/2024 13:42, Richard Guo wrote:


On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov wrote:


    Thank you for providing the test case relevant for this code change.
    The revised patch incorporating this change is attached. Now the
    patchset looks good to me.  I'm going to push it if there are no
    objections.


Seems I'm late for the party.  Can we hold for several more days?  I'd
like to have a review on this patch.
Get on board! It looks like this feature needs as much review as 
possible (likewise SJE).


Hi! Thank you for your work on this issue! I believe this will help 
the planner build a more optimal query plan here and therefore speed 
up query execution.


I have reviewed the patches and noticed that some code refactoring 
is possible. I have attached a diff file (group_by.diff) to this email.


The changes involve spelling corrections, renaming variables and factoring 
out some common parts.



In addition, I have a few questions, since some points in the code 
remained unclear to me.


1. I didn't understand why we have a question mark in the comment next to 
the enable_group_by_reordering variable in the 
src/backend/optimizer/path/pathkeys.c file; I assumed it was a typo 
and fixed it in the diff file.


2. Why do we restore the variable (path = path_save) here (in the 
add_paths_to_grouping_rel function) if we overwrite it just below? We 
could pass path_save as a parameter instead, as sketched after the quoted code.


foreach(lc2, pathkey_orderings)
{
    PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2);

    /* restore the path (we replace it in the loop) */
    path = path_save;

    path = make_ordered_path(root,
                         grouped_rel,
                         path,
                         cheapest_path,
                         info->pathkeys);
    if (path == NULL)
    continue;
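
That is, a sketch of what I mean (assuming nothing else in the loop needs
the intermediate assignment):

    foreach(lc2, pathkey_orderings)
    {
        PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2);

        /* build the ordered path directly from the saved path */
        path = make_ordered_path(root,
                                 grouped_rel,
                                 path_save,
                                 cheapest_path,
                                 info->pathkeys);
        if (path == NULL)
            continue;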

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 5aac6d66776..8be58fa2b0e 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,7 +29,7 @@
 #include "partitioning/partbounds.h"
 #include "utils/lsyscache.h"
 
-/* Consider reordering of GROUP BY keys? */
+/* Consider reordering of GROUP BY keys */
 bool		enable_group_by_reordering = true;
 
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
@@ -362,7 +362,7 @@ pathkeys_contained_in(List *keys1, List *keys2)
  *
  * Returns the number of GROUP BY keys with a matching pathkey.
  */
-static int
+static PathKeyInfo *
 group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 			   List **group_clauses,
 			   int num_groupby_pathkeys)
@@ -421,7 +421,16 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
 	*group_clauses = list_concat_unique_ptr(new_group_clauses,
 			*group_clauses);
 
-	return n;
+	if (n > 0 &&
+		(enable_incremental_sort || n == list_length(*group_pathkeys)))
+	{
+		PathKeyInfo *info = makeNode(PathKeyInfo);
+		info->pathkeys = *group_pathkeys;
+		info->clauses = *group_clauses;
+		return info;
+	}
+
+	return NULL;
 }
 
 /*
@@ -436,7 +445,7 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys,
  *
  * - the original ordering, as specified by the GROUP BY clause,
  * - GROUP BY keys reordered to match 'path' ordering (as much as possible),
- * - GROUP BY keys to match target ORDER BY clause (as much as possible).
+ * - GROUP BY keys should match the target ORDER BY clause (as much as possible).
  */
 List *
 get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
@@ -475,20 +484,11 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	 */
 	if (path->pathkeys)
 	{
-		int			n;
-
-		n = group_keys_reorder_by_pathkeys(path->pathkeys, &pathkeys, &clauses,
+		info = group_keys_reorder_by_pathkeys(path->pathkeys, &pathkeys, &clauses,
 		   root->num_groupby_pathkeys);
 
-		if (n > 0 &&
-			(enable_incremental_sort || n == list_length(path->pathkeys)))
-		{
-			info = makeNode(PathKeyInfo);
-			info->pathkeys = pathkeys;
-			info->clauses = clauses;
-
+		if (info)
 			infos = lappend(infos, info);
-		}
 	}
 
 	/*
@@ -497,21 +497,12 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	 */
 	if (root->sort_pathkeys)
 	{
-		int			n;
-
-		n = group_keys_reorder_by_pathkeys(root->sort_pathkeys, &pathkeys,
+		info = group_keys_reorder_by_pathkeys(root->sort_pathkeys, &pathkeys,
		   &clauses,
 		   root->num_groupby_pathkeys);
 
-		if (n > 0 &&
-			(enable_incremental_sort || n == list_length(path->pathkeys)))
-		{
-			info = makeNode(PathKeyInfo);
-			info->pathkeys = pathkeys;
-			info->clauses = clauses;
-
+		if (info)
 			infos = lappend(infos, info);
-		}
 	}
 
 	return infos;
@@ -2163

Re: [PoC] Reducing planning time when tables have many partitions

2023-12-16 Thread Alena Rybakina
rte = root->simple_rte_array[i];
+   RelOptInfo *rel = root->simple_rel_array[i];

-   rte->eclass_source_indexes = bms_add_member(rte->eclass_source_indexes,
+   /*
+* If the corresponding RelOptInfo does not exist, we create a 'dummy'
+* RelOptInfo for storing EquivalenceClass indexes.
+*/
+   if (rel == NULL)
+   {
+   rel = root->simple_rel_array[i] = makeNode(RelOptInfo);
+   rel->eclass_source_indexes = NULL;
+   rel->eclass_derive_indexes = NULL;
+   }
+   rel->eclass_source_indexes = bms_add_member(rel->eclass_source_indexes,
 source_idx);
 }
=

At this point, I'm not sure if this approach is correct. It seems to
pass the regression tests, but we should doubt its correctness. I will
continue to experiment with this idea.

[1]https://www.postgresql.org/message-id/CAPpHfdseB13zJJPZuBORevRnZ0vcFyUaaJeSGfAysX7S5er%2BEQ%40mail.gmail.com

Yes, I also thought in this direction before and I agree that this is 
the best way to develop the patch.


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-14 Thread Alena Rybakina

On 12.12.2023 16:04, jian he wrote:

On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina wrote:

Hi! Thank you for your work. Your patch looks better!
Yes, thank you! It works fine, and I see that the regression tests passed. 
However, when I ran 'copy from with save_error' operation with simple csv files 
(copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I 
described below):

postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
   FOREIGN KEY(x)
   REFERENCES test(x));

I did not find a table with saved errors after operation, although I received a 
log about it:

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table 
public.test_error
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to table 
public.test1_error
ERROR:  insert or update on table "test1" violates foreign key constraint "fk_x"
DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines in each csv file were correct, so they should have been added to 
the tables, but they were not added to test or test1.

If I leave only the correct rows, everything works fine and the rows are added 
to the tables.

in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV 
save_error
NOTICE:  No error happened.Error holding table public.test1_error will be droped
COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.

looks like the above is about constraint violations while copying.
Constraint violations while copying are not in the scope of this patch.

Since COPY FROM is very like the INSERT command,
you do want all the valid constraints to check all the copied rows?

No, I think it will be too much.

but the notice raised by the patch is not right.
So I place the drop error saving table or raise notice logic above
`ExecResetTupleTable(estate->es_tupleTable, false)` in the function
CopyFrom.

Yes, I see it and agree with you.

I also noticed interesting behavior when the table had been created by the 
user beforehand. When I created an error_table before the 'copy from' operation,
I received a message during the 'copy from' operation saying that it is 
impossible to create a table with the same name (it is shown below).
I think you should add information about this to the documentation, since this 
seems to be normal behavior to me.


doc changed. you may check it.

Yes, I saw it. Thank you.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-11 Thread Alena Rybakina

Hi! Thank you for your work. Your patch looks better!

On 10.12.2023 13:32, jian he wrote:

On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina  wrote:

Thank you for your work. Unfortunately, your code failed with errors during the 
make installation:

'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have the Ubuntu 22.04 operating system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you drop the table to which you want to write rows rejected 
by a failed "copy from" operation before creating it. I think this is wrong because 
this table can hold data useful to the user.
At a minimum, we should warn the user about this, but I think we can just add 
some number at the end of the name, such as name_table1, name_table2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.

To be honest, first of all, I misunderstood this part of the code. Now I see 
that it works the way you mentioned.

However, I didn't see whether you dealt with cases where we already had a table 
with the same name as the error table.
I mean, what happens when it tries to create the table for the first time, or 
will we never be able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.


Hopefully attached will work.


Yes, thank you! It works fine, and I see that the regression tests 
passed. 



However, when I ran 'copy from with save_error' operation with simple 
csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I 
created it, I described below):


postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
  FOREIGN KEY(x)
  REFERENCES test(x));

I did not find a table with saved errors after operation, although I 
received a log about it:


postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to 
table public.test_error

ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' 
CSV save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to 
table public.test1_error
ERROR:  insert or update on table "test1" violates foreign key 
constraint "fk_x"

DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines in each csv file were correct, so they should have been 
added to the tables, but they were not added to test or test1.


If I leave only the correct rows, everything works fine and the rows are 
added to the tables.


in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' 
CSV save_error
NOTICE:  No error happened.Error holding table public.test1_error will 
be droped

COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.


I also noticed interesting behavior when the table had been created 
by the user beforehand. When I created an error_table before the 'copy from' 
operation,
I received a message during the 'copy from' operation saying that it is 
impossible to create a table with the same name (it is shown below).
I think you should add information about this to the documentation, 
since this seems to be normal behavior to me.


postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV 
save_error
ERROR:  Error save table public.test_error already exists. Cannot use it 
for COPY FROM error saving



2. I noticed that you are forming a table name using the type of errors that 
prevent rows from being added during 'copy from' operation.
I think it would be bet

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-12-08 Thread Alena Rybakina

On 06.12.2023 10:30, Richard Guo wrote:

I've self-reviewed this patch again and I think it's now in a
committable state.  I'm wondering if we can mark it as 'Ready for
Committer' now, or we need more review comments/feedbacks.

To recap, this patch postpones reparameterization of paths until
createplan.c, which would help avoid building the reparameterized
expressions we might not use.  More importantly, it makes it possible to
modify the expressions in RTEs (e.g. tablesample) and in RelOptInfos
(e.g. baserestrictinfo) for reparameterization.  Failing to do that can
lead to executor crashes, wrong results, or planning errors, as we have
already seen.

I marked it as 'Ready for Committer'. I think it is ready.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-07 Thread Alena Rybakina
Thank you for your work. Unfortunately, your code failed with errors 
during the make installation:


'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
make[1]: *** [Makefile:131: parser/gram.h] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [src/Makefile.global:383: submake-generated-headers] Error 2

I have the Ubuntu 22.04 operating system.

On 06.12.2023 13:47, jian he wrote:

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:

Hi!

Thank you for your contribution to this thread.


I reviewed it and have a few questions.

1. I have seen that you drop the table to which you want to write rows rejected 
by a failed "copy from" operation before creating it. I think this is wrong because 
this table can hold data useful to the user.
At a minimum, we should warn the user about this, but I think we can just add 
some number at the end of the name, such as name_table1, name_table2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.
To be honest, first of all, I misunderstood this part of the code. Now I 
see that it works the way you mentioned.


However, I didn't see whether you dealt with cases where we already had a 
table with the same name as the error table.
I mean, what happens when it tries to create the table for the first time, 
or will we never be able to face such a problem?

Can you demo the expected behavior?

Unfortunately, I was unable to launch it due to a build issue.



2. I noticed that you are forming a table name using the type of errors that 
prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was used 
while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.


Another column added.
now it looks like:

SELECT * FROM save_error_csv_error;
  filename | lineno |line
  | field | source | err_message |
err_detail | errorcode
--+++---++-++---
  STDIN|  1 | 2002232 40  50  60  70
80 | NULL  | NULL   | extra data after last expected column   |
NULL   | 22P04
  STDIN|  1 | 2000230 23
  | d | NULL   | missing data for column "d" | NULL
   | 22P04
  STDIN|  1 | z,,""
  | a | z  | invalid input syntax for type integer: "z"  | NULL
   | 22P02
  STDIN|  2 | \0,,
  | a | \0 | invalid input syntax for type integer: "\0" | NULL
   | 22P02

Yes, I see the "filename" column, and this will solve the problem, but 
"STDIN" is unclear to me.

3. I found a spelling mistake:

/* no err_nsp.error_rel table then crete one. for holding error. */


fixed.


4. Maybe rewrite this comment

these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary; no error will lead to the deletion of the 
err_nsp.error_rel table.


fixed.

Thank you.

5. Is this part of the comment needed? I think it duplicates the information 
below when we form the query.

  * . column list(order by attnum, begin from ctid) =
  *{ctid, lineno,line,field,source,err_message,err_detail,errorcode}
  * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}

I'm not sure if we need to order the rows by number. It might be easier to work 
with these lines in the order they appear.


Simplified the comment. "order by attnum" is to make sure that if
there is a table already existing, and the column names are like X and
the data types like Y, then we consider this table good for holding
potential error info.

COPY FROM, main entry point is NextCopyFrom.
Now for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
all these three errors will be tolerated: extra data after last
expected column, missing data for column, data type conversion.

It looks clearer and better, thanks!

Comments phrased as questions are unusual for me; I read them as open 
points to think about, for example, as here (contrib/bloom/blinsert.c:312):


/*

 * Didn't find place to insert in notFullPage array.  Allocate new page.
 * (XXX is it good to do this while holding ex-lock on the metapage??)
 */

Maybe we can rewrite it like this:

/* Check, the err_nsp.error_rel table has already existed
* and if it is, check its column name and data types.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-05 Thread Alena Rybakina
I'm not sure if we need to order the rows by number. It might be easier 
to work with these lines in the order they appear.


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: Implement missing join selectivity estimation for range types

2023-11-30 Thread Alena Rybakina

Hi!

Thank you for your work on the subject. I think it's a really useful 
feature, and it allows the optimizer to estimate clauses with this type 
of operator more accurately.


I reviewed your patch and noticed that some comments are repeated in 
the multirangejoinsel function; I suggest combining them.


The proposed changes are in the attached patch.


Since this topic is about calculating selectivity, have you thought about 
adding cardinality estimation tests for queries with this type of operator?


For example, you could form queries similar to those that you use in 
src/test/regress/sql/multirangetypes.sql and 
src/test/regress/sql/rangetypes.sql.


I added a few in the attached patch.
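
For example, a test in that spirit could look like this (a hypothetical
sketch; the real test should follow the style of the existing rangetypes
regression tests):

    -- hypothetical cardinality check for a range-join clause
    CREATE TABLE rt_a (r int4range);
    CREATE TABLE rt_b (r int4range);
    INSERT INTO rt_a SELECT int4range(i, i + 10) FROM generate_series(1, 1000) i;
    INSERT INTO rt_b SELECT int4range(i, i + 5) FROM generate_series(1, 1000) i;
    ANALYZE rt_a;
    ANALYZE rt_b;
    -- compare the estimated row count against the actual one
    EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM rt_a, rt_b WHERE rt_a.r && rt_b.r;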

--
Regards,
Alena Rybakina
diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c 
b/src/backend/utils/adt/multirangetypes_selfuncs.c
index c670d225a0c..7708768b89f 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1620,14 +1620,15 @@ multirangejoinsel(PG_FUNCTION_ARGS)

hist1_lower, nhist1);
break;
 
+   /*
+* Start by comparing lower bounds and if they are equal
+* compare upper bounds for comparison operators
+*/
case OID_MULTIRANGE_LESS_EQUAL_OP:
 
/*
 * A <= B
 *
-* Start by comparing lower bounds and if they 
are equal
-* compare upper bounds
-*
 * Negation of OID_RANGE_GREATER_OP.
 *
 * Overestimate by comparing only the lower 
bounds. Higher
@@ -1644,9 +1645,6 @@ multirangejoinsel(PG_FUNCTION_ARGS)
/*
 * A < B
 *
-* Start by comparing lower bounds and if they 
are equal
-* compare upper bounds
-*
 * Underestimate by comparing only the lower 
bounds. Higher
 * accuracy would require us to add P(lower1 = 
lower2) *
 * P(upper1 < upper2)
@@ -1661,9 +1659,6 @@ multirangejoinsel(PG_FUNCTION_ARGS)
/*
 * A >= B
 *
-* Start by comparing lower bounds and if they 
are equal
-* compare upper bounds
-*
 * Negation of OID_RANGE_LESS_OP.
 *
 * Overestimate by comparing only the lower 
bounds. Higher
@@ -1680,9 +1675,6 @@ multirangejoinsel(PG_FUNCTION_ARGS)
/*
 * A > B == B < A
 *
-* Start by comparing lower bounds and if they 
are equal
-* compare upper bounds
-*
 * Underestimate by comparing only the lower 
bounds. Higher
 * accuracy would require us to add P(lower1 = 
lower2) *
 * P(upper1 > upper2)
@@ -1773,18 +1765,16 @@ multirangejoinsel(PG_FUNCTION_ARGS)
case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP:
case OID_MULTIRANGE_ADJACENT_RANGE_OP:
case OID_RANGE_ADJACENT_MULTIRANGE_OP:
-
-   /*
-* just punt for now, estimation would require 
equality
-* selectivity for bounds
-*/
case OID_MULTIRANGE_CONTAINS_ELEM_OP:
case OID_MULTIRANGE_ELEM_CONTAINED_OP:
 
-   /*
-* just punt for now, estimation would require 
extraction of
-* histograms for the anyelement
-*/
+  /*
+   * just punt for now:
+   * adjacent operators would require equality selectivity
+   * for bounds, and containment operators would require
+   * extraction of histograms for the anyelement.
+   */
  

Re: POC, WIP: OR-clause support for indexes

2023-11-30 Thread Alena Rybakina

On 30.11.2023 11:00, Alena Rybakina wrote:

Hi!




Honestly, it seems very hard to avoid the conclusion that this
transformation is being done at too early a stage. Parse analysis is
not the time to try to do query optimization. I can't really believe
that there's a way to produce a committable patch along these lines.
Ideally, a transformation like this should be done after we know what
plan shape we're using (or considering using), so that we can make
cost-based decisions about whether to transform or not. But at the
very least it should happen somewhere in the planner. There's really
no justification for parse analysis rewriting the SQL that the user
entered.


Here, we assume that array operation is generally better than many ORs.
As a result, it should be more effective to make OR->ANY 
transformation in the parser (it is a relatively lightweight 
operation here) and, as a second phase, decompose that in the optimizer.
We implemented earlier prototypes in different places of the 
optimizer, and I'm convinced that only this approach resolves the 
issues we found.

Does this approach look weird? Maybe. We can debate it in this thread.


I think this is incorrect, and the example of A. Korotkov confirms 
this. If we perform the conversion at the parsing stage, we will skip 
the more important conversion using OR expressions. I'll show you in 
the example below.


First of all, I will describe my idea to combine two approaches to 
obtaining plans with OR to ANY transformation and ANY to OR 
transformation. I think they are both good, and we can't work with 
just one of them, we should consider both the option of OR 
expressions, and with ANY.




Just in case, I have attached a patch with the OR->ANY transformation 
happening at the index-path creation stage.


I get a regression diff file during make check, but judging by the changes, 
it shows that the transformation works as intended. I attached it as well.



--
Regards,
Alena Rybakina
Postgres Professional
From 9f75b58525b4892db2c360401771f69599ed21c8 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Wed, 29 Nov 2023 18:58:55 +0300
Subject: [PATCH] Transform OR clause to ANY expressions.

Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) at the preliminary stage of
optimization, when we are still working with the tree expression.
Sometimes it can lead to a non-optimal plan, but we think it is better to have
an array of elements instead of a lot of OR clauses. There is room for further
optimization in decomposing that array into more optimal parts.

---
 src/backend/nodes/queryjumblefuncs.c  |  30 ++
 src/backend/optimizer/path/indxpath.c | 394 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/paths.h |   1 +
 src/test/regress/expected/create_index.out| 120 ++
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  48 +++
 src/test/regress/expected/partition_prune.out | 179 
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  17 +
 src/test/regress/sql/create_index.sql |  32 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 +
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 17 files changed, 875 insertions(+), 3 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 281907a4d83..c98fda73d17 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -283,6 +283,36 @@ _jumbleNode(JumbleState *jstate, Node *node)
 	}
 }
 
+JumbleState *
+JumbleExpr(Expr *expr, uint64 *queryId)
+{
+	JumbleState *jstate = NULL;
+
+	Assert(queryId != NULL);
+
+	jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+	/* Set up workspace for query jumbling */
+	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+	jstate->jumble_len = 0;
+	jstate->clocations_buf_size = 32;
+	jstate->clocations = (LocationLen *)
+		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+	jstate->clocations_count = 0;
+	jstate->highest_extern_param_id = 0;
+
+	/* Compute query ID */
+	_jumbleNode(jstate, (Node *) expr);
+	*queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+jstate->jumble_len,
+0));
+
+	if (*queryId == UINT64CONST(0))
+		*queryId = UINT64CONST(1);
+
+	return jstate;
+}
+
 static void
 _jumbleList(JumbleState *jstate, Node *node)
 {
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 03a5fbdc6dc..acf4937db01 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -34,7 +34,12 @@
#include "optimizer/restrictinfo.h"

Re: POC, WIP: OR-clause support for indexes

2023-11-30 Thread Alena Rybakina

On 30.11.2023 11:30, Andrei Lepikhov wrote:

On 30/11/2023 15:00, Alena Rybakina wrote:
2. The second patch is my version where I moved the OR 
transformation to the index-path formation stage:


So, I got the best query plan despite the possible OR to ANY 
transformation:


If the user uses a clause like "x IN (1,2) AND y=100", it will break 
your 'good' solution.


No, unfortunately I still see the plan with Seq scan node:

postgres=# explain analyze select * from test where x in (1,2) and y = 100;

 QUERY PLAN

 Gather  (cost=1000.00..12690.10 rows=1 width=12) (actual 
time=72.985..74.832 rows=0 loops=1)

   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12) 
(actual time=68.573..68.573 rows=0 loops=3)
 Filter: ((x = ANY ('{1,2}'::integer[])) AND (y = '100'::double 
precision))

 Rows Removed by Filter: 33
 Planning Time: 0.264 ms
 Execution Time: 74.887 ms

(8 rows)

In my opinion, the general approach here is to stay with OR->ANY 
transformation at the parsing stage and invent one more way for 
picking an index by looking into the array and attempting to find a 
compound index.
Having a shorter list of expressions, where uniform ORs are grouped 
into arrays, the optimizer will do such work with less overhead. 


Looking at the current index generation code, implementing this approach 
will require a lot of refactoring so that the functions starting with 
get_index do not rely on the current baserestrictinfo, but use only 
the indexrestrictinfo, which is a copy of baserestrictinfo. And I think, 
potentially, there may also be complexity with the equivalences that we 
can derive from OR expressions. All the interesting transformations are 
available only for OR expressions, not for ANY; that is, it makes sense 
to try the last chance to find a suitable plan with the available OR 
expressions, and if that plan turns out to be better, use it.



--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-11-30 Thread Alena Rybakina
Sorry, I forgot to attach my patches. For the first experiment it was 
0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff, and for the second 
experiment it was 0002-OR-to-ANY-in-index.diff.


On 30.11.2023 11:00, Alena Rybakina wrote:

Hi!




Honestly, it seems very hard to avoid the conclusion that this
transformation is being done at too early a stage. Parse analysis is
not the time to try to do query optimization. I can't really believe
that there's a way to produce a committable patch along these lines.
Ideally, a transformation like this should be done after we know what
plan shape we're using (or considering using), so that we can make
cost-based decisions about whether to transform or not. But at the
very least it should happen somewhere in the planner. There's really
no justification for parse analysis rewriting the SQL that the user
entered.


Here, we assume that array operation is generally better than many ORs.
As a result, it should be more effective to make OR->ANY 
transformation in the parser (it is a relatively lightweight 
operation here) and, as a second phase, decompose that in the optimizer.
We implemented earlier prototypes in different places of the 
optimizer, and I'm convinced that only this approach resolves the 
issues we found.

Does this approach look weird? Maybe. We can debate it in this thread.


I think this is incorrect, and the example of A. Korotkov confirms 
this. If we perform the conversion at the parsing stage, we will skip 
the more important conversion using OR expressions. I'll show you in 
the example below.


First of all, I will describe my idea to combine two approaches to 
obtaining plans with OR to ANY transformation and ANY to OR 
transformation. I think they are both good, and we can't work with 
just one of them, we should consider both the option of OR 
expressions, and with ANY.


I did this by creating a RelOptInfo that holds references from the 
original RelOptInfo, for which conversion is possible either from 
ANY->OR or vice versa. After obtaining the necessary transformation, 
I ran the procedure for building the seq and index paths for both 
relations and then calculated their costs. The relation with the 
lowest cost is considered the best.


I'm not sure if this is the best approach, but it's less complicated.

I noticed that the lower cost went to a plan that is not the best one, 
but I think this belongs to another topic, related to incorrect 
estimate calculation.


1. The first patch is a mixture of the original patch (where we perform 
the conversion of OR to ANY at the parsing stage) and the version where 
we perform the conversion back to an OR expression at the index-path 
creation stage. We can see that the query proposed by A. Korotkov did 
not get the best plan with an ANY expression at all, and even after 
recovering a query with OR expressions, we cannot get anything better 
than a SeqScan, due to the lack of the effective logical transformations 
that would have been performed if we had kept the OR expressions.


So, I got query plans using enable_or_transformation if it is enabled:

postgres=# create table test as (select (random()*10)::int x, 
(random()*1000) y

from generate_series(1,100) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
vacuum analyze test;
SELECT 100
CREATE INDEX
CREATE INDEX
VACUUM
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
WARNING:  cost with original approach: - 20440.00
WARNING:  cost with OR to ANY applied transfomation: - 15440.00
    QUERY PLAN
-- 


 Gather  (cost=1000.00..12690.10 rows=1 width=12)
   Workers Planned: 2
   ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12)
 Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
(4 rows)

and if it is off:

postgres=# set enable_or_transformation =off;
SET
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
  QUERY PLAN
-- 


 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
'100'::double precision) AND (x = 2)))

   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
 ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
 ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
(7 rows)

2. The second patch is my version where I moved the OR 
transformation to the index-path formation stage:


So, I got the best query plan despite the possible OR to ANY 
transformation:


postgres=# create table test as (s

Re: POC, WIP: OR-clause support for indexes

2023-11-30 Thread Alena Rybakina
CREATE INDEX
VACUUM
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
WARNING:  cost with original approach: - 12.618000
WARNING:  cost with OR to ANY applied transfomation: - 15440.00
  QUERY PLAN
--
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
'100'::double precision) AND (x = 2)))

   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
 ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
 ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
(7 rows)



--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Alena Rybakina

On 25.11.2023 19:13, Alexander Korotkov wrote:

On Sat, Nov 25, 2023 at 1:10 PM Alena Rybakina wrote:

On 25.11.2023 04:13, Alexander Korotkov wrote:

It seems to me there is a confusion.  I didn't mean we need to move
conversion of OR-expressions to ANY into choose_bitmap_and() function
or anything like this.  My idea was to avoid degradation of plans,
which I've seen in [1].  Current code for generation of bitmap paths
considers the possibility to split OR-expressions into distinct bitmap
index scans.  But it doesn't consider this possibility for
ANY-expressions.  So, my idea was to enhance our bitmap scan
generation to consider split values of ANY-expressions into distinct
bitmap index scans.  So, in the example [1] and similar queries
conversion of OR-expressions to ANY wouldn't affect the generation of
bitmap paths.

Thanks for the explanation, yes, I did not understand the idea correctly at 
first. I will try to implement something similar.

Alena, great, thank you.  I'm looking forward to the updated patch.

I wrote the patch (any_to_or.diff.txt); although it is still under 
development (not all regression tests pass yet), it is already clear that 
for a query where a bad plan was chosen before, a more optimal query plan 
is now chosen.


postgres=# set enable_or_transformation =on;
SET
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
  QUERY PLAN
--
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
'100'::double precision) AND (x = 2)))

   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
 ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
 ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 
width=0)

   Index Cond: (y = '100'::double precision)
(7 rows)

Meanwhile, I'm thinking about how to combine the two approaches.

BTW, while I was figuring out how create_index_paths works and creating 
bitmapscan indexes, I think I found a bug with uninitialized memory (the 
fix patch is bugfix.diff.txt). Without a fix here, it crashes at the 
stage of assigning a value to either of the variables, specifically 
skip_lower_saop and skip_nonnative_saop. I discovered it when I forced 
a bitmap index plan to be formed for ANY (any_to_or.diff.txt), and so I 
ran into the problem with my OR->ANY conversion patch.
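
For what it's worth, the crash pattern is the classic one for optional
out-parameters (a generic sketch, assuming the pointer may legitimately be
NULL at some call sites):

    /* store only when the caller asked for the flag */
    if (skip_nonnative_saop)
        *skip_nonnative_saop = true;
    /* an unguarded "*skip_nonnative_saop = true;" crashes when NULL is passed */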


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 03a5fbdc6dc..c242eec34d6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -34,6 +34,7 @@
 #include "optimizer/restrictinfo.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
+#include "parser/parse_expr.h"
 
 
 /* XXX see PartCollMatchesExprColl */
@@ -715,8 +716,8 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
List **bitindexpaths)
 {
List   *indexpaths;
-   bool		skip_nonnative_saop = false;
-   bool		skip_lower_saop = false;
+   bool		skip_nonnative_saop;
+   bool		skip_lower_saop;
ListCell   *lc;
 
/*
@@ -737,7 +738,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 * that supports them, then try again including those clauses.  This 
will
 * produce paths with more selectivity but no ordering.
 */
-   if (skip_lower_saop)
+   if (skip_lower_saop || enable_or_transformation)
{
indexpaths = list_concat(indexpaths,
 
build_index_paths(root, rel,
@@ -778,7 +779,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 * natively, generate bitmap scan paths relying on executor-managed
 * ScalarArrayOpExpr.
 */
-   if (skip_nonnative_saop)
+   if (skip_nonnative_saop || enable_or_transformation)
{
indexpaths = build_index_paths(root, rel,
   
index, clauses,
@@ -908,7 +912,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
{
if (!index->amsearcharray)
{
-   if (skip_nonnative_saop)
+   if (skip_nonnative_saop || 
enable_or_transformation)
{
/* Ignore because not supported 
by index */
*skip_nonnative_saop = true;

Re: POC, WIP: OR-clause support for indexes

2023-11-25 Thread Alena Rybakina

On 25.11.2023 04:13, Alexander Korotkov wrote:

It seems to me there is a confusion.  I didn't mean we need to move
conversion of OR-expressions to ANY into choose_bitmap_and() function
or anything like this.  My idea was to avoid degradation of plans,
which I've seen in [1].  Current code for generation of bitmap paths
considers the possibility to split OR-expressions into distinct bitmap
index scans.  But it doesn't consider this possibility for
ANY-expressions.  So, my idea was to enhance our bitmap scan
generation to consider split values of ANY-expressions into distinct
bitmap index scans.  So, in the example [1] and similar queries
conversion of OR-expressions to ANY wouldn't affect the generation of
bitmap paths.
Thanks for the explanation, yes, I did not understand the idea correctly 
at first. I will try to implement something similar.


--
Regards,
Alena Rybakina
Postgres Professional


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-11-24 Thread Alena Rybakina

On 24.11.2023 13:20, Alena Rybakina wrote:


Hi! Thank you for your work on the subject, I think it's a really 
useful feature.


I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to 
display memory scheduling information? I agree that this is an 
important feature, but I think only for debugging.


Secondly, I noticed that for the child_rinfo_hash key you use a 
counter (int); can it lead to collisions? Why didn't you generate a 
hash from childRestrictInfo for this? For example, something like how 
it is formed here [0].


[0] 
https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru 



Sorry, my first question was not clear, I mean: have you thought about 
creating a guc parameter to display memory planning information?


--
Regards,
Alena Rybakina


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-11-24 Thread Alena Rybakina
Hi! Thank you for your work on the subject, I think it's a really useful 
feature.


I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display 
memory scheduling information? I agree that this is an important 
feature, but I think only for debugging.


Secondly, I noticed that for the child_rinfo_hash key you use a counter 
(int); can it lead to collisions? Why didn't you generate a hash from 
childRestrictInfo for this? For example, something like how it is formed 
here [0].


[0] 
https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru 



--
Regards,
Alena Rybakina
Postgres Professional


Re: POC, WIP: OR-clause support for indexes

2023-11-23 Thread Alena Rybakina

On 23.11.2023 12:23, Andrei Lepikhov wrote:
I think the usage of nodeToString for the generation of clause hash is 
too expensive and buggy.
Also, in the code, you didn't resolve hash collisions. So, I've 
rewritten the patch a bit (see the attachment).
One more thing: I propose to enable transformation by default at least 
for quick detection of possible issues.
This code changes tests in many places. But, as I see it, it mostly 
demonstrates the positive effect of the transformation.


On 24.11.2023 06:30, Andrei Lepikhov wrote:


On 23/11/2023 16:23, Andrei Lepikhov wrote:
This code changes tests in many places. But, as I see it, it mostly 
demonstrates the positive effect of the transformation.


I found out that the postgres_fdw tests were impacted by the feature. 
I fixed that, since the patch is in the commitfest and must pass the 
buildfarm. Taking advantage of this, I suppressed the expression 
evaluation procedure to make the regression test changes clearer.


Thank you for your work. You are right, the patch with the current 
changes looks better and works more correctly.


To be honest, I didn't think we could use JumbleExpr in this way.

--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-11-21 Thread Alena Rybakina

On 21.11.2023 03:50, Alena Rybakina wrote:

On 20.11.2023 11:52, Andrei Lepikhov wrote:
Looking into the patch, I found some trivial improvements (see 
attachment).
Also, it is not obvious that using a string representation of the 
clause as a hash table key is needed here. Also, by making a copy of 
the node in the get_key_nconst_node(), you replace the location 
field, but in the case of complex expression, you don't do the same 
with other nodes.
I propose to generate expression hash instead + prove the equality of 
two expressions by calling equal().


I was thinking about your last email and a possible error where the 
location field may not be cleared in complex expressions. 
Unfortunately, I didn't come up with any examples either, but I think 
I was able to handle this with a special function that removes 
location-related patterns. The alternative to this is to bypass this 
expression, but I think it will be more invasive. In addition, I have 
added changes related to the hash table: now the key is of type int.


All changes are displayed in the attached 
v9-0001-Replace-OR-clause-to_ANY.diff.txt file.


I haven't measured it yet. But what do you think about these changes?


Sorry, I lost your changes during the revision process; I have restored 
them. I also bumped the patch version, just in case, so that CI runs 
successfully.


--
Regards,
Alena Rybakina
Postgres Professional
From 66d1c479347d80ea45fc7aebaed873d45f9c4351 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Tue, 21 Nov 2023 14:20:56 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1)
 OR (X=N2) ... with X = ANY(N1, N2) at the optimizer stage, while we are
 still working with the expression tree. Firstly, we do not try to
 transform "non-or" expressions or inequalities, and a relation with "or"
 expressions is created according to the same scenario. Secondly, we do
 not transform if there are fewer OR elements than the or_transform_limit
 setting. Thirdly, note that we consider "or" expressions only at the
 current level.

---
 src/backend/parser/parse_expr.c   | 280 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 115 +++
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  50 
 src/test/regress/expected/partition_prune.out | 156 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  17 ++
 src/test/regress/sql/create_index.sql |  32 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 14 files changed, 703 insertions(+), 3 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 64c582c344c..7b76c9f29a1 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -18,6 +18,7 @@
 #include "catalog/pg_aggregate.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "commands/dbcommands.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -43,6 +44,7 @@
 
 /* GUC parameters */
 bool		Transform_null_equals = false;
+bool		enable_or_transformation = false;
 
 
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -98,7 +100,283 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 			  Node *ltree, Node *rtree, int location);
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
+typedef struct OrClauseGroupEntry
+{
+	int32			hash_leftvar_key;
+
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Node 		   *expr;
+} OrClauseGroupEntry;
+
+/*
+ * TODO: consider different algorithm to manage complexity
+ * in the case of many different clauses,
+ * like Rabin-Karp or Boyer–Moore algorithms.
+ */
+static char *
+clear_patterns(const char *str, const char *start_pattern)
+{
+	int			i = 0;
+	int			j = 0;
+	int			start_pattern_len = strlen(start_pattern);
+	char	   *res = palloc0(sizeof(*res) * (strlen(str) + 1));
+
+	for (i = 0; str[i];)
+	{
+		if (i >= start_pattern_len && strncmp(&str[i - start_pattern_len],
+											  start_pattern,
+											  start_pattern_len) == 0)
+		{
+			while (str[i] && !(str[i] == '{' || str[i] == '}'))
+				i++;
+		}
+		if (str[i])
+			res[j++] = str[i++];
+	}
+
+	return res;
+}
+
+static int32
+get_key_nconst_node(Node *nconst_node)
+{
+	char *str = nodeToString(nconst_node);
+
+	str = clear_patterns(str, " :location");
+
+	if (str == NULL)
+		return 0;
+

Re: POC, WIP: OR-clause support for indexes

2023-11-20 Thread Alena Rybakina

On 20.11.2023 11:52, Andrei Lepikhov wrote:
Looking into the patch, I found some trivial improvements (see 
attachment).
Also, it is not obvious that using a string representation of the 
clause as a hash table key is needed here. Also, by making a copy of 
the node in the get_key_nconst_node(), you replace the location field, 
but in the case of complex expression, you don't do the same with 
other nodes.
I propose to generate expression hash instead + prove the equality of 
two expressions by calling equal().


I was thinking about your last email and a possible error where the 
location field may not be cleared in complex expressions. Unfortunately, 
I didn't come up with any examples either, but I think I was able to 
handle this with a special function that removes location-related 
patterns. The alternative to this is to bypass this expression, but I 
think it will be more invasive. In addition, I have added changes 
related to the hash table: now the key is of type int.


All changes are displayed in the attached 
v9-0001-Replace-OR-clause-to_ANY.diff.txt file.


I haven't measured it yet. But what do you think about these changes?


--
Regards,
Alena Rybakina
Postgres Professional
From 8e579b059ffd415fc477e0e8930e718084e58683 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Tue, 21 Nov 2023 03:22:13 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1)
 OR (X=N2) ... with X = ANY(N1, N2) at the optimizer stage, while we are
 still working with the expression tree. Firstly, we do not try to
 transform "non-or" expressions or inequalities, and a relation with "or"
 expressions is created according to the same scenario. Secondly, we do
 not transform if there are fewer OR elements than the or_transform_limit
 setting. Thirdly, note that we consider "or" expressions only at the
 current level.

Authors: Alena Rybakina , Andrey Lepikhov 
Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 
---
 src/backend/parser/parse_expr.c   | 280 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 115 +++
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  50 
 src/test/regress/expected/partition_prune.out | 156 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  17 ++
 src/test/regress/sql/create_index.sql |  32 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 14 files changed, 703 insertions(+), 3 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 64c582c344c..290422005a3 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -18,6 +18,7 @@
 #include "catalog/pg_aggregate.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "commands/dbcommands.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -43,6 +44,7 @@
 
 /* GUC parameters */
 bool		Transform_null_equals = false;
+bool		enable_or_transformation = false;
 
 
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -98,7 +100,283 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 			  Node *ltree, Node *rtree, int location);
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
+typedef struct OrClauseGroupEntry
+{
+	int32			hash_leftvar_key;
+
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Node 		   *expr;
+} OrClauseGroupEntry;
+
+/*
+ * TODO: consider different algorithm to manage complexity
+ * in the case of many different clauses,
+ * like Rabin-Karp or Boyer–Moore algorithms.
+ */
+static char *
+clear_patterns(const char *str, const char *start_pattern)
+{
+	int			i = 0;
+	int			j = 0;
+	int			start_pattern_len = strlen(start_pattern);
+	char	   *res = palloc0(sizeof(*res) * (strlen(str) + 1));
+
+	for (i = 0; str[i];)
+	{
+		if (i >= start_pattern_len && strncmp(&str[i - start_pattern_len],
+											  start_pattern,
+											  start_pattern_len) == 0)
+		{
+			while (str[i] && !(str[i] == '{' || str[i] == '}'))
+				i++;
+		}
+		if (str[i])
+			res[j++] = str[i++];
+	}
+
+	return res;
+}
+
+static int32
+get_key_nconst_node(Node *nconst_node)
+{
+	char *str = nodeToString(nconst_node);
+
+	str = clear_patterns(str, " :location");
+
+	if (str == NULL)
+		return 0;
+
+	return DatumGetInt32(hash_any((unsigned char *)s

Re: Wrong results with grouping sets

2023-11-16 Thread Alena Rybakina

Hi! Thank you for your work on the subject.

On 25.09.2023 10:11, Richard Guo wrote:

I think I've come across a wrong result issue with grouping sets, as
shown by the query below.

-- result is correct with only grouping sets
select a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 1 |
 2 | 2
 2 |
(4 rows)

-- result is NOT correct with grouping sets and distinct on
select distinct on (a, b) a, b
from (values (1, 1), (2, 2)) as t (a, b) where a = b
group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 2 | 2
(2 rows)

The distinct on expressions include both 'a' and 'b', so rows (1, 1) and
(1, NULL) should not be considered equal.  (The same for rows (2, 2) and
(2, NULL).)


I noticed that this query worked correctly in the main branch with the 
inequality operator:


postgres=# select distinct on (a, b) a, b
from (values (3, 1), (2, 2)) as t (a, b) where a > b
group by grouping sets((a, b), (a));
 a | b
---+---
 3 | 1
 3 |
(2 rows)


So, I think you are right)



I think the root cause is that when we generate distinct_pathkeys, we
failed to realize that Var 'b' might be nullable by the grouping sets,
so it's no longer always equal to Var 'a'.  It's not correct to deem
that the PathKey for 'b' is redundant and thus remove it from the
pathkeys list.

We have the same issue when generating sort_pathkeys.  As a result, we
may have the final output in the wrong order.  There were several
reports about this issue before, such as [1][2].

To fix this issue, I'm thinking that we mark the grouping expressions
nullable by grouping sets with a dummy RTE for grouping sets, something
like attached.  In practice we do not need to create a real RTE for
that, what we need is just a RT index.  In the patch I use 0, because
it's not a valid outer join relid, so it would not conflict with the
outer-join-aware-Var infrastructure.

If the grouping expression is a Var or PHV, we can just set its
nullingrels, very straightforward.   For an expression that is neither a
Var nor a PHV, I'm not quite sure how to set the nullingrels.  I tried
the idea of wrapping it in a new PHV to carry the nullingrels, but that
may cause some unnecessary plan diffs.  In the patch for such an
expression I just set the nullingrels of Vars or PHVs that are contained
in it.  This is not really 'correct' in theory, because it is the whole
expression that can be nullable by grouping sets, not its individual
vars.  But it works in practice, because what we need is that the
expression can be somehow distinguished from the same expression in ECs,
and marking its vars is sufficient for this purpose.  But what if the
expression is variable-free?  This is the point that needs more work.
For now the patch just handles variable-free expressions of type Const,
by wrapping them in a new PHV.

There are some places where we need to artificially remove the RT index
of grouping sets from the nullingrels, such as when we generate
PathTarget for initial input to grouping nodes, or when we generate
PathKeys for the grouping clauses, because the expressions there are
logically below the grouping sets.  We also need to do that in
set_upper_references when we update the targetlist of an Agg node, so
that we can perform exact match for nullingrels, rather than superset
match.

Since the fix depends on the nullingrels stuff, it seems not easy for
back-patching.  I'm not sure what we should do in back branches.

Any thoughts?

[1] 
https://www.postgresql.org/message-id/cambws48atqtqgk37msydk_eagdo3y0ia_lzvuvgq2uo_wh2...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/17071-24dc13fbfa296...@postgresql.org



I looked at your patch and noticed a few things:

1. I think you should add a test with the cube operator, because I 
noticed that the order of the query result has also changed:


master:
postgres=# select a,b from (values(1,1),(2,2)) as t (a,b)
where a=b group by cube (a, (a,b)) order by b,a;
 a | b
---+---
 1 | 1
 1 | 1
 1 |
 2 | 2
 2 | 2
 2 |
   |
(7 rows)


with patch:

postgres=# select a, b from (values (1, 1), (2, 2)) as t (a, b)
where a = b group by cube(a, (a, b)) order by b,a;
 a | b
---+---
 1 | 1
 1 | 1
 2 | 2
 2 | 2
 1 |
 2 |
   |
(7 rows)


--
Regards,
Alena Rybakina


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-14 Thread Alena Rybakina

Hi!

On 14.11.2023 13:10, Damir Belyalov wrote:


 Here is a very straw-man-level sketch of what I think might work.
 The option to COPY FROM looks something like

      ERRORS TO other_table_name (item [, item [, ...]])


I tried to implement the patch using a table and came across a number 
of questions.


Which table should we use for this feature: a system catalog table, a 
table stored as a file, or a newly created table?


In these cases, security and user rights management issues arise.
It is better for other users not to see error lines from another user. 
It is also not clear how access rights to this table are inherited and 
granted.



Maybe we can add a GUC or a parameter that lets COPY output such errors 
during execution, and check whether the user has enough rights to set 
such a parameter?


That is, I propose to give the user the choice to run COPY with or 
without saving errors, and at the same time immediately check whether 
the error-output option is even possible for them.
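
(For illustration only: a sketch of how the user-facing choice might
look, reusing the straw-man syntax quoted above; the table and item
names here are hypothetical and not from any patch.)

COPY target_table FROM '/tmp/data.csv' WITH (FORMAT csv)
    ERRORS TO copy_errors (lineno, line, message);

-- Before starting, the server would check that the current user may
-- write to copy_errors (or may enable the corresponding parameter),
-- and error lines saved there would be visible only to that user.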


--
Regards,
Alena Rybakina


Re: POC, WIP: OR-clause support for indexes

2023-11-10 Thread Alena Rybakina

On 06.11.2023 16:51, Alena Rybakina wrote:
I also support this approach. I have almost finished writing a patch 
that fixes the first problem related to the quadratic complexity of 
processing expressions by adding a hash table.


I also added a check: if the number of groups is equal to the number 
of OR expressions, we assume that no expressions need to be converted 
and interrupt further execution.
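
(For example, a sketch against the regression database's tenk1 table:
every OR arm below has a distinct left-hand side, so the number of
groups equals the number of arms and the transformation is skipped.)

SELECT count(*) FROM tenk1
WHERE unique1 = 1 OR unique2 = 2 OR ten = 3;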


Now I am trying to fix the last problem in this patch: three tests 
have indicated a problem related to incorrect conversion. I don't 
think it can be serious, but I haven't figured out where the mistake 
is yet.


I got log errors like this: ERROR:  unrecognized node type: 0.


I fixed this issue and added some cosmetic refactoring.

The changes are presented in the or_patch_changes.diff file.

--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 25a4235dbd9..46212a77c64 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -44,7 +44,7 @@
 
 /* GUC parameters */
 bool		Transform_null_equals = false;
-bool		or_transform_limit = false;
+bool		enable_or_transformation = false;
 
 
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -108,7 +108,7 @@ typedef struct OrClauseGroupEntry
 	List		   *consts;
	Oid				scalar_type;
	Oid				opno;
-	Expr 		   *expr;
+	Node 		   *expr;
 } OrClauseGroupEntry;
 
 static int
@@ -189,16 +189,16 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 	  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
 
 	/* If this is not an 'OR' expression, skip the transformation */
-	if (expr_orig->boolop != OR_EXPR || !or_transform_limit || len_ors == 1 || !or_group_htab)
+	if (expr_orig->boolop != OR_EXPR || !enable_or_transformation || len_ors == 1 || !or_group_htab)
 		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
 
 	foreach(lc, expr_orig->args)
 	{
 		Node			   *arg = lfirst(lc);
-		Node			   *orqual;
-		Node			   *const_expr;
-		Node			   *nconst_expr;
-		OrClauseGroupEntry *gentry;
+		Node			   *orqual = NULL;
+		Node			   *const_expr = NULL;
+		Node			   *nconst_expr = NULL;
+		OrClauseGroupEntry *gentry = NULL;
		bool			found;
 		char		   	   *str;
 
@@ -270,7 +270,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
		gentry = hash_search(or_group_htab, &str, HASH_ENTER, &found);
 		gentry->node = nconst_expr;
 		gentry->consts = list_make1(const_expr);
-		gentry->expr = (Expr *) orqual;
+		gentry->expr = orqual;
 		gentry->hash_leftvar_key = str;
 	}
 
@@ -283,7 +283,6 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
 		* transformed bool expression.
 		*/
 		hash_destroy(or_group_htab);
-		list_free(or_list);
 		return (Node *) makeBoolExpr(OR_EXPR, or_list, expr_orig->location);
 	}
 	else
@@ -345,9 +344,9 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
  * OK: coerce all the right-hand non-Var inputs to the common
  * type and build an ArrayExpr for them.
  */
-List	   *aexprs;
-ArrayExpr  *newa;
-ScalarArrayOpExpr *saopexpr;
+List	   *aexprs = NIL;
+ArrayExpr  *newa = NULL;
+ScalarArrayOpExpr *saopexpr = NULL;
 ListCell *l;
 
 aexprs = NIL;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 54fd09abde7..3411f023df8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1049,12 +1049,12 @@ struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"or_transform_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+		{"enable_or_transformation", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Transform a sequence of OR clauses to an IN expression."),
 			gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .."
 		 "to the clause 'x IN (c1,c2,...)'")
 		},
-		&or_transform_limit,
+		&enable_or_transformation,
 		false,
 		NULL, NULL, NULL
 	},
diff --git a/src/include/parser/parse_expr.h b/src/include/parser/parse_expr.h
index 7a6943c116c..3a87de02859 100644
--- a/src/include/parser/parse_expr.h
+++ b/src/include/parser/parse_expr.h
@@ -17,7 +17,7 @@
 
 /* GUC parameters */
 extern PGDLLIMPORT bool Transform_null_equals;
-extern PGDLLIMPORT bool or_transform_limit;
+extern PGDLLIMPORT bool enable_or_transformation;
 
 extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind);
 
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 29c2bc6a2b2..dbc8bc3bed0 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1883,7 +1883,7 @@ SELECT count(*) FROM tenk1
 10
 (1 row)
 
-SET or_transform_limit = on;
+SET enable_or_transformation = on;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR ten

Re: POC, WIP: OR-clause support for indexes

2023-11-06 Thread Alena Rybakina

On 30.10.2023 17:06, Alexander Korotkov wrote:

On Mon, Oct 30, 2023 at 3:40 PM Robert Haas  wrote:

On Thu, Oct 26, 2023 at 5:05 PM Peter Geoghegan  wrote:

On Thu, Oct 26, 2023 at 12:59 PM Robert Haas  wrote:

Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.

Alexander said "Generally, I don't see why ANY could be executed
slower than the equivalent OR clause". I understood that this was his
way of expressing the following idea:

"In principle, there is no reason to expect execution of ANY() to be
slower than execution of an equivalent OR clause (except for
noise-level differences). While it might not actually look that way
for every single type of plan you can imagine right now, that doesn't
argue for making a cost-based decision. It actually argues for fixing
the underlying issue, which can't possibly be due to some kind of
fundamental advantage enjoyed by expression evaluation with ORs".

This is also what I think of all this.

I agree with that, with some caveats, mainly that the reverse is to
some extent also true. Maybe not completely, because arguably the
ANY() formulation should just be straight-up easier to deal with, but
in principle, the two are equivalent and it shouldn't matter which
representation we pick.

But practically, it may, and we need to be sure that we don't put in
place a translation that is theoretically a win but in practice leads
to large regressions. Avoiding regressions here is more important than
capturing all the possible gains. A patch that wins in some scenarios
and does nothing in others can be committed; a patch that wins in even
more scenarios but causes serious regressions in some cases probably
can't.

+1
Sure, I've identified two cases where patch shows regression [1].  The
first one (quadratic complexity of expression processing) should be
already addressed by usage of hash.  The second one (planning
regression with Bitmap OR) is not yet addressed.

Links
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

I also support this approach. I have almost finished writing a patch 
that fixes the first problem related to the quadratic complexity of 
processing expressions by adding a hash table.


I also added a check: if the number of groups is equal to the number of 
OR expressions, we assume that no expressions need to be converted and 
interrupt further execution.


Now I am trying to fix the last problem in this patch: three tests have 
indicated a problem related to incorrect conversion. I don't think it 
can be serious, but I haven't figured out where the mistake is yet.


I got log errors like this: ERROR:  unrecognized node type: 0.

--
Regards,
Alena Rybakina
Postgres Professional
From 3062a2408af258b9e327219020221b75501e8530 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 6 Nov 2023 16:48:00 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at the optimizer stage, while we are
 still working with the expression tree. Firstly, we do not try to
 transform "non-or" expressions or inequalities, and a relation with "or"
 expressions is created according to the same scenario. Secondly, we do
 not transform if there are fewer OR elements than the or_transform_limit
 setting. Thirdly, note that we consider "or" expressions only at the
 current level.

Authors: Alena Rybakina , Andrey Lepikhov 
Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 
---
 src/backend/parser/parse_expr.c   | 299 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 115 +++
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  50 +++
 src/test/regress/expected/partition_prune.out | 179 +++
 src/test/regress/expected/tidscan.out |  17 +
 src/test/regress/sql/create_index.sql |  32 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 743 insertions(+), 2 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 64c582c344c..25a4235dbd9 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser

Not deleted mentions of the cleared path

2023-10-30 Thread Alena Rybakina

Hi, hackers!

I have already written about the problem of InvalidPath [0] appearing. I 
investigated this and found an error in the add_path() function: when we 
reject a path, we free its memory, but do not delete the various 
mentions of it (for example, in the parent relation, as in the example 
below).


Thus, we may face the problem of accessing the freed memory.

I demonstrated this below using gdb, calling a query after running the 
test src/test/regress/sql/create_misc.sql:


Query:

-- That ALTER TABLE should have added TOAST tables.

SELECT relname, reltoastrelid <> 0 AS has_toast_table
   FROM pg_class
   WHERE oid::regclass IN ('a_star', 'c_star')
   ORDER BY 1;

--UPDATE b_star*
--   SET a = text 'gazpacho'
--   WHERE aa > 4;

SELECT class, aa, a FROM a_star*;


gdb:

0x7ff3f7325fda in epoll_wait (epfd=5, events=0x55bf9ee75c38, 
maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30

30  ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) b /home/alena/postgrespro_3/src/backend/optimizer/util/pathnode.c:620
Breakpoint 1 at 0x55bf9cfe4c65: file pathnode.c, line 621.
(gdb) c
Continuing.

Breakpoint 1, add_path (parent_rel=0x55bf9ef7f5c0, 
new_path=0x55bf9ef7f4e0) at pathnode.c:621

621 if (!IsA(new_path, IndexPath))
(gdb) n
622   pfree(new_path);
(gdb) n
624 }
(gdb) p *new_path
$1 = {type = T_Invalid, pathtype = T_Invalid, parent = 
0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f,
  param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe 
= 127, parallel_workers = 2139062143,
  rows = 1.3824172084878715e+306, startup_cost = 
1.3824172084878715e+306, total_cost = 1.3824172084878715e+306,

  pathkeys = 0x7f7f7f7f7f7f7f7f}
(gdb) p new_path
$2 = (Path *) 0x55bf9ef7f4e0

(gdb) p ((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements 
[0].ptr_value)->subpath)->path->parent->cheapest_startup_path

$20 = (struct Path *) 0x55bf9ef7f4e0

(gdb) p *((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements 
[0].ptr_value)->subpath)->path->parent->cheapest_startup_path
$17 = {type = T_Invalid, pathtype = T_Invalid, parent = 
0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f,
  param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe 
= 127, parallel_workers = 2139062143,
  rows = 1.3824172084878715e+306, startup_cost = 
1.3824172084878715e+306, total_cost = 1.3824172084878715e+306,

  pathkeys = 0x7f7f7f7f7f7f7f7f}

(gdb) p (Path*)(((ProjectionPath 
*)((SortPath*)parent_rel->pathlist->elements 
[0].ptr_value)->subpath)->path->parent->pathlist->elements[1].ptr_value)

$21 = (Path *) 0x55bf9ef7f4e0

(gdb) p *(Path*)(((ProjectionPath 
*)((SortPath*)parent_rel->pathlist->elements 
[0].ptr_value)->subpath)->path->parent->pathlist->elements[1].ptr_value)
$19 = {type = T_Invalid, pathtype = T_Invalid, parent = 
0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f,
  param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe 
= 127, parallel_workers = 2139062143,
  rows = 1.3824172084878715e+306, startup_cost = 
1.3824172084878715e+306, total_cost = 1.3824172084878715e+306,

  pathkeys = 0x7f7f7f7f7f7f7f7f}
(gdb)


The same problem may be in the add_partial_path() function.

Unfortunately, I have not yet been able to find a problematic query with 
the described case, but I have prepared a patch to fix this problem.


What do you think?

0. 
https://www.postgresql.org/message-id/flat/01aa76cc-93e1-4de9-ab34-b3a890bf7...@postgrespro.ru


--
Regards,
Alena Rybakina
Postgres Professional
From b9483800eb9d8dc9bafb114c602c63d5cf596ee7 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 30 Oct 2023 14:22:21 +0300
Subject: [PATCH] Fix rejection of a released path: we shouldn't only free the
 path, we should also delete all mentions of it; otherwise we will face
 accesses to released memory.

---
 src/backend/optimizer/util/pathnode.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d33..4df9d34c9f1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -619,7 +619,19 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 	{
 		/* Reject and recycle the new path */
 		if (!IsA(new_path, IndexPath))
-			pfree(new_path);
+		{
+			if (new_path == new_path->parent->cheapest_startup_path)
+				new_path->parent->cheapest_startup_path = NULL;
+			if (new_path == new_path->parent->cheapest_total_path)
+				new_path->parent->cheapest_total_path = NULL;
+			foreach(p1, new_path->parent->pathlist)
+			{
+				Path	   *path = (Path *) lfirst(p1);
+
+				if (path == new_path)
+					new_path->parent->pathlist = foreach_delete_current(new_path->parent->pathlist, p1);
+			}
+		}
 	}
 }
 
@@ -849,7 +861,17 

Re: POC, WIP: OR-clause support for indexes

2023-10-29 Thread Alena Rybakina

Hi!

On 27.10.2023 00:04, Peter Geoghegan wrote:

On Thu, Oct 26, 2023 at 12:59 PM Robert Haas  wrote:

Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.

Alexander said "Generally, I don't see why ANY could be executed
slower than the equivalent OR clause". I understood that this was his
way of expressing the following idea:

"In principle, there is no reason to expect execution of ANY() to be
slower than execution of an equivalent OR clause (except for
noise-level differences). While it might not actually look that way
for every single type of plan you can imagine right now, that doesn't
argue for making a cost-based decision. It actually argues for fixing
the underlying issue, which can't possibly be due to some kind of
fundamental advantage enjoyed by expression evaluation with ORs".

This is also what I think of all this.

Alexander's partial index example had this quality to it. Obviously,
the planner *could* be taught to do the right thing with such a case,
with a little more work. The fact that it doesn't right now is
definitely a problem, and should probably be treated as a blocker for
this patch. But that doesn't really argue against the general idea
behind the patch -- it just argues for fixing that one problem.

There may also be a separate problem that comes from the added planner
cycles required to do the transformation -- particularly in extreme or
adversarial cases. We should worry about that, too. But, again, it
doesn't change the basic fact, which is that having a
standard/normalized representation of OR lists/DNF transformation is
extremely useful in general, and *shouldn't* result in any real
slowdowns at execution time if done well.
I think it would be more correct to finalize the current approach to 
converting "OR" expressions to "ANY": quite a few problems related to 
this patch have already been found here, so I think we should solve 
them first, and then we can move on.



To me, this sort of output suggests that perhaps the transformation is
being done in the wrong place. I expect that we have to decide whether
to convert from OR to = ANY(...) at a very early stage of the planner,
before we have any idea what the selected path will ultimately be. But
this output suggests that we want the answer to depend on which kind
of path is going to be faster, which would seem to argue for doing
this sort of transformation as part of path generation for only those
paths that will benefit from it, rather than during earlier phases of
expression processing/simplification.

I don't think that that's the right direction. They're semantically
equivalent things. But a SAOP-based plan can be fundamentally better,
since SAOPs enable passing down useful context to index AMs (at least
nbtree). And because we can use a hash table for SAOP expression
evaluation. It's a higher level, standardized, well optimized way of
expressing exactly the same concept.

I can come up with a case that'll be orders of magnitude more
efficient with this patch, despite the transformation process only
affecting a small OR list of 3 or 5 elements -- a 100x reduction in
heap page accesses is quite possible. This is particularly likely to
come up if you assume that the nbtree patch that I'm currently working
on is also available. In general, I think that we totally over-rely on
bitmap index scans, especially BitmapOrs.


Regarding the application of the transformation at an early stage, the 
patch is almost ready, except for handling the cases of queries that 
become slower. I haven't yet figured out how to exclude such queries 
without comparing costs or using a parameter based on the number of OR 
elements. The simplest option is not to process Expr types (already 
mentioned earlier) in the queries that Alexander gave as an example, 
but as I already said, I don't like this approach very much.


--
Regards,
Alena Rybakina
Postgres Professional


Invalid Path with UpperRel

2023-10-27 Thread Alena Rybakina

Hi, hackers!

I recently encountered strange behavior when, after running the 
create_misc.sql test, I ran the last query from this test. In general, 
the reproduction looks like this:


\i src/test/regress/sql/create_misc.sql

I added Assert(0) in create_sort_plan() before the call to 
create_plan_recurse and restarted postgres. After that I ran the query:


SELECT relname, reltoastrelid <> 0 AS has_toast_table
   FROM pg_class
   WHERE oid::regclass IN ('a_star', 'c_star')
   ORDER BY 1;

I found an invalid path in cheapest_startup_path:

 (gdb) p *(IndexPath *)((SortPath 
*)best_path)->subpath->parent->cheapest_startup_path
$12 = {path = {type = T_Invalid, pathtype = T_Invalid, parent = 
0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 
0x7f7f7f7f7f7f7f7f,
parallel_aware = 127, parallel_safe = 127, parallel_workers = 
2139062143 , rows = 1.3824172084878715e+306, 
startup_cost = 1.3824172084878715e+306,
total_cost = 1.3824172084878715e+306, pathkeys = 
0x7f7f7f7f7f7f7f7f}, indexinfo = 0x7f7f7f7f7f7f7f7f, indexclauses = 
0x7f7f7f7f7f7f7f7f,
  indexorderbys = 0x7f7f7f7f7f7f7f7f, indexorderbycols = 
0x7f7f7f7f7f7f7f7f, indexscandir = 2139062143 , 
indextotalcost = 1.3824172084878715e+306,

  indexselectivity = 1.3824172084878715e+306}

(gdb) p (IndexPath *)((SortPath 
*)best_path)->subpath->parent->cheapest_startup_path

$11 = (IndexPath *) 0x555febc66160

I found that this begins at the creation of the upperrel (in the
fetch_upper_rel function):


/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL, false, tuple_fraction);

/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);


(gdb) p *(IndexPath *)((SortPath *)final_rel->cheapest_total_path 
)->subpath->parent->cheapest_startup_path
$15 = {path = {type = T_Invalid, pathtype = T_Invalid, parent = 
0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 
0x7f7f7f7f7f7f7f7f,
parallel_aware = 127, parallel_safe = 127, parallel_workers = 
2139062143 , rows = 1.3824172084878715e+306, 
startup_cost = 1.3824172084878715e+306,
total_cost = 1.3824172084878715e+306, pathkeys = 
0x7f7f7f7f7f7f7f7f}, indexinfo = 0x7f7f7f7f7f7f7f7f, indexclauses = 
0x7f7f7f7f7f7f7f7f,
  indexorderbys = 0x7f7f7f7f7f7f7f7f, indexorderbycols = 
0x7f7f7f7f7f7f7f7f, indexscandir = 2139062143 , 
indextotalcost = 1.3824172084878715e+306,

  indexselectivity = 1.3824172084878715e+306}
(gdb) p (IndexPath *)((SortPath *)final_rel->cheapest_total_path 
)->subpath->parent->cheapest_startup_path

$16 = (IndexPath *) 0x555febc66160

I know it doesn't cause a crash anywhere, but can anybody explain to me 
what's going on here and why an invalid path appears?


--
Regards,
Alena Rybakina


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-10-26 Thread Alena Rybakina

On 25.10.2023 18:35, Andrei Zubkov wrote:

Hi Alena,

On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote:

  Hi! Thank you for your work on the subject.
1. I didn't understand why we first try to find pgssEntry with a
false top_level value, and later find it with a true top_level value.

In the case of pg_stat_statements, the top_level field is a bool field of
the pgssHashKey struct used as the key for the pgss_hash hashtable. When
we perform a reset, only the userid, dbid and queryid values are
provided. We need to reset both top-level and non-top-level entries, and
we have only one way to get them all from the hashtable: search for
entries having top_level=true and top_level=false in their keys.

Thank you for the explanation, I got it.
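
(As a sketch, the user-visible call this serves; the argument values
here are illustrative. Since top_level is part of the hash key but not
a reset argument, a single key-based reset has to find and reset both
variants of the entry.)

SELECT pg_stat_statements_reset(10::oid, 16384::oid, 1234567890123456789);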

2. And honestly, I think you should change
  "Remove the entry if it exists, starting with the non-top-level
entry." to
  "Remove the entry or reset min/max time statistic information and
the timestamp if it exists, starting with the non-top-level entry."
And the same with the comment bellow:
"Also reset the top-level entry if it exists."
  "Also remove the entry or reset min/max time statistic information
and the timestamp if it exists."

There are four such places actually - every time when the
SINGLE_ENTRY_RESET macro is used. The logic of reset performed is
described a bit in this macro definition. It seems quite redundant to
repeat this description four times. But I've noted that "remove" terms
should be replaced by "reset".

I've attached v24 with commits updated.

Yes, I agree with the changes.

--
Regards,
Alena Rybakina


Re: POC, WIP: OR-clause support for indexes

2023-10-26 Thread Alena Rybakina

On 26.10.2023 22:58, Robert Haas wrote:

On Thu, Oct 26, 2023 at 3:47 PM Alena Rybakina
 wrote:

With small amounts of "OR" elements, the cost of orexpr is lower than with 
"ANY", on the contrary, higher.

Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.
Yes, I agree: with Alexander's example this option will not help, and 
here I need to look inside the Expr itself. But I noticed that such a 
complex non-constant expression is always of OpExpr type, whereas if 
the non-constant part contains only one variable, it is a Var type. 
We can add a constraint so that we transform expressions with simple 
variables, like x=1 or x=2 or x=3, etc., but ignore expressions like 
x*1+y=1 or x*2+y=2...


But then we would not consider expressions whose non-constant part is 
the same in every arm. For example, we could transform x*1+y=1 
or x*1+y=2 ... to x*1+y = ANY([1,2,...]). But I think it's not so 
critical, because such cases are rare.
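
(To make the proposed constraint concrete, a sketch using the column
names from Alexander's pgbench example:)

-- Left-hand side is a plain Var in every arm: a candidate for the
-- ANY transformation.
SELECT * FROM pgbench_accounts WHERE aid = 1 OR aid = 2 OR aid = 3;

-- Left-hand side is an OpExpr: under the proposed constraint this is
-- skipped, even though the non-constant part repeats in every arm.
SELECT * FROM pgbench_accounts WHERE aid + 1 * bid = 1 OR aid + 1 * bid = 2;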



Index Scan using pg_class_oid_index on pg_class  (cost=0.27..2859.42 rows=414 
width=68) (actual time=1.504..34.183 rows=260 loops=1)
Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, '4'::oid, 
'5'::oid, '6'::oid, '7'::oid,

Bitmap Heap Scan on pg_class  (cost=43835.00..54202.14 rows=414 width=68) 
(actual time=39.958..41.293 rows=260 loops=1)
Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = '3'::oid) OR 
(oid = '4'::oid) OR (oid =

I think we could see which value is lower, and if it is lower with the expressions 
converted to ANY, work with that version further; otherwise work with the original "OR" 
expressions. But we still need to perform this conversion to find out its cost.

To me, this sort of output suggests that perhaps the transformation is
being done in the wrong place. I expect that we have to decide whether
to convert from OR to = ANY(...) at a very early stage of the planner,
before we have any idea what the selected path will ultimately be. But
this output suggests that we want the answer to depend on which kind
of path is going to be faster, which would seem to argue for doing
this sort of transformation as part of path generation for only those
paths that will benefit from it, rather than during earlier phases of
expression processing/simplification.

I'm not sure I have the full picture here, though, so I might have
this all wrong.

This would be the ideal option, and to be honest, I like the 
conversion at an early stage also because there are no problems with 
selectivity or link updates if we change the structure of a relation's 
RestrictInfo.


But in terms of calculating which option is better to use, transformed or 
original, I think this solution might be complicated: we need not 
only to identify the cases in which the transformation wins in 
principle, but also the types of data it works best with. There 
is a risk of missing some cases, and we may need our own evaluation 
model. Right now it's hard for me to come up with something simple.


The cost option seems simpler and clearer to me, but yes, it is 
difficult to decide when it is better to do the conversion for the most 
correct estimate.



--
Regards,
Alena Rybakina





Re: POC, WIP: OR-clause support for indexes

2023-10-26 Thread Alena Rybakina

Hi! Thank you for your feedback!

On 25.10.2023 22:54, Robert Haas wrote:

On Sat, Oct 14, 2023 at 6:37 PM Alexander Korotkov  wrote:

Regarding the GUC parameter, I don't see we need a limit.  It's not
yet clear whether a small number or a large number of OR clauses are
more favorable for transformation.  I propose to have just a boolean
enable_or_transformation GUC.

That's a poor solution. So is the GUC patch currently has
(or_transform_limit). What you need is a heuristic that figures out
fairly reliably whether the transformation is going to be better or
worse. Or else, do the whole thing in some other way that is always
same-or-better.

In general, adding GUCs makes sense when the user knows something that
we can't know. For example, shared_buffers makes some sense because,
even if we discovered how much memory the machine has, we can't know
how much of it the user wants to devote to PostgreSQL as opposed to
anything else. And track_io_timing makes sense because we can't know
whether the user wants to pay the price of gathering that additional
data. But GUCs are a poor way of handling cases where the real problem
is that we don't know what code to write. In this case, some queries
will be better with enable_or_transformation=on, and some will be
better with enable_or_transformation=off. Since we don't know which
will work out better, we make the user figure it out and set the GUC,
possibly differently for each query. That's terrible. It's the query
optimizer's whole job to figure out which transformations will speed
up the query. It shouldn't turn around and punt the decision back to
the user.

Notice that superficially-similar GUCs like enable_seqscan aren't
really the same thing at all. That's just for developer testing and
debugging. Nobody expects that you have to adjust that GUC on a
production system - ever.


I noticed that the costs of the expressions differ, and this can help to 
assess when it is worth keeping the conversion and when it is not.


With small amounts of "OR" elements, the cost of orexpr is lower than 
with "ANY", on the contrary, higher.


postgres=# SET or_transform_limit = 500;
EXPLAIN (analyze)
SELECT oid,relname FROM pg_class
WHERE
  oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5)
;
SET
  QUERY PLAN
--
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.30 
rows=1 width=68) (actual time=0.105..0.106 rows=0 loops=1)

   Index Cond: (oid = '13779'::oid)
   Filter: ((oid = '2'::oid) OR (oid = '4'::oid) OR (oid = '5'::oid))
 Planning Time: 0.323 ms
 Execution Time: 0.160 ms

(5 rows)

postgres=# SET or_transform_limit = 0;
EXPLAIN (analyze)
SELECT oid,relname FROM pg_class
WHERE
  oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5)
;
SET
  QUERY PLAN
---
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..16.86 
rows=1 width=68) (actual time=0.160..0.161 rows=0 loops=1)
   Index Cond: ((oid = ANY (ARRAY['2'::oid, '4'::oid, '5'::oid])) AND 
(oid = '13779'::oid))

 Planning Time: 4.515 ms
 Execution Time: 0.313 ms
(4 rows)


Index Scan using pg_class_oid_index on pg_class  (cost=0.27..2859.42 
rows=414 width=68) (actual time=1.504..34.183 rows=260 loops=1)
   Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, 
'4'::oid, '5'::oid, '6'::oid, '7'::oid,


Bitmap Heap Scan on pg_class  (cost=43835.00..54202.14 rows=414 
width=68) (actual time=39.958..41.293 rows=260 loops=1)
   Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = 
'3'::oid) OR (oid = '4'::oid) OR (oid =


I think we could see which value is lower, and if it is lower with the 
expressions converted to ANY, work with that version further; otherwise 
work with the original "OR" expressions. But we still need to perform 
this conversion to find out its cost.


In addition, I will probably have to postpone the transformation of 
"OR" to "ANY" to the stage of creating index paths (?), or maybe a 
little earlier, so that I only have to compute the cost of the 
transformed expression.


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2023-10-25 Thread Alena Rybakina

On 19.10.2023 15:40, Andrei Zubkov wrote:

Hi hackers,

New version 23 attached. It contains a rebase onto the current master.
I noted that v1.11 adds new fields to the pg_stat_statements view, but
leaves the PGSS_FILE_HEADER constant unchanged. Is this correct?

Hi! Thank you for your work on the subject.

1. I didn't understand why we first try to find pgssEntry with a false 
top_level value, and later find it with a true top_level value.


/*
 * Remove the entry if it exists, starting with the non-top-level entry.
 */
key.toplevel = false;
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

SINGLE_ENTRY_RESET(entry);

/* Also reset the top-level entry if it exists. */
key.toplevel = true;
entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

SINGLE_ENTRY_RESET(entry);

I looked through this topic and found some explanation in this email 
[0], but I didn't understand it. Can you explain it to me?


2. And honestly, I think you should change
"Remove the entry if it exists, starting with the non-top-level entry." to
"Remove the entry or reset min/max time statistic information and the 
timestamp if it exists, starting with the non-top-level entry."


And the same with the comment below:

"Also reset the top-level entry if it exists."
"Also remove the entry or reset min/max time statistic information and 
the timestamp if it exists."


In my opinion, this is necessary because the minmax_only parameter is 
set by the user, so both ways are possible.
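
(A sketch, assuming the minmax_only argument that this patch adds to
pg_stat_statements_reset:)

-- Keep the entries, but reset only the min/max statistics and their
-- timestamp instead of removing the entries entirely.
SELECT pg_stat_statements_reset(minmax_only := true);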



0 - 
https://www.postgresql.org/message-id/62d16845-e74e-a6f9-9661-022e44f48922%40inbox.ru


--
Regards,
Alena Rybakina


Re: Simplify create_merge_append_path a bit for clarity

2023-10-24 Thread Alena Rybakina

Hi!

On 11.08.2023 05:31, Richard Guo wrote:

As explained in the comments for generate_orderedappend_paths, we don't
currently support parameterized MergeAppend paths, and it doesn't seem
like going to change anytime soon.  Based on that,  we could simplify
create_merge_append_path a bit, such as set param_info to NULL directly
rather than call get_appendrel_parampathinfo() for it.  We already have
an Assert on that in create_merge_append_plan.

I understand that the change would not make any difference for
performance, it's just for clarity's sake.


I agree with you, and we can indeed directly set the param_info value to 
NULL, and there are enough comments here to explain.


I didn't find anything else to add to your patch.

--
Regards,
Alena Rybakina





Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-10-23 Thread Alena Rybakina

Hi!

I looked through your patch and noticed that it no longer applied to the 
current master. I rebased it and attached the new version. I didn't see 
any problems and, honestly, no big changes were needed; all regression 
tests passed.


I think it would be better to add a test, but to be honest, I haven't 
been able to come up with one yet.


--
Regards,
Alena Rybakina
From f7a8ca7f3263fa5f82056f90231cf937133622c9 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 23 Oct 2023 22:54:04 +0300
Subject: [PATCH] WIP: Evaluate arguments of correlated SubPlans in the
 referencing ExprState

---
 src/backend/executor/execExpr.c   | 93 +--
 src/backend/executor/execExprInterp.c | 23 +++
 src/backend/executor/execProcnode.c   |  5 ++
 src/backend/executor/nodeSubplan.c| 30 -
 src/backend/jit/llvm/llvmjit_expr.c   |  6 ++
 src/backend/jit/llvm/llvmjit_types.c  |  1 +
 src/include/executor/execExpr.h   |  6 +-
 src/include/nodes/execnodes.h |  1 -
 8 files changed, 113 insertions(+), 52 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2c62b0c9c84..14cd56a2db3 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -88,6 +88,9 @@ static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
   FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
   int transno, int setno, int setoff, bool ishash,
   bool nullcheck);
+static void ExecInitSubPlanExpr(SubPlan *subplan,
+ExprState *state,
+Datum *resv, bool *resnull);
 
 
 /*
@@ -1389,7 +1392,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
 		case T_SubPlan:
 			{
 SubPlan*subplan = (SubPlan *) node;
-SubPlanState *sstate;
 
 /*
  * Real execution of a MULTIEXPR SubPlan has already been
@@ -1406,19 +1408,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	break;
 }
 
-if (!state->parent)
-	elog(ERROR, "SubPlan found with no parent plan");
-
-sstate = ExecInitSubPlan(subplan, state->parent);
-
-/* add SubPlanState nodes to state->parent->subPlan */
-state->parent->subPlan = lappend(state->parent->subPlan,
- sstate);
-
-scratch.opcode = EEOP_SUBPLAN;
-scratch.d.subplan.sstate = sstate;
-
-ExprEvalPushStep(state, &scratch);
+ExecInitSubPlanExpr(subplan, state, resv, resnull);
 break;
 			}
 
@@ -2750,29 +2740,12 @@ ExecPushExprSetupSteps(ExprState *state, ExprSetupInfo *info)
 	foreach(lc, info->multiexpr_subplans)
 	{
 		SubPlan*subplan = (SubPlan *) lfirst(lc);
-		SubPlanState *sstate;
 
 		Assert(subplan->subLinkType == MULTIEXPR_SUBLINK);
 
-		/* This should match what ExecInitExprRec does for other SubPlans: */
-
-		if (!state->parent)
-			elog(ERROR, "SubPlan found with no parent plan");
-
-		sstate = ExecInitSubPlan(subplan, state->parent);
-
-		/* add SubPlanState nodes to state->parent->subPlan */
-		state->parent->subPlan = lappend(state->parent->subPlan,
-		 sstate);
-
-		scratch.opcode = EEOP_SUBPLAN;
-		scratch.d.subplan.sstate = sstate;
-
 		/* The result can be ignored, but we better put it somewhere */
-		scratch.resvalue = &state->resvalue;
-		scratch.resnull = &state->resnull;
-
-		ExprEvalPushStep(state, &scratch);
+		ExecInitSubPlanExpr(subplan, state,
+							&state->resvalue, &state->resnull);
 	}
 }
 
@@ -4178,3 +4151,57 @@ ExecBuildParamSetEqual(TupleDesc desc,
 
 	return state;
 }
+
+static void
+ExecInitSubPlanExpr(SubPlan *subplan,
+	ExprState *state,
+	Datum *resv, bool *resnull)
+{
+	ExprEvalStep scratch = {0};
+	SubPlanState *sstate;
+	ListCell   *pvar;
+	ListCell   *l;
+
+	if (!state->parent)
+		elog(ERROR, "SubPlan found with no parent plan");
+
+	/*
+	 * Generate steps to evaluate input arguments for the subplan.
+	 *
+	 * We evaluate the argument expressions into ExprState's resvalue/resnull,
+	 * and then use PARAM_SET to update the parameter. We do that, instead of
+	 * evaluating directly into the param, to avoid depending on the pointer
+	 * value remaining stable / being included in the generated expression. No
+	 * danger of conflicts with other uses of resvalue/resnull as storing and
+	 * using the value always is in subsequent steps.
+	 *
+	 * Any calculation we have to do can be done in the parent econtext, since
+	 * the Param values don't need to have per-query lifetime.
+	 */
+	forboth(l, subplan->parParam, pvar, subplan->args)
+	{
+		int			paramid = lfirst_int(l);
+
+		ExecInitExprRec(lfirst(pvar), state,
+						&state->resvalue, &state->resnull);
+
+		scratch.opcode = EEOP_PARAM_SET;
+		scratch.d.param.paramid = paramid;
+		/* type isn't needed, but an old value could be confusing */
+		scratch.d.param.paramtype = InvalidOid;
+		ExprEvalPushStep(state, &scratch);
+	}
+
+	sstate = ExecInitSubPlan(subplan, state->parent);
+
+	/* add SubPlanState nodes to state->parent->sub

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-10-19 Thread Alena Rybakina

Hi!

Thank you for your work on the subject.


While reviewing your patch, I didn't understand why you check that 
path, rather than new_path, has the T_SampleScan pathtype (I 
highlighted it below).


Path *
reparameterize_path_by_child(PlannerInfo *root, Path *path,
							 RelOptInfo *child_rel)

...
	switch (nodeTag(path))
	{
		case T_Path:
			new_path = path;
			ADJUST_CHILD_ATTRS(new_path->parent->baserestrictinfo);
			if (*path*->pathtype == T_SampleScan)
			{

Is this a typo, and should it be new_path?


Besides, it may not be important, but while reviewing your code I kept 
stumbling over the wording of this comment (I highlighted it too). 
This:


 * path_is_reparameterizable_by_child
 *		Given a path parameterized by the parent of the given child relation,
 *		see if it can be translated to be parameterized by the child relation.
 *
 * This must return true if *and only if* reparameterize_path_by_child()
 * would succeed on this path.

Maybe it is better to rewrite it more simply:

 * This must return true *only if* reparameterize_path_by_child()
 * would succeed on this path.


And can we add an assert to the reparameterize_pathlist_by_child 
function that pathlist is not NIL? According to the comment, that 
should hold there:


Returns NIL to indicate failure, so pathlist had better not be NIL.

--
Regards,
Alena Rybakina


Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-10-18 Thread Alena Rybakina

Hi!

Thank you for your work on the subject.


I reviewed your patch and found that your commit message does not fully 
explain your code; in addition, I found several spelling mistakes.


I think it's better to change it to:

With a parallel seqscan, we should consider materializing the cheapest 
inner path in the case of a nested loop, if it doesn't contain a unique 
node or if it is unsafe to use it in a subquery.



Besides, I couldn't understand why we check again that the materialize 
path is parallel safe:


if (matpath != NULL && matpath->parallel_safe)
        try_partial_nestloop_path(root, joinrel, outerpath, matpath,
                                  pathkeys, jointype, extra);

--
Regards,
Alena Rybakina


Re: A new strategy for pull-up correlated ANY_SUBLINK

2023-10-13 Thread Alena Rybakina

On 13.10.2023 10:04, Andy Fan wrote:


It seems to me that the expressions "=" and "IN" are equivalent
here due to the fact that the aggregated subquery returns only one
value, and the result of the "IN" operation can be considered as
the intersection of the elements on the left and right. In this
query, we have some set on the left, in which at most one element
from the right will be found.


Yes, they are equivalent in the final result, but there are some
differences at the execution level.  The '=' case will be transformed
to a SubPlan whose subLinkType is EXPR_SUBLINK, so if more than one
row is returned by the subplan, an error will be raised:

select * from tenk1 where
  ten =  (select ten from tenk1 i where i.two = tenk1.two );

ERROR:  more than one row returned by a subquery used as an expression

However, the IN case would not:
select * from tenk1 where
  ten in (select ten from tenk1 i where i.two = tenk1.two);  -- is OK

I think the test case you added is not related to this feature; the
difference is there even without the patch. So I kept the code
you changed, but not the test case.
Yes, I understand and agree with you that we should delete the last
queries, except for one.


The query below has a different plan compared to master, and it is
correct.



Without your patch:

explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Seq Scan on tenk2 b
               Filter: (SubPlan 2)
               SubPlan 2
                 ->  Result
                       InitPlan 1 (returns $1)
                         ->  Limit
                               ->  Index Scan using tenk2_hundred on tenk2 c
                                     Index Cond: (hundred IS NOT NULL)
                                     Filter: (odd = b.odd)
(12 rows)


After your patch:

postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);

                         QUERY PLAN
------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on tenk2 b
               *->  Subquery Scan on "ANY_subquery"
                     Filter: (b.hundred = "ANY_subquery".min)*
                     ->  Aggregate
                           ->  Seq Scan on tenk2 c
                                 Filter: (odd = b.odd)
(10 rows)




I took the liberty of adding this to your patch and added
myself as reviewer, if you don't mind.

Sure, the patch after your modification looks better than the
original.
I'm not sure how the test case around "because of got one row" is
relevant to the current changes.  After we reach to some agreement
on the above discussion, I think v4 is good for committer to review!


Thank you!) I am ready to discuss it.

Actually I meant to discuss the "Unfortunately, I found a query.."
part; it looks like we have reached an agreement there :)


Yes, we have)

--
Regards,
Alena Rybakina
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..e41b728df83 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,27 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
  ->  Seq Scan on tenk2 b
 (11 rows)
 
+-- we can pull up the aggregate sublink into RHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+  QUERY PLAN   
+---
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+ ->  Nested Loop
+   ->  Seq Scan on tenk2 b
+   ->  Memoize
+ Cache Key: b.hundred, b.odd
+ Cache Mode: binary
+ ->  Subquery Scan on "ANY_subquery"
+   Filter: (b.hundred = "ANY_subquery".min)
+   ->  Result
+ InitPlan 1 (returns $1)
+   ->  Limit
+ ->  Index Scan using tenk2_hundred on tenk2 c
+   Index Cond: (hundred IS NOT NULL)
+   

Re: A new strategy for pull-up correlated ANY_SUBLINK

2023-10-12 Thread Alena Rybakina
    ->  Seq Scan on tenk2 c
(9 rows)



I took the liberty of adding this to your patch and added myself
as reviewer, if you don't mind.

Sure, the patch after your modification looks better than the original.
I'm not sure how the test case around "because of got one row" is
relevant to the current changes.  After we reach to some agreement
on the above discussion, I think v4 is good for committer to review!


Thank you!) I am ready to discuss it.

--
Regards,
Alena Rybakina
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index b70b346696d..0b8f28b157f 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,7 +2028,7 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
  ->  Seq Scan on tenk2 b
 (11 rows)
 
--- we can pull up the aggregate sublink into the subquery scan because of got one row.
+-- we can pull up the aggregate sublink into RHS of a left join.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
@@ -2052,7 +2052,6 @@ ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
Filter: (odd = b.odd)
 (16 rows)
 
--- we can pull up the aggregate sublink into the JoinExpr.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
@@ -2071,3 +2070,39 @@ ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
  ->  Seq Scan on tenk2 c
 (11 rows)
 
+-- we can't pull up the aggregate sublink into LHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+   QUERY PLAN
+-
+ Nested Loop Left Join
+   Join Filter: (SubPlan 2)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+ ->  Seq Scan on tenk2 b
+   SubPlan 2
+ ->  Result
+   InitPlan 1 (returns $1)
+ ->  Limit
+   ->  Index Scan using tenk2_hundred on tenk2 c
+ Index Cond: (hundred IS NOT NULL)
+ Filter: (odd = b.odd)
+(12 rows)
+
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+QUERY PLAN 
+---
+ Nested Loop Left Join
+   Join Filter: (hashed SubPlan 1)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+ ->  Seq Scan on tenk2 b
+   SubPlan 1
+ ->  HashAggregate
+   Group Key: c.odd
+   ->  Seq Scan on tenk2 c
+(9 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 5d33eb39baa..c02513c4989 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1001,12 +1001,20 @@ explain (costs off)
 SELECT * FROM tenk1 A INNER JOIN tenk2 B
 ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
 
--- we can pull up the aggregate sublink into the subquery scan because of got one row.
+-- we can pull up the aggregate sublink into RHS of a left join.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
 
--- we can pull up the aggregate sublink into the JoinExpr.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
-ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+
+-- we can't pull up the aggregate sublink into LHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file


Re: A new strategy for pull-up correlated ANY_SUBLINK

2023-10-11 Thread Alena Rybakina
    ->  HashAggregate
 Group Key: c.odd
 ->  Seq Scan on tenk2 c
(11 rows)


Unfortunately, I found a query where the sublink is not pulled up,
unlike the examples above. I couldn't quite figure out why.


create table a (x int, y int, z int, t int);
create table b (x int, t int);
create unique index on a (t, x);
create index on b (t,x);
insert into a select id, id, id, id FROM generate_series(1,10) As id;
insert into b select id, id FROM generate_series(1,1000) As id;

explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
    left join a
        on b.x=a.x and
*b.t in
            (select max(a0.t) *
         from a a0
         where a0.x = b.x and
           a0.t = b.t);

QUERY PLAN

 Hash Right Join (actual time=1.150..58.512 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
*Join Filter: (SubPlan 2)*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.023..15.798 rows=10 loops=1)
 Buffers: shared hit=541
   ->  Hash (actual time=1.038..1.042 rows=1000 loops=1)
 Buckets: 4096  Batches: 1  Memory Usage: 72kB
 Buffers: shared hit=5
 ->  Seq Scan on b (actual time=0.047..0.399 rows=1000 loops=1)
   Buffers: shared hit=5
   SubPlan 2
 ->  Result (actual time=0.018..0.018 rows=1 loops=1000)
   Buffers: shared hit=3000
   InitPlan 1 (returns $2)
 ->  Limit (actual time=0.015..0.016 rows=1 loops=1000)
   Buffers: shared hit=3000
                   ->  Index Only Scan using a_t_x_idx on a a0 (actual time=0.014..0.014 rows=1 loops=1000)
                         Index Cond: ((t IS NOT NULL) AND (t = b.t) AND (x = b.x))
                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.630 ms
 Execution Time: 58.941 ms
(23 rows)

I thought it would be:

explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
    left join a on
        b.x=a.x and
*b.t =
            (select max(a0.t) *
         from a a0
         where a0.x = b.x and
                   a0.t <= b.t);

QUERY PLAN
-
 Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
*Join Filter: (b.t = (SubPlan 2))*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.022..17.109 rows=10 loops=1)
 Buffers: shared hit=541
   ->  Hash (actual time=1.065..1.068 rows=1000 loops=1)
 Buckets: 4096  Batches: 1  Memory Usage: 72kB
 Buffers: shared hit=5
 ->  Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1)
   Buffers: shared hit=5
   SubPlan 2
 ->  Result (actual time=0.025..0.025 rows=1 loops=1000)
   Buffers: shared hit=3000
   InitPlan 1 (returns $2)
 ->  Limit (actual time=0.024..0.024 rows=1 loops=1000)
   Buffers: shared hit=3000
                   ->  Index Only Scan Backward using a_t_x_idx on a a0 (actual time=0.023..0.023 rows=1 loops=1000)
                         Index Cond: ((t IS NOT NULL) AND (t <= b.t) AND (x = b.x))
                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.689 ms
 Execution Time: 68.220 ms
(23 rows)

As you may have noticed, the pull-up became possible after replacing
the "in" operator with "=".



I took the liberty of adding this to your patch and added myself as 
reviewer, if you don't mind.



--
Regards,
Alena Rybakina
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6af4a3183ac..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1284,18 +1284,16 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * If the sub-select refers to any Vars of the parent query, we have to
-	 * treat it as LATERAL.  (Vars of higher levels don't matter here.)
+	 * If the sub-select refers to any Vars of the parent query, let's
+	 * consider it as LATERAL.  (Vars of higher levels don't matter here.)
 	 */
 	sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
 
-	if (!bms_is_empty(sub_ref_outer_relids))
-	{
-		if (bms_is_subset(sub_ref_outer_relids, available_rels))
-			use_lateral = true;
-		else
-			return NULL;
-	}
+	use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+		 bms_is_subset(sub_ref_outer_relids, available_rels);
+
+	if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
+		return NULL;
 
 	/*
 	 * The test expression must contain some Vars of the pare

Re: Removing unneeded self joins

2023-10-10 Thread Alena Rybakina

Hi!

I have reviewed your patch and I noticed a few things.

First of all, I think I found a bug in your latest patch version, and 
this query shows it:


EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
  SELECT e1.oid FROM pg_extension e1, pg_extension e2
  WHERE e1.oid=e2.oid) AS e
  ON c.oid=e.oid;

In the current version we get such a query plan:

   QUERY PLAN
-
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
 ->  Seq Scan on pg_extension e2
(5 rows)

But I think it should be:

               QUERY PLAN
-----------------------------------------------
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
         ->  Seq Scan on pg_extension e2
               *Filter: (oid IS NOT NULL)*
(6 rows)

I have looked at the latest version of the code, and I assume that the
error lies in the replace_varno_walker function, specifically in the
place where we check whether the node is a Var and do not form any
NullTest node.


if (OidIsValid(reloid) && get_attnotnull(reloid, attnum))	/* this condition works */
{
	rinfo->clause = (Expr *) makeBoolConst(true, false);
}
else
{
	NullTest   *ntest = makeNode(NullTest);

	ntest->arg = leftOp;
	ntest->nulltesttype = IS_NOT_NULL;
	ntest->argisrow = false;
	ntest->location = -1;
	rinfo->clause = (Expr *) ntest;
}


Secondly, I added some code in a few places to catch erroneous cases,
and added a condition so that we do not try to apply the
self-join-removal transformation when the self-join list turns out to
be empty after searching for it, and in general when there are no joins
in the query. Besides, I added a query for testing and wrote about it
above. I have attached my diff file.



In addition, I found a comment that was not clear to me. I would be
glad if you could explain it.


You mentioned a "superior outer join" in the comment; unfortunately, I
didn't find anything about it in the PostgreSQL code, so this
explanation remained unclear to me. Could you explain in more detail
what you meant?


/*
 * At this stage joininfo lists of inner and outer can contain
 * only clauses, required for *a superior outer join* that can't
 * influence on this optimization. So, we can avoid to call the
 * build_joinrel_restrictlist() routine.
*/
 restrictlist = generate_join_implied_equalities(root, joinrelids,
  inner->relids,
      outer, NULL);

--

Regards,
Alena Rybakina
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3e10083905c..5ba5ca693f1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1704,6 +1704,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 	List			   *binfo_candidates = NIL;
 	ReplaceVarnoContext	ctx = {.from = toRemove->relid, .to = toKeep->relid};
 
+	Assert(toKeep->relid != -1);
+
 	/*
 	 * Replace index of removing table with the keeping one. The technique of
 	 * removing/distributing restrictinfo is used here to attach just appeared
@@ -2007,6 +2009,8 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
 /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
 continue;
 
+			Assert(is_opclause(orinfo->clause));
+
 			oclause = bms_is_empty(orinfo->left_relids) ?
 	get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
 			c2 = (bms_is_empty(orinfo->left_relids) ?
@@ -2150,6 +2154,18 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			split_selfjoin_quals(root, restrictlist, ,
  , inner->relid, outer->relid);
 
+			if (list_length(selfjoinquals) == 0)
+ 			{
+				/*
+				 * XXX:
+				 * We could detect a self-join without quals like 'x = x' if we
+				 * had a foreign key constraint on some of the other quals and
+				 * this join didn't have any columns from the outer relation in
+				 * the target list. But that is still a complex task.
+				 */
+				continue;
+ 			}
+
 			/*
 			 * To enable SJE for the only degenerate case without any self join
 			 * clauses at all, add baserestrictinfo to this list.
@@ -2332,7 +2348,7 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
 	Relids	ToRemove = NULL;
 	int		relid = -1;
 
-	if (!enable_self_join_removal)
+	if ((list_length(joinlist) <=1 && !IsA(linitial(joinlist), List)) || !enable_self_join_removal)
 		return joinlist;
 
 	/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 10b23944feb..800410d6b18 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,13 +5807,11 @@ explain (costs off)
 select p.* from
   (parent p left join child c on (p.k = c.k))

Re: POC, WIP: OR-clause support for indexes

2023-08-09 Thread Alena Rybakina
the focus was on expression evaluation overhead. But that's only
one of the benefits that we now expect from the patch, right? So it
seems like something that should be revisited soon.

I'm not suggesting that there is no need for some kind of limit. But
it seems like a set of heuristics might be a better approach. Although
I would like to get a better sense of the costs of the transformation
to be able to say too much more.


Yes, this may be revised in the future after some more transformations
are added. Initially, I was solving the problem described here [0]. So,
after testing [1], I came to the conclusion that 500 is the ideal value
for or_transform_limit.


[0] 
https://www.postgresql.org/message-id/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru


[1] 
https://www.postgresql.org/message-id/6b97b517-f36a-f0c6-3b3a-0cf8cfba220c%40yandex.ru
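
To make the effect of the threshold concrete, here is a minimal sketch
(assuming the patch is applied; the GUC name or_transform_limit is taken
from the patch, and tenk1/unique1 come from the regression database):

-- With the limit lowered to 0, any OR chain over the same column is
-- rewritten into a single ScalarArrayOp during parse analysis.
SET or_transform_limit = 0;
EXPLAIN (COSTS OFF)
SELECT * FROM tenk1
WHERE unique1 = 1 OR unique1 = 2 OR unique1 = 3;
-- Expected index qual after the transformation:
--   Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))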


--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-08-03 Thread Alena Rybakina

On 02.08.2023 21:58, Peter Geoghegan wrote:

On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina  wrote:

No, I haven't thought about it yet. I studied the example and it would
really be nice to add optimization here. I didn't notice any problems
with its implementation. I also have an obvious example with the "or"
operator, for example:
select * from multi_test where (a, b) = (1, 1) or (a, b) = (2, 1) ...;

Although I think such a case will be used less often.

Right. As I said, I don't particularly care about the row constructor
syntax -- it's not essential.

In my experience patches like this one that ultimately don't succeed
usually *don't* have specific problems that cannot be fixed. The real
problem tends to be ambiguity about the general high level design. So
more than anything else, ambiguity is the thing that you need to
minimize to be successful here. This is the #1 practical problem, by
far. This may be the only thing about your patch that I feel 100% sure
of.

In my experience it can actually be easier to expand the scope of a
project, and to come up with a more general solution:

https://en.wikipedia.org/wiki/Inventor%27s_paradox

I'm not trying to make your work more difficult by expanding its
scope. I'm actually trying to make your work *easier* by expanding its
scope. I don't claim to know what the specific scope of your patch
should be at all. Just that it might be possible to get a much clearer
picture of what the ideal scope really is by *trying* to generalize it
further -- that understanding is what we lack right now. Even if this
exercise fails in some way, it won't really have been a failure. The
reasons why it fails will still be interesting and practically
relevant.

As I explained to Jim, I am trying to put things in this area on a
more rigorous footing. For example, I have said that the way that the
nbtree code executes SAOP quals is equivalent to DNF. That is
basically true, but it's also my own slightly optimistic
interpretation of history and of the design. That's a good start, but
it's not enough on its own.

My interpretation might still be wrong in some subtle way, that I have
yet to discover. That's really what I'm concerned about with your
patch, too. I'm currently trying to solve a problem that I don't yet
fully understand, so for me "getting a properly working flow of
information" seems like a good practical exercise. I'm trying to
generalize the design of my own patch as far as I can, to see what
breaks, and why it breaks. My intuition is that this will help me with
my own patch by forcing me to gain a truly rigorous understanding of
the problem.

My suggestion about generalizing your approach to cover RowCompareExpr
cases is what I would do, if I were you, and this was my patch. That's
almost exactly what I'm doing with my own patch already, in fact.

It's all right. I understand your position)

I also agree to try to find other optimization cases and generalize them.

I read the wiki article, and as I understand it, in such a situation we
face a difficult problem: finding the expressions that need to be
converted into a logically equivalent form that simplifies execution
for the executor. For example, the ROW type: it can have a simpler
expression with AND and OR operations, and besides, we can exclude
duplicates. But some of these transformations may be incorrect, or they
may have a more complex representation. We can try to find the
problematic expressions, combine them into groups, and finally find a
solution for each group, or, conversely, discover that an existing
transformation is incorrect. If I understand correctly, we should first
start searching for "ROW" expressions (define a group for them) and
think about a solution for that group.

It seems to me the most difficult thing is to notice problematic cases
where the transformations are incorrect, but I think it can be implemented.

Right. To be clear, I am sure that it won't be practical to come up
with a 100% theoretically pure approach. If for no other reason than
this: normalizing to CNF in all cases will run into problems with very
complex predicates. It might even be computationally intractable
(could just be very slow). So there is a clear need to keep
theoretical purity in balance with practical considerations. There is
a need for some kind of negotiation between those two things. Probably
some set of heuristics will ultimately be required to keep costs and
benefits in balance.

I don't expect you to determine what set of heuristics will ultimately
be required to determine when and how to perform CNF conversions, in
the general case. But having at least some vague idea seems like it
might increase confidence in your design.
I agree, but I think this will be the second step after solutions are 
found.

Do you think that this problem is just an accidental side-effect? It
isn't necessarily the responsibility of your patch to fix such things.
If it's

Re: POC, WIP: OR-clause support for indexes

2023-07-31 Thread Alena Rybakina

Hi!


I think it really helps to speed up the search with similarly deep
filtering compared to clustered indexes, but do we have cases where we
don't use this algorithm because it takes longer than a usual index scan?
I thought about the situation of wide indexes (with many columns)
that have a lot of filtering predicates on them.

I think that it should be possible for the optimizer to only use
multi-column SAOP index paths when there is at least likely to be some
small advantage -- that's definitely my goal. Importantly, we may not
really need to accurately model the costs where the new approach turns
out to be much faster. The only essential thing is that we avoid cases
where the new approach is much slower than the old approach. Which is
possible (in at least some cases) by making the runtime behavior
adaptive.

The best decision that the planner can make may be no decision at all.
Better to wait until runtime where at all possible, since that gives
us the latest and most accurate picture of things.


But I'm not sure about this, so it seems to me that this is rather a
problem of improper use of indexes.

It's hard to say how true that is.

Certainly, workloads similar to the TPC-DS benchmark kinda need
something like MDAM. It's just not practical to have enough indexes to
support every possible query -- the benchmark is deliberately designed
to have unpredictable, hard-to-optimize access paths. It seems to
require having fewer, more general indexes that can support
multi-dimensional access reasonably efficiently.

Of course, with OLTP it's much more likely that the workload will have
predictable access patterns. That makes having exactly the right
indexes much more practical. So maybe you're right there. But, I still
see a lot of value in a design that is as forgiving as possible. Users
really like that kind of thing in my experience.
I tend to agree with you, but a runtime estimate cannot give us an
accurate picture of whether indexes, or any other optimizations, are
used correctly, due to the unstable state of the environment in which
the query is executed.

I believe that a more complex analysis is needed here.

Currently, the optimizer doesn't recognize multi-column indexes with
SAOPs on every column as having a valid sort order, except on the
first column. It seems possible that that has consequences for your
patch. (I'm really only guessing, though; don't trust anything that I
say about the optimizer too much.)


Honestly, I couldn't understand your concerns very well; could you
describe them in more detail?

Well, I'm not sure if there is any possible scenario where the
transformation from your patch makes it possible to go from an access
path that has a valid sort order (usually because there is an
underlying index scan) into an access path that doesn't. In fact, the
opposite situation seems more likely (which is good news) --
especially if you assume that my own patch is also present.

Going from a bitmap OR (which cannot return sorted output) to a
multi-column SAOP index scan (which now can) may have significant
value in some specific circumstances. Most obviously, it's really
useful when it enables us to feed tuples into a GroupAggregate without
a separate sort step, and without a hash aggregate (that's why I see
value in combining your patch with my own patch). You just need to be
careful about allowing the opposite situation to take place.

More generally, there is a need to think about strange second order
effects. We want to be open to useful second order effects that make
query execution much faster in some specific context, while avoiding
harmful second order effects. Intuitively, I think that it should be
possible to do this with the transformations performed by your patch.

In other words, "helpful serendipity" is an important advantage, while
"harmful anti-serendipity" is what we really want to avoid. Ideally by
making the harmful cases impossible "by construction".

I noticed only one thing there: when we have unsorted array values in a
SAOP, the query takes longer than when the array is sorted. I'll
double-check it just in case and write about the results later; see the
sketch below.
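
A minimal way to re-check that observation (a sketch against the
regression database; it is the timings, not the plan shapes, that are
expected to differ):

EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM tenk1 WHERE thousand = ANY ('{997,3,512,42}');

EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM tenk1 WHERE thousand = ANY ('{3,42,512,997}');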


I am also experimenting with multi-column indexes using SAOPs; one shape
I am looking at is sketched below.
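
A sketch of that experiment, using the existing tenk1_thous_tenthous
index on (thousand, tenthous) from the regression database (whether the
GroupAggregate can run without a separate Sort depends on the patch
recognizing the multi-column SAOP scan's sort order):

EXPLAIN (COSTS OFF)
SELECT thousand, tenthous, count(*)
FROM tenk1
WHERE thousand = ANY ('{42,43}') AND tenthous = ANY ('{1,3}')
GROUP BY thousand, tenthous
ORDER BY thousand, tenthous;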

--
Regards,
Alena Rybakina
Postgres Professional





Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2023-07-31 Thread Alena Rybakina

Hi, all!


  CNF -> DNF conversion
=

Like many great papers, the MDAM paper takes one core idea, and finds
ways to leverage it to the hilt. Here the core idea is to take
predicates in conjunctive normal form (an "AND of ORs"), and convert
them into disjunctive normal form (an "OR of ANDs"). DNF quals are
logically equivalent to CNF quals, but ideally suited to SAOP-array
style processing by an ordered B-Tree index scan -- they reduce
everything to a series of non-overlapping primitive index scans, that
can be processed in keyspace order. We already do this today in the
case of SAOPs, in effect. The nbtree "next array keys" state machine
already materializes values that can be seen as MDAM style DNF single
value predicates. The state machine works by outputting the cartesian
product of each array as a multi-column index is scanned, but that
could be taken a lot further in the future. We can use essentially the
same kind of state machine to do everything described in the paper --
ultimately, it just needs to output a list of disjuncts, like the DNF
clauses that the paper shows in "Table 3".

In theory, anything can be supported via a sufficiently complete CNF
-> DNF conversion framework. There will likely always be the potential
for unsafe/unsupported clauses and/or types in an extensible system
like Postgres, though. So we will probably need to retain some notion
of safety. It seems like more of a job for nbtree preprocessing (or
some suitably index-AM-agnostic version of the same idea) than the
optimizer, in any case. But that's not entirely true, either (that
would be far too easy).

The optimizer still needs to optimize. It can't very well do that
without having some kind of advanced notice of what is and is not
supported by the index AM. And, the index AM cannot just unilaterally
decide that index quals actually should be treated as filter/qpquals,
after all -- it doesn't get a veto. So there is a mutual dependency
that needs to be resolved. I suspect that there needs to be a two way
conversation between the optimizer and nbtree code to break the
dependency -- a callback that does some of the preprocessing work
during planning. Tom said something along the same lines in passing,
when discussing the MDAM paper last year [2]. Much work remains here.

Honestly, I'm just reading and delving into this thread and other topics 
related to it, so excuse me if I ask you a few obvious questions.


I noticed that you are going to add the CNF->DNF transformation at the
index construction stage. If I understand correctly, you will rewrite
the restrictinfo node, changing boolean "AND" expressions to "OR"
expressions, but would it be possible to apply such a procedure earlier?
Otherwise, I suppose you could face the problem of incorrect selectivity
estimation and, consequently, incorrect cardinality calculation.
I can't clearly understand at what stage it becomes clear that such a
transformation needs to be applied; a small example of the conversion is
sketched below.
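
As a concrete illustration (a sketch against the regression database,
using its tenk1_thous_tenthous index on (thousand, tenthous)):

-- CNF ("AND of ORs"), as the planner sees it today:
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = ANY ('{1,2}') AND tenthous = ANY ('{10,20}');
-- The equivalent DNF ("OR of ANDs") is the cartesian product of the two
-- arrays, i.e. four non-overlapping single-value primitive scans that
-- can be processed in keyspace order:
--   (thousand = 1 AND tenthous = 10) OR (thousand = 1 AND tenthous = 20)
--   OR (thousand = 2 AND tenthous = 10) OR (thousand = 2 AND tenthous = 20)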


--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-07-26 Thread Alena Rybakina

Hi!

On 26.07.2023 02:47, Peter Geoghegan wrote:

On Thu, Jun 29, 2023 at 2:32 AM Alena Rybakina  wrote:

Hi! I'm sorry I didn't answer you right away, I was too busy with work.

Same for me, this time. I was busy working on my patch, which I
finally posted yesterday.
I'm glad to hear it. I've seen your thread ("Optimizing nbtree
ScalarArrayOp execution, allowing multi-column ordered scans, skip
scan"), but, unfortunately, I haven't had enough time to read it yet.
I'll review it soon!

To be honest, I hadn't thought before about the fact that my optimization
can help eliminate duplicates before reading the data.

I'm not surprised that you didn't specifically think of that, because
it's very subtle.


I am still only in the process of familiarizing myself with the thread [1] (reference 
from your letter), but I have already seen that there are problems related, for example, 
to when "or" expressions refer to the parent element.

I didn't intend to imply that you might have the same problem here. I
just meant that OR optimizations can have problems with duplicate
elimination, in general. I suspect that your patch doesn't have that
problem, because you are performing a transformation of one kind of OR
into another kind of OR.


Yes, you are right, but I studied this topic and two other sources to 
accumulate my knowledge. It was an exciting experience for me)


I was especially inspired after studying the interview with Goetz
Graefe [2]; his life experience is most inspiring, and from this article
I was able to get a broad understanding of the field of databases:
current problems, future development, how it works... Thank you for the
recommendation.


I discovered for myself that the idea described in the article [1] is
similar to the idea of representing grouped data in OLAP cubes, and
also, if I saw correctly, an algorithm like depth-first search is used
there, but over indexes.


I think it really helps to speed up the search with similarly deep
filtering compared to clustered indexes, but do we have cases where we
don't use this algorithm because it takes longer than a usual index scan?
I thought about the situation of wide indexes (with many columns)
that have a lot of filtering predicates on them.
But I'm not sure about this, so it seems to me that this is rather a
problem of improper use of indexes.

I think I would face similar problems if I complicated the current code,
for example, so that not only OR expressions at the same level are
rewritten into ANY, but also those at different levels, without
violating operator precedence.

I can't say that I am particularly experienced in this general area --
I have never tried to formally reason about how two different
statements are equivalent. It just hasn't been something that I've
needed to have a totally rigorous understanding of up until now. But
my recent patch changes that. Now I need to study this area to make
sure that I have a truly rigorous understanding.

Jeff Davis suggested that I read "Logic and Databases", by C.J. Date.
So now I have some homework to do.
I'll read this book too. Maybe I can finish work with the knowledge I 
got from there. Thank you for sharing!

Unfortunately, when I tried to make the transformation at the stage of
index formation, I encountered a badly incorrect estimate of the
relation's selectivity, which led to incorrect cost and cardinality
calculations.

It's not surprising that a weird shift in the plan chosen by the
optimizer is seen with some random test case, as a result of this
added transformation. Even changes that are 100% strictly better (e.g.
changes in a selectivity estimation function that is somehow
guaranteed to be more accurate in all cases) might do that. Here is a
recent example of that with another patch, involving a bitmap OR:

https://postgr.es/m/cah2-wzncdk9n2tz6j_-iln563_epuc3nzp6vsvtl6jhzs6n...@mail.gmail.com
At first, this surprised me very much. It took time to find a suitable 
place to implement the transformation.


I have looked through this thread many times, and I will study it in
more detail.

This example was *not* a regression, if you go by conventional
measures. It was merely a less robust plan than the bitmap OR plan,
because it didn't pass down both columns as index quals.

BTW, there are various restrictions on the sort order of SAOPs that
you might want to try to familiarize yourself with. I describe them
(perhaps not very clearly) here:

https://postgr.es/m/CAH2-Wz=ksvn_sjcnd1+bt-wtifra5ok48adynq3pkkhxgmq...@mail.gmail.com

Thank you! Yes, I'll study it too)

Currently, the optimizer doesn't recognize multi-column indexes with
SAOPs on every column as having a valid sort order, except on the
first column. It seems possible that that has consequences for your
patch. (I'm really only guessing, though; don't trust anything that I
say about the optimizer too much.)

Honestly, I couldn't unde

Re: POC, WIP: OR-clause support for indexes

2023-07-18 Thread Alena Rybakina

Hi, all!

I sent a patch to commitfest and noticed that the authors and the 
reviewer were incorrectly marked.


Sorry about that. I fixed it and sent the current version of the patch.

--
Regards,
Alena Rybakina
Postgres Professional
From 087125cc413429bda05f22ebbd51115c23819285 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Tue, 18 Jul 2023 17:19:53 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at an early stage of the optimizer, when we
 are still working with the expression tree. Firstly, we do not try to make
 the transformation for "non-OR" expressions or inequalities, and the
 creation of a relation with "OR" expressions occurs according to the same
 scenario. Secondly, we do not make the transformation if there are fewer OR
 expressions than or_transform_limit. Thirdly, note that we consider "OR"
 expressions only at the current level.

Authors: Alena Rybakina , Andrey Lepikhov 
Reviewed-by: Ranier Vilela 
---
 src/backend/parser/parse_expr.c   | 230 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 115 +
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  50 
 src/test/regress/expected/partition_prune.out | 179 ++
 src/test/regress/expected/tidscan.out |  17 ++
 src/test/regress/sql/create_index.sql |  32 +++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 674 insertions(+), 2 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 5a05caa8744..b2294af0f43 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -43,6 +43,7 @@
 
 /* GUC parameters */
 bool		Transform_null_equals = false;
+int			or_transform_limit = 500;
 
 
 static Node *transformExprRecurse(ParseState *pstate, Node *expr);
@@ -95,6 +96,233 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid			scalar_type;
+	Oid			opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc;
+
+	/* If this is not an 'OR' expression, skip the transformation */
+	if (expr_orig->boolop != OR_EXPR ||
+		list_length(expr_orig->args) < or_transform_limit)
+		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
+
+	foreach(lc, expr_orig->args)
+	{
+		Node			   *arg = lfirst(lc);
+		Node			   *orqual;
+		Node			   *const_expr;
+		Node			   *nconst_expr;
+		ListCell		   *lc_groups;
+		OrClauseGroupEntry *gentry;
+
+		/* At first, transform the arg and evaluate constant expressions. */
+		orqual = transformExprRecurse(pstate, (Node *) arg);
+		orqual = coerce_to_boolean(pstate, orqual, "OR");
+		orqual = eval_const_expressions(NULL, orqual);
+
+		if (!IsA(orqual, OpExpr))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * Detect the constant side of the clause. Recall that a non-constant
+		 * expression can be built not only from Vars, but also from Params,
+		 * which are not bound to any relation. Thus, we detect the const
+		 * side - if the other side were constant too, the orqual couldn't be
+		 * an OpExpr.
+		 * Get pointers to the constant and expression sides of the qual.
+		 */
+		if (IsA(get_leftop(orqual), Const))
+		{
+			nconst_expr = get_rightop(orqual);
+			const_expr = get_leftop(orqual);
+		}
+		else if (IsA(get_rightop(orqual), Const))
+		{
+			const_expr = get_rightop(orqual);
+			nconst_expr = get_leftop(orqual);
+		}
+		else
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr)))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * At this point we definitely have a transformable clause.
+		 * Classify it and add it into a specific group of clauses, or
+		 * create a new group.
+		 * TODO: to manage complexity in the case of many different clauses
+		 * (X1=C1) OR (X2=C2) OR ... (XN=CN) we could invent something
+		 * like a hash table. But we also believe that the case of many
+		 * different variable sides is very rare.
+		 */
+		foreach(lc_groups, groups_list)
+		{
+			OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups);
+
+			Assert(v->node != NULL);
+
+			if (equal(v->node, nconst_expr))
+			{
+v->consts = lap

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-07-16 Thread Alena Rybakina

Hi,

I'm still working on it, but, unfortunately, I haven't had enough time
to get it into a state where there is something to show.
Now I am trying to sort out the problems that I drew attention to in the
previous letter.


--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-07-11 Thread Alena Rybakina

Hi!

On 11.07.2023 11:47, Andrey Lepikhov wrote:
This patch looks much better than before. But it definitely needs
some test coverage. As a first simple approximation, here you
can see the result of the regression tests where the transformation
limit is set to 0. See in the attachment some test changes induced by
these diffs.


Yes, I think so too. I also added some tests. I have attached an 
additional diff-5.diff where you can see the changes.

Also, I see some impact of the transformation to other queries:
create_view.out:
(NOT x > z) > (x <= z)
inherit.out:
(((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
to -
(((a)::text = ANY ('{NULL,cd}'::text[])) OR ((a)::text = 'ab'::text))

The transformations mentioned above are correct, of course. But this can
be a sign of possible unstable behavior.


I think it can be made more stable if we always add the existing
transformed expressions first, and then the original ones, or vice
versa. To do this, we will need two more lists, I think, and then we can
combine them, appending the elements of the second to the end of the
first.


But I suppose this may not be the only source of unstable behavior;
perhaps we also need to sort the resulting elements on the left side.
What do you think?


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index cc229d4dcaf..60e053d2179 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1922,81 +1922,6 @@ SELECT count(*) FROM tenk1
 10
 (1 row)
 
-EXPLAIN (COSTS OFF)
-SELECT count(*) FROM tenk1
-  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-   QUERY PLAN   
-
- Aggregate
-   ->  Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) OR (thousand = 41))
- ->  BitmapOr
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[])))
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand = 41)
-(8 rows)
-
-SELECT count(*) FROM tenk1
-  WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
- count 

-10
-(1 row)
-
-EXPLAIN (COSTS OFF)
-SELECT count(*) FROM tenk1
-  WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41;
- QUERY PLAN  
--
- Aggregate
-   ->  Bitmap Heap Scan on tenk1
- Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR (thousand = ANY ('{42,99}'::integer[] OR (thousand = 41))
- ->  BitmapOr
-   ->  BitmapAnd
- ->  Bitmap Index Scan on tenk1_hundred
-   Index Cond: (hundred = 42)
- ->  BitmapOr
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (tenthous < 2)
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand = ANY ('{42,99}'::integer[]))
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand = 41)
-(14 rows)
-
-SELECT count(*) FROM tenk1
-  WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41;
- count 

-20
-(1 row)
-
-EXPLAIN (COSTS OFF)
-SELECT count(*) FROM tenk1
-  WHERE hundred = 42 AND (thousand = 42 OR thousand = 41 OR thousand = 99 AND tenthous = 2);
-  QUERY PLAN  
---
- Aggregate
-   ->  Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR (thousand = ANY ('{42,41}'::integer[]
- ->  BitmapAnd
-   ->  Bitmap Index Scan on tenk1_hundred
- Index Cond: (hundred = 42)
-   ->  BitmapOr
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: ((thousand = 99) AND (tenthous = 2))
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: (thousand = AN

Re: POC, WIP: OR-clause support for indexes

2023-07-11 Thread Alena Rybakina

On 11.07.2023 16:29, Ranier Vilela wrote:
On Tue, Jul 11, 2023 at 09:29, Alena Rybakina wrote:


Hi!

On 10.07.2023 15:15, Ranier Vilela wrote:

On Mon, Jul 10, 2023 at 09:03, Ranier Vilela wrote:

Hi Alena,

On Mon, Jul 10, 2023 at 05:38, Alena Rybakina wrote:

I agreed with the changes. Thank you for your work.

I updated patch and added you to the authors.

I specified Ranier Vilela as a reviewer.

It is a good habit, when posting a new version of a patch, to name it
v1, v2, v3, etc.
That makes it easy to follow development and references on the thread.


Sorry, I fixed it.


Regarding the last patch.
1. I think that variable const_is_left is not necessary.
You can stick with:
+ if (IsA(get_leftop(orqual), Const))
+ {
+     nconst_expr = get_rightop(orqual);
+     const_expr = get_leftop(orqual);
+ }
+ else if (IsA(get_rightop(orqual), Const))
+ {
+     nconst_expr = get_leftop(orqual);
+     const_expr = get_rightop(orqual);
+ }
+ else
+ {
+     or_list = lappend(or_list, orqual);
+     continue;
+ }


Agreed.

You missed removing the declaration:
- bool const_is_left = true;

Yes, thank you. I fixed it.

.



2. Testing scalar_type != RECORDOID first is cheaper,
mainly if OidIsValid were a function; we know that it is a macro.
+ if (scalar_type != RECORDOID && OidIsValid(scalar_type))


Is it safe? Maybe we should first make sure that it can be checked
against RECORDOID at all?

Yes, it's safe, because of the && short-circuit.
But you can leave it as is in v5.


Added it.

--
Regards,
Alena Rybakina
Postgres Professional
From d4ca399ec5a1866025e6cea5ed5cf0e231af38e3 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Tue, 11 Jul 2023 17:10:25 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at an early stage of the optimizer, when we
 are still working with the expression tree. Firstly, we do not try to make
 the transformation for "non-OR" expressions or inequalities, and the
 creation of a relation with "OR" expressions occurs according to the same
 scenario. Secondly, we do not make the transformation if there are fewer
 than 500 OR expressions. Thirdly, note that we consider "OR" expressions
 only at the current level.

---
 src/backend/parser/parse_expr.c  | 231 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 231 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..82d51035684 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,235 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid			scalar_type;
+	Oid			opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 500;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc;
+
+	/* If this is not an 'OR' expression, skip the transformation */
+	if (expr_orig->boolop != OR_EXPR ||
+		list_length(expr_orig->args) < const_transform_or_limit)
+		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
+
+	foreach(lc, expr_orig->args)
+	{
+		Node			   *arg = lfirst(lc);
+		Node			   *orqual;
+		Node			   *const_expr;
+		Node			   *nconst_expr;
+		ListCell		   *lc_groups;
+		OrClauseGroupEntry *gentry;
+
+		/* At first, transform the arg and evaluate constant expressions. */
+		orqual = transformExprRecurse(pstate, (Node *) arg);
+		orqual = coerce_to_boolean(pstate, orqual, "OR");
+		orqual = eval_const_expressions(NULL, orqual);
+
+		if (!IsA(orqual, OpExpr))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * Detect the constant side of the clause. Recall that a non-constant
+		 * expression can be built not only from Vars, but also from Params,
+		 * which are not bound to any relation. Thus, we detect the const
+		 * side - if the other side were constant too, the orqual couldn't be
+		 * an OpExpr.
+		 * Get pointers to the constant and expression sides of the qual.
+		 */
+		if (IsA(get_leftop(orqual), Const))
+		{
+			nconst_expr = get_rightop(orqual);
+			const_expr = get_leftop(orqual);
+		}
+		else if (IsA(get_rightop(orqual), Const))
+		{
+			const_expr = get_rightop(orqual);
+			nconst_expr = get_leftop(orqual);
+		}
+		else
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr)))
+		{
+			or_list = lap

Re: POC, WIP: OR-clause support for indexes

2023-07-11 Thread Alena Rybakina

Hi!

On 10.07.2023 15:15, Ranier Vilela wrote:
On Mon, Jul 10, 2023 at 09:03, Ranier Vilela wrote:


Hi Alena,

On Mon, Jul 10, 2023 at 05:38, Alena Rybakina wrote:

I agreed with the changes. Thank you for your work.

I updated patch and added you to the authors.

I specified Ranier Vilela as a reviewer.

It is a good habit, when posting a new version of a patch, to name it
v1, v2, v3, etc.
That makes it easy to follow development and references on the thread.


Sorry, I fixed it.


Regarding the last patch.
1. I think that variable const_is_left is not necessary.
You can stick with:
+ if (IsA(get_leftop(orqual), Const))
+ {
+     nconst_expr = get_rightop(orqual);
+     const_expr = get_leftop(orqual);
+ }
+ else if (IsA(get_rightop(orqual), Const))
+ {
+     nconst_expr = get_leftop(orqual);
+     const_expr = get_rightop(orqual);
+ }
+ else
+ {
+     or_list = lappend(or_list, orqual);
+     continue;
+ }


Agreed.



2. Testing scalar_type != RECORDOID first is cheaper,
mainly if OidIsValid were a function; we know that it is a macro.
+ if (scalar_type != RECORDOID && OidIsValid(scalar_type))

Is it safe? Maybe we should first make sure that it can be checked
against RECORDOID at all?


3. Sorry about the wrong tip about array_type; but if it is really
necessary, better to use it.
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;


Agreed.



4. It is a good habit to call free last, to avoid somebody accidentally
using the freed list.
+ or_list = lappend(or_list, gentry->expr);
+ list_free(gentry->consts);
+ continue;

No, this is not necessary: in these places we add the original
expression to the resulting list, and later we will not use the list of
constants for this group at all; otherwise it would be an error.


--
Regards,
Alena Rybakina
Postgres Professional
From 55e56f1557ca6879eae6ea62845e180ebfd04662 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Tue, 11 Jul 2023 15:19:22 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at an early stage of the optimizer, when we
 are still working with the expression tree. Firstly, we do not try to make
 the transformation for "non-OR" expressions or inequalities, and the
 creation of a relation with "OR" expressions occurs according to the same
 scenario. Secondly, we do not make the transformation if there are fewer
 than 500 OR expressions. Thirdly, note that we consider "OR" expressions
 only at the current level.

---
 src/backend/parser/parse_expr.c  | 232 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..36b50b6441d 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,236 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid			scalar_type;
+	Oid			opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 500;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc;
+
+	/* If this is not an 'OR' expression, skip the transformation */
+	if (expr_orig->boolop != OR_EXPR ||
+		list_length(expr_orig->args) < const_transform_or_limit)
+		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
+
+	foreach(lc, expr_orig->args)
+	{
+		Node			   *arg = lfirst(lc);
+		Node			   *orqual;
+		Node			   *const_expr;
+		Node			   *nconst_expr;
+		ListCell		   *lc_groups;
+		OrClauseGroupEntry *gentry;
+		bool			const_is_left = true;
+
+		/* At first, transform the arg and evaluate constant expressions. */
+		orqual = transformExprRecurse(pstate, (Node *) arg);
+		orqual = coerce_to_boolean(pstate, orqual, "OR");
+		orqual = eval_const_expressions(NULL, orqual);
+
+		if (!IsA(orqual, OpExpr))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * Detect the constant side of the clause. Recall that a non-constant
+		 * expression can be built not only from Vars, but also from Params,
+		 * which are not bound to any relation. Thus, we detect the const
+		 * side - if the other side were constant too, the orqual couldn't be
+		 * an OpExpr.
+		 * Get pointers to the constant and expression sides of the qual.
+		 */
+		if (IsA(get_leftop(orqual), Const))
+		{
+			nconst_expr = get_rightop(orqual);
+			const_expr = get_leftop(orqual);
+		}
+		else if (IsA(get_rightop(orqual), Const))
+		{
+			const_expr = get_rightop(orqual);
+			nconst_expr = get

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-07-10 Thread Alena Rybakina
g Time: 0.406 ms
 Execution Time: 130.109 ms
(14 rows)

If you disable merge join with the original code, then the plans
coincide, and the error values are approximately the same; only in one
case a cardinality overestimation occurs, and in the other an
underestimation.

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1347.00..3915.00 rows=75082 width=16) (actual time=43.625..68.901 rows=5 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: ((large.a = 1) OR (small.b = 1))
   Rows Removed by Filter: 5
   ->  Seq Scan on large  (cost=0.00..1443.00 rows=10 width=8) (actual time=0.008..9.895 rows=10 loops=1)
   ->  Hash  (cost=722.00..722.00 rows=5 width=8) (actual time=22.546..22.548 rows=5 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 2466kB
         ->  Seq Scan on small  (cost=0.00..722.00 rows=5 width=8) (actual time=0.006..7.218 rows=5 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 70.860 ms
(10 rows)
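
For anyone who wants to reproduce the shape of this estimation problem,
here is a minimal sketch (hypothetical tables approximating the setup
used earlier in this thread; most rows of "large" have no join partner,
so the IS NULL fraction on the nullable side is exactly what the patch
below tries to account for):

create table large (id int, a int);
create table small (id int, b int);
insert into large select g, g % 10 from generate_series(1, 100000) g;
insert into small select g, g % 10 from generate_series(1, 50000) g;
analyze large;
analyze small;

-- Half of "large" has no match, so small.b IS NULL passes ~50000 rows;
-- without tracking the unmatched fraction, the planner underestimates it.
explain analyze
select * from large left join small using (id) where small.b is null;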

--
Regards,
Alena Rybakina
Postgres Professional
From 47daf6945403425f3a09a2df450a2cd2e96ae23c Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 10 Jul 2023 17:29:41 +0300
Subject: [PATCH] Fix the underestimated cardinality calculation for
 LEFT/RIGHT/FULL OUTER JOIN, where rows without a match were not taken into
 account in the result of the join operation. This patch fixes this by
 calculating the fraction of rows on the one side of the join without a
 matching row on the other side.

Co-authored-by: Alena Rybakina 
---
 src/backend/utils/adt/array_selfuncs.c |   2 +-
 src/backend/utils/adt/selfuncs.c   | 314 ++---
 src/include/nodes/pathnodes.h  |   3 +
 src/include/utils/selfuncs.h   |   2 +-
 4 files changed, 178 insertions(+), 143 deletions(-)

diff --git a/src/backend/utils/adt/array_selfuncs.c b/src/backend/utils/adt/array_selfuncs.c
index 9207a5ed193..56ce460cf05 100644
--- a/src/backend/utils/adt/array_selfuncs.c
+++ b/src/backend/utils/adt/array_selfuncs.c
@@ -93,7 +93,7 @@ scalararraysel_containment(PlannerInfo *root,
 	/*
 	 * rightop must be a variable, else punt.
 	 */
-	examine_variable(root, rightop, varRelid, &vardata);
+	examine_variable(root, rightop, varRelid, NULL, &vardata);
 	if (!vardata.rel)
 	{
 		ReleaseVariableStats(vardata);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c4fcd0076ea..072579dbfd5 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -153,7 +153,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 			  bool isdefault1, bool isdefault2,
 			  AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 			  Form_pg_statistic stats1, Form_pg_statistic stats2,
-			  bool have_mcvs1, bool have_mcvs2);
+			  bool have_mcvs1, bool have_mcvs2, double *unmatched_frac);
 static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 			 VariableStatData *vardata1, VariableStatData *vardata2,
 			 double nd1, double nd2,
@@ -1513,7 +1513,7 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
 	VariableStatData vardata;
 	double		selec;
 
-	examine_variable(root, arg, varRelid, &vardata);
+	examine_variable(root, arg, varRelid, NULL, &vardata);
 	if (HeapTupleIsValid(vardata.statsTuple))
 	{
 		/*
@@ -1541,110 +1541,20 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg,
 {
 	VariableStatData vardata;
 	double		selec;
+	double		freq_null;
+	AttStatsSlot sslot;
 
-	examine_variable(root, arg, varRelid, &vardata);
+	examine_variable(root, arg, varRelid, sjinfo, &vardata);
 
-	if (HeapTupleIsValid(vardata.statsTuple))
+	if (sjinfo)
+	{
+		freq_null = sjinfo->unmatched_frac;
+	}
+	else if(HeapTupleIsValid(vardata.statsTuple))
 	{
 		Form_pg_statistic stats;
-		double		freq_null;
-		AttStatsSlot sslot;
-
 		stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
 		freq_null = stats->stanullfrac;
-
-		if (get_attstatsslot(&sslot, vardata.statsTuple,
-							 STATISTIC_KIND_MCV, InvalidOid,
-							 ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS)
-			&& sslot.nnumbers > 0)
-		{
-			double		freq_true;
-			double		freq_false;
-
-			/*
-			 * Get first MCV frequency and derive frequency for true.
-			 */
-			if (DatumGetBool(sslot.values[0]))
-				freq_true = sslot.numbers[0];
-			else
-				freq_true = 1.0 - sslot.numbers[0] - freq_null;
-
-			/*
-			 * Next derive frequency for false. Then use these as appropriate
-			 * to derive frequency for each case.
-			 */
-			freq_false = 1.0 - freq_true - freq_null;
-
-			switch (booltesttype)
-			{
-				case IS_UNKNOWN:
-					/* select only NULL values */
-					selec = freq_null;
-					break;
-				case IS_NOT_UNKNOWN:
-					/* select non-NULL values */
-					selec = 1.0 - freq_null;
-					break;
-				case IS_TRUE:
-					/* select only TRUE values */
-					selec = freq_true;
-					break;
-				case IS_NOT_TRUE:
-					/* select non-TRUE value

Re: POC, WIP: OR-clause support for indexes

2023-07-10 Thread Alena Rybakina

I agreed with the changes. Thank you for your work.

I updated patch and added you to the authors.

I specified Ranier Vilela as a reviewer.

On 10.07.2023 06:12, Andrey Lepikhov wrote:

On 7/7/2023 15:20, Alena Rybakina wrote:


because we will provide similar manipulation in this:

foreach(l, gentry->consts)
{
	Node	   *rexpr = (Node *) lfirst(l);

	rexpr = coerce_to_common_type(pstate, rexpr,
								  scalar_type,
								  "IN");
	aexprs = lappend(aexprs, rexpr);
}

I'm not sure that it should be replaced.
In attachment - a bit more corrections to the patch.
The most important change - or_list contains already transformed 
expression subtree. So, I think we don't need to free it at all.



--
Regards,
Alena Rybakina
Postgres Professional
From 97239cdeac541d3aaeafde3d15a5b6fdc36f86de Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 29 Jun 2023 17:46:58 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at the optimiser stage, while we are still
 working with the expression tree. Firstly, we do not try to transform
 "non-OR" expressions or inequalities, and such clauses are handled exactly
 as before. Secondly, we do not transform if there are fewer than 500 OR
 expressions. Thirdly, note that we consider OR expressions only at the
 current level.

Authors: Alena Rybakina , Andrey Lepikhov 
Reviewed-by: Ranier Vilela 
---
 src/backend/parser/parse_expr.c  | 231 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 231 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..0ddcb880ef8 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,235 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 500;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc;
+
+	/* If this is not an 'OR' expression, skip the transformation */
+	if (expr_orig->boolop != OR_EXPR ||
+		list_length(expr_orig->args) < const_transform_or_limit)
+		return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
+
+	foreach(lc, expr_orig->args)
+	{
+		Node			   *arg = lfirst(lc);
+		Node			   *orqual;
+		Node			   *const_expr;
+		Node			   *nconst_expr;
+		ListCell		   *lc_groups;
+		OrClauseGroupEntry *gentry;
+		bool				const_is_left = true;
+
+		/* At first, transform the arg and evaluate constant expressions. */
+		orqual = transformExprRecurse(pstate, (Node *) arg);
+		orqual = coerce_to_boolean(pstate, orqual, "OR");
+		orqual = eval_const_expressions(NULL, orqual);
+
+		if (!IsA(orqual, OpExpr))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * Detect the constant side of the clause. Recall non-constant
+		 * expressions can be made not only with Vars, but also with Params,
+		 * which are not bound to any relation. Thus, we detect the const
+		 * side - if the other side is constant too, the orqual couldn't be
+		 * an OpExpr.
+		 */
+		if (IsA(get_leftop(orqual), Const))
+			const_is_left = true;
+		else if (IsA(get_rightop(orqual), Const))
+			const_is_left = false;
+		else
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/* Get pointers to constant and expression sides of the qual */
+		nconst_expr = (const_is_left) ? get_rightop(orqual) :
+										get_leftop(orqual);
+		const_expr = (const_is_left) ? get_leftop(orqual) :
+									   get_rightop(orqual);
+
+		if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr)))
+		{
+			or_list = lappend(or_list, orqual);
+			continue;
+		}
+
+		/*
+		 * At this point we definitely have a transformable clause.
+		 * Classify it and add it into the specific group of clauses, or
+		 * create a new group.
+		 * TODO: to manage complexity in the case of many different clauses
+		 * (X1=C1) OR (X2=C2) OR ... (XN=CN) we could invent something
+		 * like a hash table. But we also believe that the case of many
+		 * different variable sides is very rare.
+		 */
+		foreach(lc_groups, groups_list)
+		{
+			OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups);
+
+			Assert(v->node != NULL);
+
+			if (equal(v->node, nconst_expr))
+			{
+				v->consts = lappend(v->consts, const_expr);
+				nconst_expr = NULL;
+				break;
+			}
+		}
+
+		if 

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-07-08 Thread Alena Rybakina




Well, one option would be to modify all selectivity functions to do
something like the patch does for nulltestsel(). That seems a bit
cumbersome because why should those places care about maybe running on
the outer side of a join, or what? For code in extensions this would be
particularly problematic, I think.

Agree. I would say that we can try it if nothing else works out.

So what I was thinking about is doing this in a way that'd make this
automatic, without having to modify the selectivity functions.

Option (3) is very simple - examine_variable would simply adjust the
statistics by tweaking the null_frac field, when looking at variables on
the outer side of the join. But it has issues when estimating multiple
conditions.

Imagine t1 has 1M rows, and we want to estimate

   SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id)
   WHERE ((t2.a=1) AND (t2.b=1))

but only 50% of the t1 rows has a match in t2. Assume each of the t2
conditions matches 100% rows in the table. With the correction, this
means 50% selectivity for each condition. And if we combine them the
usual way, it's 0.5 * 0.5 = 0.25.

But we know all the rows in the "matching" part match the condition, so
the correct selectivity should be 0.5.

In a way, this is just another case of estimation issues due to the
assumption of independence.
FWIW, I used "AND" in the example for simplicity, but that'd probably be
pushed to the baserel level. There'd need to be OR to keep it at the
join level, but the overall issue is the same, I think.

Also, this entirely ignores extended statistics - I have no idea how we
might tweak those in (3).
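
For concreteness, here is a minimal sketch of the scenario above (the schema
and data are my own, chosen to match the description):

CREATE TABLE t1 AS SELECT g AS id FROM generate_series(1, 1000000) g;
CREATE TABLE t2 AS SELECT g AS id, 1 AS a, 1 AS b
  FROM generate_series(1, 500000) g;
ANALYZE t1, t2;

EXPLAIN ANALYZE
SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id)
WHERE (t2.a = 1) AND (t2.b = 1);

-- Only 50% of t1 rows have a match, and every matched row passes both
-- conditions, so the true selectivity of the WHERE clause is 0.5.
-- Correcting each condition for NULLs gives 0.5 per condition, and the
-- independence assumption then yields 0.5 * 0.5 = 0.25, an underestimate.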


I understood the idea - it is very similar to what is implemented in the 
current patch.


But I don't understand how to do it in the examine_variable function, to 
be honest.



But (4) was suggesting we could improve this essentially by treating the
join as two distinct sets of rows

  - the inner join result

  - rows without match on the outer side

For the inner part, we would do estimates as now (using the regular
per-column statistics). If we knew the conditions match 100% rows, we'd
still get 100% when the conditions are combined.

For the second part of the join we know the outer side is just NULLs in
all columns, and that'd make the estimation much simpler for most
clauses. We'd just need to have "fake" statistics with null_frac=1.0 and
that's it.

And then we'd just combine these two selectivities. If we know the inner
side is 50% and all rows match the conditions, and no rows in the other
50% match, the selectivity is 50%.

inner_part * inner_sel + outer_part * outer_sel = 0.5 * 1.0 + 0.5 * 0.0 = 0.5

Now, we still have issues with independence assumption in each of these
parts separately. But that's OK, I think.

I think (4) could be implemented by doing the current estimation for the
  inner part, and by tweaking examine_variable in the "outer" part in a
way similar to (3). Except that it just sets null_frac=1.0 everywhere.

For (4) we don't need to tweak those at all,
because for inner part we can just apply them as is, and for outer part
it's irrelevant because everything is NULL.
I like this idea the most) I'll try to start with this and implement the 
patch.

I hope this makes more sense. If not, let me know and I'll try to
explain it better.


Thank you for your explanation)

I will follow up soon with the results, or earlier if I have any questions.

--
Regards,
Alena Rybakina
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2023-07-07 Thread Alena Rybakina
Hi! Thank you for your detailed review, your changes have greatly helped 
to improve this patch.


On 06.07.2023 13:20, Andrey Lepikhov wrote:

On 6/7/2023 03:06, Alena Rybakina wrote:

I corrected this constant in the patch.

The patch doesn't apply cleanly: it contains some trailing spaces.

I fixed it.


Also, a quick glance into the code shows some weak points:
1. transformBoolExprOr should have input type BoolExpr.

Agreed.
2. You can avoid the switch operator at the beginning of the function, 
because you only need one option.

Agreed.

3. Stale comments: RestrictInfos definitely do not exist at this point.
Yes, unfortunately, I missed this from the previous version when I tried 
to perform such a transformation at the index creation stage.
4. I don't know whether you really need to copy the expr or not, but it is 
better to do it as late as possible.

Yes, I agree with you, copying "expr" is not necessary in this patch

5. You assume that leftop is non-constant and rightop is constant. Why?

Agreed, It was too presumptuous on my part and I agree with your changes.
6. I have doubts about the equivalence operator. Someone can invent a 
custom '=' operator with different semantics than usual. Maybe it is 
better to check mergejoinability?
Yes, I agree with you; I hadn't thought about it before. I haven't found 
any functions in PostgreSQL to arrange this, and using mergejoinability 
turns out to be cleaner here (see the catalog query below).
7. I don't know how to confidently identify constant expressions at 
this level. So, I guess, you can only merge expressions like 
"F(X)=Const" here, not "F(X)=ConstExpression".
I see you found a solution for this case, thank you; I think it's 
reliable enough.
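
For what it's worth, mergejoinability is recorded in the pg_operator
catalog (as far as I know, op_mergejoinable() consults it), so which "="
operators qualify can be checked with a query like this (my own
illustration):

SELECT oprname, oprleft::regtype, oprright::regtype
FROM pg_operator
WHERE oprname = '=' AND oprcanmerge;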


On 07.07.2023 05:43, Andrey Lepikhov wrote:

On 6/7/2023 03:06, Alena Rybakina wrote:
The test was performed on the same benchmark database generated by 2 
billion values.


I corrected this constant in the patch.
In an attempt to resolve some issues mentioned in my previous letter, 
I used op_mergejoinable to detect the mergejoinability of a clause.
The constant side of the expression is detected by calling 
eval_const_expressions() and checking each side for a Const node.


See 'diff to diff' in attachment.



I noticed you removed the condition checking for the equality operation:

strcmp(strVal(linitial((arg)->name)), "=") == 0

At first it seemed incorrect to me, but a simple example convinced me 
otherwise:


postgres=# explain analyze select x from a where x=1 or x>5 or x<3 or x=2;
   QUERY PLAN

 Seq Scan on a  (cost=0.00..2291.00 rows=97899 width=4) (actual 
time=0.038..104.168 rows=99000 loops=1)
   Filter: ((x > '5'::numeric) OR (x < '3'::numeric) OR (x = ANY 
('{1,2}'::numeric[])))

   Rows Removed by Filter: 1000
 Planning Time: 9.938 ms
 Execution Time: 113.457 ms
(5 rows)

It surprised me that such a check can be written in this simple way:

eval_const_expressions(NULL, orqual).


Yes, I see we can remove this code:

bare_orarg = transformExprRecurse(pstate, (Node *)arg);
bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR");

because we will provide similar manipulation in this:

foreach(l, gentry->consts)
{
	Node	   *rexpr = (Node *) lfirst(l);

	rexpr = coerce_to_common_type(pstate, rexpr,
								  scalar_type,
								  "IN");
	aexprs = lappend(aexprs, rexpr);
}

--
Regards,
Alena Rybakina
Postgres Professional
From 9134099ef9808ca27494bc131558d04b24d9bdeb Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 29 Jun 2023 17:46:58 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at the optimiser stage, while we are still
 working with the expression tree. Firstly, we do not try to transform
 "non-OR" expressions or inequalities, and such clauses are handled exactly
 as before. Secondly, we do not transform if there are fewer than 500 OR
 expressions. Thirdly, note that we consider OR expressions only at the
 current level.

---
 src/backend/parser/parse_expr.c  | 269 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 269 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..961ca3e482c 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,273 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *co

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-07-06 Thread Alena Rybakina

Hi, all!

On 26.06.2023 12:22, Andrey Lepikhov wrote:

On 24/6/2023 17:23, Tomas Vondra wrote:

I really hope what I just wrote makes at least a little bit of sense.

Throw in one more example:

SELECT i AS id INTO l FROM generate_series(1,10) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;

Here you can see the same kind of underestimation:
Hash Left Join  (... rows=500 width=14) (... rows=9 ...)

So the eqjoinsel_unmatch_left() function should be modified for the 
case where nd1


Unfortunately, this patch could not fix the cardinality calculation in 
this query; I'll try to figure out what is missing here.


I tried to fix the cardinality estimate in the query above by changing:

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 8e18aa1dd2b..40901836146 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2604,11 +2604,16 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		 * if we're calculating fraction of NULLs or fraction of unmatched rows.
 		 */
 		// unmatchfreq = (1.0 - nullfrac1) * (1.0 - nullfrac2);
-		if (nd1 > nd2)
+		if (nd1 != nd2)
 		{
-			selec /= nd1;
-			*unmatched_frac = (nd1 - nd2) * 1.0 / nd1;
+			selec /= Max(nd1, nd2);
+			*unmatched_frac = abs(nd1 - nd2) * 1.0 / Max(nd1, nd2);
 		}
+		/*if (nd1 > nd2)
+		{
+			selec /= nd1;
+			*unmatched_frac = nd1 - nd2 * 1.0 / nd1;
+		}*/
 		else
 		{
 			selec /= nd2;

and it worked:

SELECT i AS id INTO l FROM generate_series(1,10) i;
CREATE TABLE r (id int8, v text);
INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f');
ANALYZE l,r;
EXPLAIN ANALYZE
SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL;
ERROR:  relation "l" already exists
ERROR:  relation "r" already exists
INSERT 0 2
ANALYZE
  QUERY PLAN
---
 Hash Left Join  (cost=1.09..1944.13 rows=8 width=14) (actual 
time=0.152..84.184 rows=9 loops=1)

   Hash Cond: (l.id = r.id)
   Filter: (r.v IS NULL)
   Rows Removed by Filter: 2
   ->  Seq Scan on l  (cost=0.00..1443.00 rows=10 width=4) (actual 
time=0.040..27.635 rows=10 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=4 width=10) (actual 
time=0.020..0.022 rows=4 loops=1)

 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on r  (cost=0.00..1.04 rows=4 width=10) (actual 
time=0.009..0.011 rows=4 loops=1)

 Planning Time: 0.954 ms
 Execution Time: 92.309 ms
(10 rows)

It looks too simple and I suspect that I might have missed something 
somewhere, but so far I haven't found any examples of queries where it 
doesn't work.


I didn't see it breaking anything in the examples from my previous 
letter [1].


1. 
https://www.postgresql.org/message-id/7af1464e-2e24-cfb1-b6d4-1544757f8cfa%40yandex.ru



Unfortunately, I can't understand your idea from point 4; could you please explain it?

The good thing is this helps even for IS NULL checks on non-join-key
columns (where we don't switch to an antijoin), but there's a couple
things that I dislike ...

1) It's not restricted to outer joins or anything like that (this is
mostly just my laziness / interest in one particular query, but also
something the outer-join-aware patch might help with).

2) We probably don't want to pass this kind of information through
sjinfo. That was the simplest thing for an experimental patch, but I
suspect it's not the only piece of information we may need to pass to
the lower levels of estimation code.

3) I kinda doubt we actually want to move this responsibility (to
consider fraction of unmatched rows) to the low-level estimation
routines (e.g. nulltestsel and various others). AFAICS this just
"introduces NULLs" into the relation, so maybe we could "adjust" the
attribute statistics (in examine_variable?) by inflating null_frac and
modifying the other frequencies in MCV/histogram.

4) But I'm not sure we actually want to do that in these low-level
selectivity functions. The outer join essentially produces output with
two subsets - one with matches on the outer side, one without them. But
the side without matches has NULLs in all columns. In a way, we know
exactly how are these columns correlated - if we do the usual estimation
(even with the null_frac adjusted), we just throw this information away.
And when there's a lot of rows without a match, that seems bad.

Re: POC, WIP: OR-clause support for indexes

2023-06-29 Thread Alena Rybakina

On 29.06.2023 14:23, Ranier Vilela wrote:
On Thu, Jun 29, 2023 at 06:56, Alena Rybakina wrote:


I apologize for breaking the original thread. In my defense, I can
say that I'm new to all this and I'm just learning. I will try to
make as few mistakes as possible.

By no means, your work is excellent and deserves all compliments.


Thank you, I will try to work in the same spirit, especially since there 
is still quite a lot of work left).


Thank you for your feedback.

--
Regards,
Alena Rybakina
Postgres Professional


Re: POC, WIP: OR-clause support for indexes

2023-06-29 Thread Alena Rybakina

Hi!


At the moment, I'm not sure that the constant is the right number
for applying transformations, so I'm in search of it, to be
honest. I will post my observations on this issue later. If you
don't mind, I'll leave the constant equal to 15 for now.

It's hard to predict. Perhaps accounting for time on each benchmark 
could help decide.


I will try to test on JOB [1] (because queries are difficult for the 
optimizer due to the significant number of joins and correlations 
contained in the dataset) and
tpcds [2] (the benchmark I noticed contains a sufficient number of 
queries with "or" expressions).



Sorry, I don't understand well enough what is meant by points
"Eliminate unnecessary variables" and "Eliminate unnecessary
expressions". Can you explain in more detail?

One example is array_type.
As you can see in v2 and v3 it no longer exists.

I get it. Honestly, I was guided by the example of converting "IN" to 
"ANY" (transformAExprIn), at least the part of the code when we 
specifically convert the expression to ScalarArrayOpExpr.


Both there and here, we first look for a common type for the collected 
constants, and if there is one, then we try to find the type for the 
array structure.


However, I think in my current patch it is also worth returning to the 
original version in this place: if a common type is not found, the 
ScalarArrayOpExpr generation function will misbehave and the query may 
not be executed at all, failing with an error that the node type cannot 
be determined (ERROR: unrecognized node type).


At the same time, we try to do this transformation for each group. A group 
here means OR expressions combined over a common left-hand side, and we 
consider only expressions that contain a constant and use only equality.

It is also worth noting that we do this processing before forming a 
BoolExpr expression (if you look, after any outcome we call the 
makeBoolExpr function, which simply forms the OR expression, as in the 
original version, regardless of what type of expressions it combines).
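
For comparison, the existing IN-to-ANY path can be observed directly; in
this trivial example of mine, transformAExprIn coerces the constants to a
common type and emits a single ScalarArrayOpExpr:

EXPLAIN (COSTS OFF)
SELECT relname FROM pg_class WHERE oid IN (1, 2, 3);

-- the condition is displayed as: (oid = ANY ('{1,2,3}'::oid[]))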




As I said earlier, these are just suggestions.
But thinking about it now, I think they can be classified as bad early 
optimizations.
Thank you again for your interest in this problem and your help. Yes, I 
think so too)



1. https://github.com/gregrahn/join-order-benchmark

2. 
https://github.com/Alena0704/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds


--
Regards,
Alena Rybakina
Postgres Professional
From f9f5958707bc1d7931323df05d51a60fc9d6cd38 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 29 Jun 2023 17:46:58 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at the optimiser stage, while we are still
 working with the expression tree. Firstly, we do not try to transform
 "non-OR" expressions or inequalities, and such clauses are handled exactly
 as before. Secondly, we do not transform if there are fewer than 15 OR
 expressions. Thirdly, note that we consider OR expressions only at the
 current level.

---
 src/backend/parser/parse_expr.c  | 290 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 290 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..3d395fd6459 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,294 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 15;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc_eargs;
+	Node 		   *result;
+	BoolExpr 	   *expr = (BoolExpr *)copyObject(expr_orig);
+	const char 	   *opname;
+	bool			change_apply = false;
+	bool			or_statement = false;
+
+	Assert(IsA(expr, BoolExpr));
+
+	/* If this is not an OR expression, then do it the old way. */
+	switch (expr->boolop)
+	{
+		case AND_EXPR:
+			opname = "AND";
+			break;
+		case OR_EXPR:
+			opname = "OR";
+			or_statement = true;
+			break;
+		case NOT_EXPR:
+			opname = "NOT";
+			break;
+		default:
+			elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
+			opname = NULL;		/* keep compiler quiet */
+			break;
+	}
+
+	if (!or_statement || list_length(expr-&

Re: POC, WIP: OR-clause support for indexes

2023-06-29 Thread Alena Rybakina
I apologize for breaking the original thread. In my defense, I can say 
that I'm new to all this and I'm just learning. I will try to make as 
few mistakes as possible.


I tried to fix it by forwarding this message to you; besides, it might 
be interesting to you too.


I'm sorry if I didn't state my goals clearly at first, but it seemed to 
me that initially the problem I encountered was very similar to what is 
described in this thread, only I suggested a slightly different way to 
solve it.


I have described the problem more or less clearly here [1], along with 
what seems to me to be the worst case; if that is not so, let me know.


1. 
https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg146230.html 


On 29.06.2023 12:32, Alena Rybakina wrote:


Hi! I'm sorry I didn't answer you right away, I was too busy with work.

On 27.06.2023 22:50, Peter Geoghegan wrote:

On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina  wrote:

I learned something new from your letter, thank you very much for that!

Cool. The MDAM paper is also worth a read:

https://vldb.org/conf/1995/P710.PDF

Some of the techniques it describes are already in Postgres. With
varying degrees of maturity.

The paper actually mentions OR optimization at one point, under
"Duplicate Elimination". The general idea is that ScalarArrayOpExpr
execution can "eliminate duplicates before the data is read". The
important underlying principle is that it can be really useful to give
the B-Tree code the context it requires to be clever about stuff like
that. We can do this by (say) using one ScalarArrayOpExpr, rather than
using two or more index scans that the B-Tree code will treat as
independent things. So a lot of the value in your patch comes from the
way that it can enable other optimizations (the immediate benefits are
  also nice).

In the past, OR optimizations have been prototyped that were later
withdrawn/rejected because the duplicate elimination aspect was...too
scary [1]. It's very easy to see that ScalarArrayOpExpr index scans
don't really have the same problem. "Giving the B-Tree code the
required context" helps here too.

Thank you for the explanation and the material provided) Unfortunately, 
I am still only studying the article and at the moment I cannot write 
more. To be honest, I hadn't thought before about the fact that my 
optimization can help eliminate duplicates before the data is read.


I am still only in the process of familiarizing myself with the 
thread [1] (reference from your letter), but I have already seen that 
there are problems related, for example, to when "or" expressions 
refer to the parent element.


I think I would face similar problems if I complicated the current 
code, for example, so that not only OR expressions at the same level 
are rewritten to ANY, but also those at different levels, without 
violating the operator-precedence logic.


For example, this query works now:

postgres=# EXPLAIN (analyze, COSTS OFF)
SELECT oid,relname FROM pg_class
WHERE
  (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR
  relname = 'pg_extension'
;

    QUERY PLAN
--
 Seq Scan on pg_class (actual time=0.086..0.140 rows=1 loops=1)
   Filter: ((oid = ANY ('{4,5}'::oid[])) OR (oid = ANY 
('{13779,2}'::oid[])) OR (relname = 'pg_extension'::name))

   Rows Removed by Filter: 412
 Planning Time: 2.135 ms
 Execution Time: 0.160 ms
(5 rows)

But I would like it works such as:

  QUERY PLAN
--
 Seq Scan on pg_class (actual time=0.279..0.496 rows=1 loops=1)
   Filter: ((oid = ANY ('{13779,2,4,5}'::oid[])) OR (relname = 
'pg_extension'::name))

   Rows Removed by Filter: 412
 Planning Time: 0.266 ms
 Execution Time: 0.536 ms
(5 rows)


I analyzed the buffer consumption when I ran the control regression tests 
with my patch. As far as I have seen, the diff shows no difference in the 
number of buffer block reads with and without my patch. 
(regression.diffs)

To be clear, I wasn't expecting that there'd be any regressions from
your patch. Intuitively, it seems like this optimization should make
the query plan do almost the same thing at execution time -- just
slightly more efficiently on average, and much more efficiently in
some individual cases.

It would probably be very hard for the optimizer to model/predict how
much work it can save by using a ScalarArrayOpExpr instead of an
"equivalent" set of bitmap index scans, OR'd together. But it doesn't
necessarily matter -- the only truly critical detail is understanding
the worst case for the transformation optimization.

Yes, I agree with you and I

Re: POC, WIP: OR-clause support for indexes

2023-06-29 Thread Alena Rybakina

Hi! I'm sorry I didn't answer you right away, I was too busy with work.

On 27.06.2023 22:50, Peter Geoghegan wrote:

On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina  wrote:

I learned something new from your letter, thank you very much for that!

Cool. The MDAM paper is also worth a read:

https://vldb.org/conf/1995/P710.PDF

Some of the techniques it describes are already in Postgres. With
varying degrees of maturity.

The paper actually mentions OR optimization at one point, under
"Duplicate Elimination". The general idea is that ScalarArrayOpExpr
execution can "eliminate duplicates before the data is read". The
important underlying principle is that it can be really useful to give
the B-Tree code the context it requires to be clever about stuff like
that. We can do this by (say) using one ScalarArrayOpExpr, rather than
using two or more index scans that the B-Tree code will treat as
independent things. So a lot of the value in your patch comes from the
way that it can enable other optimizations (the immediate benefits are
  also nice).

In the past, OR optimizations have been prototyped that were later
withdrawn/rejected because the duplicate elimination aspect was...too
scary [1]. It's very easy to see that ScalarArrayOpExpr index scans
don't really have the same problem. "Giving the B-Tree code the
required context" helps here too.

Thank you for the explanation and the material provided) Unfortunately, 
I am still only studying the article and at the moment I cannot write 
more. To be honest, I hadn't thought before about the fact that my 
optimization can help eliminate duplicates before the data is read.
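
A quick way to see the duplicate-elimination point (my own illustration,
assuming a table x with a btree index on id):

EXPLAIN SELECT * FROM x WHERE id = 1 OR id = 2 OR id = 2;
-- without the transformation: typically a BitmapOr with one index scan
-- per OR arm, including the repeated id = 2

EXPLAIN SELECT * FROM x WHERE id = ANY ('{1,2,2}'::int[]);
-- with one ScalarArrayOpExpr, the btree code sorts and de-duplicates the
-- array up front, so the duplicate element is searched for only once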


I am still only in the process of familiarizing myself with the thread 
[1] (reference from your letter), but I have already seen that there are 
problems related, for example, to when "or" expressions refer to the 
parent element.


I think I would face similar problems if I complicated the current code, 
for example, so that not only OR expressions at the same level are 
rewritten to ANY, but also those at different levels, without violating 
the operator-precedence logic.


For example, this query works now:

postgres=# EXPLAIN (analyze, COSTS OFF)
SELECT oid,relname FROM pg_class
WHERE
  (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR
  relname = 'pg_extension'
;

    QUERY PLAN
--
 Seq Scan on pg_class (actual time=0.086..0.140 rows=1 loops=1)
   Filter: ((oid = ANY ('{4,5}'::oid[])) OR (oid = ANY 
('{13779,2}'::oid[])) OR (relname = 'pg_extension'::name))

   Rows Removed by Filter: 412
 Planning Time: 2.135 ms
 Execution Time: 0.160 ms
(5 rows)

But I would like it works such as:

  QUERY PLAN
--
 Seq Scan on pg_class (actual time=0.279..0.496 rows=1 loops=1)
   Filter: ((oid = ANY ('{13779,2,4,5}'::oid[])) OR (relname = 
'pg_extension'::name))

   Rows Removed by Filter: 412
 Planning Time: 0.266 ms
 Execution Time: 0.536 ms
(5 rows)


I analyzed the buffer consumption when I ran the control regression tests 
with my patch. As far as I have seen, the diff shows no difference in the 
number of buffer block reads with and without my patch. 
(regression.diffs)

To be clear, I wasn't expecting that there'd be any regressions from
your patch. Intuitively, it seems like this optimization should make
the query plan do almost the same thing at execution time -- just
slightly more efficiently on average, and much more efficiently in
some individual cases.

It would probably be very hard for the optimizer to model/predict how
much work it can save by using a ScalarArrayOpExpr instead of an
"equivalent" set of bitmap index scans, OR'd together. But it doesn't
necessarily matter -- the only truly critical detail is understanding
the worst case for the transformation optimization.

Yes, I agree with you and I have yet to analyze this.

  It cannot be too
bad (maybe it's ~zero added runtime overhead relative to not doing the
transformation, even?).
I haven't seen a major performance degradation so far, but to be honest, 
I have not conducted a detailed analysis of query types other than 
x=1 OR x=2 OR x=1 OR y=2, etc. As soon as I know something, I will 
provide the data; it is very interesting to me.

At the same time, nbtree can be clever about
ScalarArrayOpExpr execution at runtime (once that's implemented),
without ever needing to make any kind of up-front commitment to
navigating through the index in any particular way. It's all dynamic,
and can be driven by the actual observed characteristics of the index
structure.

In other words, we don't really need to gamble (in the planner, or at
execution time). We're just ke

Re: POC, WIP: OR-clause support for indexes

2023-06-29 Thread Alena Rybakina
k1 b on
  a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
  QUERY PLAN
--
 Nested Loop  (cost=0.00..22247.02 rows=135 width=32) (actual 
time=0.094..373.627 rows=135 loops=1)
   ->  Seq Scan on tenk1 b  (cost=0.00..2311.00 rows=15 width=16) (actual 
time=0.051..14.667 rows=15 loops=1)
   ->  Materialize  (cost=0.00..3061.05 rows=9 width=16) (actual 
time=0.000..0.001 rows=9 loops=15)
 ->  Seq Scan on tenk1 a  (cost=0.00..3061.00 rows=9 width=16) (actual 
time=0.026..42.389 rows=9 loops=1)
   Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
   Rows Removed by Filter: 149991
 Planning Time: 0.414 ms
 Execution Time: 409.154 ms
(8 rows)

I compiled my original patch and there were no problems with the 
regression tests. The only problem appeared when I set the 
const_transform_or_limit variable to 0 (I use 15), as you have in the 
patch. To be honest, the diff appears there because you get a different 
plan: the OR expressions are replaced by ANY (see regression.diffs).
Unfortunately, your patch version did not apply cleanly; I did not 
understand why, so I applied it manually.
At the moment, I'm not sure that the constant is the right number for 
applying transformations, so I'm in search of it, to be honest. I will 
post my observations on this issue later. If you don't mind, I'll leave 
the constant equal to 15 for now.
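
A rough way to probe this threshold (my own sketch, using tenk1 from the
regression database) is to compare an OR list against its ANY equivalent
as the number of terms grows:

EXPLAIN ANALYZE
SELECT * FROM tenk1 WHERE unique1 = 1 OR unique1 = 2 OR unique1 = 3;
-- ... extended to N terms

EXPLAIN ANALYZE
SELECT * FROM tenk1 WHERE unique1 = ANY (ARRAY[1, 2, 3]);
-- ... the same N constants in one array

The "Planning Time" and "Execution Time" lines then show where the
transformation starts to pay off.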


Sorry, I don't understand well enough what is meant by points "Eliminate 
unnecessary variables" and "Eliminate unnecessary expressions". Can you 
explain in more detail?



Regarding the patch, there was a Warning at the compilation stage.

In file included from ../../../src/include/nodes/bitmapset.h:21,

 from ../../../src/include/nodes/parsenodes.h:26,

 from ../../../src/include/catalog/objectaddress.h:17,

 from ../../../src/include/catalog/pg_aggregate.h:24,

 from parse_expr.c:18:

parse_expr.c: In function ‘transformBoolExprOr’:

../../../src/include/nodes/nodes.h:133:66: warning: ‘expr’ is used 
uninitialized [-Wuninitialized]

  133 | #define nodeTag(nodeptr)    (((const Node*)(nodeptr))->type)

  |  ^~

parse_expr.c:116:29: note: ‘expr’ was declared here

  116 | BoolExpr   *expr;

  | ^~~~

I couldn't figure out how to fix it and went back to my original 
version. To be honest, I don't think anything needs to be changed here.


Unfortunately, I didn't understand why, when OR expressions are present, 
you don't even try to convert them to ANY and instead call 
transformBoolExpr, as far as I saw. I went back to my version.


I think it's worth checking whether the or_statement variable is true.

I think it's worth leaving the use of the or_statement variable in its 
original form.


  switch (expr->boolop)
  {
    case AND_EXPR:
  opname = "AND";
  break;
    case OR_EXPR:
  opname = "OR";
  or_statement = true;
  break;
    case NOT_EXPR:
  opname = "NOT";
  break;
    default:
  elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
  opname = NULL;    /* keep compiler quiet */
  break;
  }

  if (!or_statement || list_length(expr->args) < const_transform_or_limit)
    return transformBoolExpr(pstate, (BoolExpr *)expr_orig);

The current version of the patch also works and all tests pass.

--
Regards,
Alena Rybakina
Postgres Professional
From 06ae4f0887358eb38fe1b32287580a6996c8c9f6 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 29 Jun 2023 07:09:16 +0300
Subject: [PATCH] Replace clause (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on
 the stage of the optimiser when we are still working with a tree expression.

Firstly, we do not try to transform "non-OR"
expressions or inequalities, and such OR clauses are handled exactly
as before;
secondly, we do not transform if there are fewer than 15 OR
expressions (another number could be used here, but during testing,
already starting with 3 expressions, the execution and planning
time with transformed ORs were faster);
thirdly, note that we consider OR expressions only at the current level.
The transformation works as follows: first we determine the groups by
the left-hand side, collecting the constants in a list for each group;
then, for each group, we coerce the collected constants to one type,
find a common type for the array, and use the make_scalar_array_op
function to form a ScalarArrayOpExpr, if possible. If it is not
possible, the constants remain in the same expr as before the function
call.

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2023-06-28 Thread Alena Rybakina
t;  Seq Scan on b  (cost=0.00..1637.00 rows=10 width=20) (actual 
time=0.027..17.980 rows=10 loops=1)
   ->  Hash  (cost=10.00..10.00 rows=600 width=12) (actual 
time=2.046..2.047 rows=600 loops=1)

 Buckets: 1024  Batches: 1  Memory Usage: 34kB
 ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) 
(actual time=0.022..0.315 rows=600 loops=1)

 Planning Time: 0.631 ms
 Execution Time: 95.730 ms
(8 rows)

SET
  QUERY PLAN
--
 Hash Join  (cost=3387.00..8621.58 rows=8 width=32) (actual 
time=102.873..102.877 rows=0 loops=1)
   Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND 
((a.z)::numeric = b.z))
   ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual 
time=0.014..0.131 rows=600 loops=1)
   ->  Hash  (cost=1637.00..1637.00 rows=10 width=20) (actual 
time=101.920..101.921 rows=10 loops=1)

 Buckets: 131072  Batches: 1  Memory Usage: 6474kB
 ->  Seq Scan on b  (cost=0.00..1637.00 rows=10 width=20) 
(actual time=0.013..16.349 rows=10 loops=1)

 Planning Time: 0.153 ms
 Execution Time: 103.518 ms
(8 rows)

I also show the improvement for a left outer or right join:


DROP TABLE IF EXISTS a,b CASCADE;

CREATE TABLE a AS
  SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
  FROM generate_series(1,600) AS gs;
CREATE TABLE b AS
  SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
  FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;


SET enable_cost_size = 'on';

EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;

SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;

QUERY PLAN

 Hash Left Join  (cost=20.50..3157.58 rows=10 width=32) (actual 
time=1.846..102.264 rows=10 loops=1)
   Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND 
(b.z = (a.z)::numeric))
   ->  Seq Scan on b  (cost=0.00..1637.00 rows=10 width=20) (actual 
time=0.041..15.328 rows=10 loops=1)
   ->  Hash  (cost=10.00..10.00 rows=600 width=12) (actual 
time=1.780..1.781 rows=600 loops=1)

 Buckets: 1024  Batches: 1  Memory Usage: 34kB
 ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) 
(actual time=0.031..0.252 rows=600 loops=1)

 Planning Time: 0.492 ms
 Execution Time: 107.609 ms
(8 rows)

SET
  QUERY PLAN
--
 Hash Right Join  (cost=3387.00..8500.08 rows=10 width=32) (actual 
time=80.919..101.613 rows=10 loops=1)
   Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND 
((a.z)::numeric = b.z))
   ->  Seq Scan on a  (cost=0.00..10.00 rows=600 width=12) (actual 
time=0.017..0.084 rows=600 loops=1)
   ->  Hash  (cost=1637.00..1637.00 rows=10 width=20) (actual 
time=80.122..80.123 rows=10 loops=1)

 Buckets: 131072  Batches: 1  Memory Usage: 6474kB
 ->  Seq Scan on b  (cost=0.00..1637.00 rows=10 width=20) 
(actual time=0.015..11.819 rows=10 loops=1)

 Planning Time: 0.194 ms
 Execution Time: 104.662 ms
(8 rows)
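
As far as I read the patch, for n hash clauses with estimated bucket
fractions f_i it replaces the "take the most pessimistic single clause"
rule with a product under an independence assumption, floored at one
bucket (my paraphrase of the code, not the author's wording):

\[
\textit{innerbucketsize} \;=\; \max\!\left(\prod_{i=1}^{n} f_i,\;
\frac{1}{\textit{virtualbuckets}}\right)
\]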

--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..31771dfba46 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -153,6 +153,7 @@ bool		enable_parallel_hash = true;
 bool		enable_partition_pruning = true;
 bool		enable_presorted_aggregate = true;
 bool		enable_async_append = true;
+bool 		enable_cost_size = true;
 
 typedef struct
 {
@@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 thismcvfreq = restrictinfo->left_mcvfreq;
 			}
 
+			if (enable_cost_size)
+			{
+				innerbucketsize *= thisbucketsize;
+				innermcvfreq *= thismcvfreq;
+			}
+			else
+			{
 			if (innerbucketsize > thisbucketsize)
 				innerbucketsize = thisbucketsize;
 			if (innermcvfreq > thismcvfreq)
 				innermcvfreq = thismcvfreq;
+			}
 		}
 
+		if (enable_cost_size && innerbucketsize < 1.0 / virtualbuckets)
+			innerbucketsize = 1.0 / virtualbuckets;
 	}
 
 	/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 71e27f8eb05..ded9ba3b7a9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1007,6 +1007,19 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("s

Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-06-26 Thread Alena Rybakina
2.00..2.00 rows=100 width=8) (actual 
time=0.034..0.036 rows=100 loops=1)

 Buckets: 1024  Batches: 1  Memory Usage: 12kB
 ->  Seq Scan on small  (cost=0.00..2.00 rows=100 width=8) 
(actual time=0.008..0.016 rows=100 loops=1)

 Planning Time: 0.168 ms
 Execution Time: 209.883 ms
(10 rows)
From e8653477fbe7321325122a2d5032798cd6a95a8b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 26 Jun 2023 15:39:11 +0300
Subject: [PATCH] Fixed the case of underestimated cardinality for a LEFT JOIN
 with an "IS NULL" restriction in the clause. The error is caused by an
 incorrect selectivity calculation for the "IS NULL" clause, which took into
 account only table-level statistics, without the NULL values produced by the
 join operation. This patch fixes this by calculating the fraction of rows on
 the right side of the join without a matching row on the left.

Co-authored-by: Alena Rybakina 
---
 src/backend/utils/adt/selfuncs.c | 35 +---
 src/include/nodes/pathnodes.h|  3 +++
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c4fcd0076ea..8e18aa1dd2b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -153,7 +153,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
 			  bool isdefault1, bool isdefault2,
 			  AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 			  Form_pg_statistic stats1, Form_pg_statistic stats2,
-			  bool have_mcvs1, bool have_mcvs2);
+			  bool have_mcvs1, bool have_mcvs2, double *unmatched_frac);
 static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
 			 VariableStatData *vardata1, VariableStatData *vardata2,
 			 double nd1, double nd2,
@@ -1710,6 +1710,9 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 		stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
 		freq_null = stats->stanullfrac;
 
+		if (sjinfo)
+			freq_null = freq_null + sjinfo->unmatched_frac - freq_null * sjinfo->unmatched_frac;
+
 		switch (nulltesttype)
 		{
 			case IS_NULL:
@@ -2313,13 +2316,24 @@ eqjoinsel(PG_FUNCTION_ARGS)
 	}
 
 	/* We need to compute the inner-join selectivity in all cases */
+	/*
+	 * calculate fraction of right without of matching row on left
+	 *
+	 * FIXME Should be restricted to JOIN_LEFT, we should have similar logic
+	 * for JOIN_FULL.
+	 *
+	 * XXX Probably should calculate unmatched as fraction of the join result,
+	 * not of the relation on the right (because the matched part can have more
+	 * matches per row and thus grow). Not sure. Depends on how it's used later.
+	 */
 	selec_inner = eqjoinsel_inner(opfuncoid, collation,
 								  &vardata1, &vardata2,
 								  nd1, nd2,
 								  isdefault1, isdefault2,
 								  &sslot1, &sslot2,
 								  stats1, stats2,
-								  have_mcvs1, have_mcvs2);
+								  have_mcvs1, have_mcvs2,
+								  &sjinfo->unmatched_frac);
 
 	switch (sjinfo->jointype)
 	{
@@ -2407,7 +2421,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 bool isdefault1, bool isdefault2,
 AttStatsSlot *sslot1, AttStatsSlot *sslot2,
 Form_pg_statistic stats1, Form_pg_statistic stats2,
-bool have_mcvs1, bool have_mcvs2)
+bool have_mcvs1, bool have_mcvs2, double *unmatched_frac)
 {
 	double		selec;
 
@@ -2503,7 +2517,10 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		}
 		CLAMP_PROBABILITY(matchfreq1);
 		CLAMP_PROBABILITY(unmatchfreq1);
+
+		*unmatched_frac = unmatchfreq1;
 		matchfreq2 = unmatchfreq2 = 0.0;
+
 		for (i = 0; i < sslot2->nvalues; i++)
 		{
 			if (hasmatch2[i])
@@ -2581,10 +2598,22 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation,
 		double		nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;
 
 		selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
+
+		/*
+		 * XXX Should this look at nullfrac on either side? Probably depends on
+		 * if we're calculating fraction of NULLs or fraction of unmatched rows.
+		 */
+		// unmatchfreq = (1.0 - nullfrac1) * (1.0 - nullfrac2);
 		if (nd1 > nd2)
+		{
 			selec /= nd1;
+			*unmatched_frac = (nd1 - nd2) * 1.0 / nd1;
+		}
 		else
+		{
 			selec /= nd2;
+			*unmatched_frac = 0.0;
+		}
 	}
 
 	return selec;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7adb..6bc63e648e6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2853,6 +2853,9 @@ struct SpecialJoinInfo
 	bool		semi_can_hash;	/* true if semi_operators are all hash */
 	List	   *semi_operators; /* OIDs of equality join operators */
 	List	   *semi_rhs_exprs; /* righthand-side expressions of these ops */
+
+	/* For outer join, fraction of rows without a match. */
+	Selectivity	unmatched_frac;
 };
 
 /*
-- 
2.34.1



Re: POC, WIP: OR-clause support for indexes

2023-06-25 Thread Alena Rybakina
Sorry, I wrote the last sentence in a confusing way. I meant that I 
performed the transformation for any number of OR expressions 
(const_transform_or_limit=1). In the regression tests, I noticed only 
diff changes from transformations of OR expressions to ANY. I attach a 
file with the diff.


On 26.06.2023 04:47, Alena Rybakina wrote:


Hi, all! Sorry I haven't written for a long time.

I finished writing the code patch for transforming OR expressions into 
ANY expressions. I didn't see any problems in the regression tests, 
even when I set the constant for the minimum number of OR expressions 
replaced by ANY to 0. I ran my patch on SQLancer, and so far the code 
has never crashed.


On 14.01.2023 18:45, Marcos Pegoraro wrote:


I agree with your idea and try to implement it and will soon
attach a patch with a solution.

Additionally, if those OR constants repeat you'll see ...
If all constants are the same value, fine
explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1));
Index Only Scan using x_id on x  (cost=0.42..4.44 rows=1 width=4)
  Index Cond: (id = 1)

if all values are almost the same, ops
explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1) OR 
(ID = 2));

Bitmap Heap Scan on x  (cost=17.73..33.45 rows=4 width=4)
  Recheck Cond: ((id = 1) OR (id = 1) OR (id = 1) OR (id = 2))
  ->  BitmapOr  (cost=17.73..17.73 rows=4 width=0)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 2)

thanks
Marcos


--

Regards,

Alena Rybakina
diff -U3 /home/alena/postgrespro9/src/test/regress/expected/create_index.out 
/home/alena/postgrespro9/src/test/regress/results/create_index.out
--- /home/alena/postgrespro9/src/test/regress/expected/create_index.out 
2023-06-26 04:08:22.903294342 +0300
+++ /home/alena/postgrespro9/src/test/regress/results/create_index.out  
2023-06-26 05:36:31.904205986 +0300
@@ -1838,18 +1838,11 @@
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
-   QUERY PLAN  
  
--
- Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND 
(tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
-   ->  BitmapOr
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: ((thousand = 42) AND (tenthous = 1))
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: ((thousand = 42) AND (tenthous = 3))
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+  QUERY PLAN  
+--
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[])))
+(2 rows)
 
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
@@ -1861,20 +1854,17 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
-   QUERY PLAN  
  
--
+ QUERY PLAN
 
+
  Aggregate
->  Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 
99)))
+ Recheck Cond: ((hundred = 42) AND (thousand = ANY 
('{42,99}'::integer[])))
  ->  BitmapAnd
->  Bitmap Index Scan on tenk1_hundred
  Index Cond: (hundred = 42)
-   ->  BitmapOr
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: (thousand = 42)
- ->  Bitmap Index Scan on tenk1_thous_tenthous
-   Index Cond: (thousand = 99)
-(11 rows)
+   ->  Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: (thousand = ANY ('{42,99}'::integer[]))
+(8 rows)
 
 SELECT count(*) FROM tenk1
   WHERE hundred = 42

Re: POC, WIP: OR-clause support for indexes

2023-06-25 Thread Alena Rybakina

Hi, all! Sorry I haven't written for a long time.

I finished writing the code patch for transforming OR expressions into 
ANY expressions. I didn't see any problems in the regression tests, even 
when I set the constant for the minimum number of OR expressions replaced 
by ANY to 0. I ran my patch on SQLancer, and so far the code has never 
crashed.


On 14.01.2023 18:45, Marcos Pegoraro wrote:


I agree with your idea and try to implement it and will soon
attach a patch with a solution.

Additionally, if those OR constants repeat you'll see ...
If all constants are the same value, fine
explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1));
Index Only Scan using x_id on x  (cost=0.42..4.44 rows=1 width=4)
  Index Cond: (id = 1)

if all values are almost the same, ops
explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1) OR (ID 
= 2));

Bitmap Heap Scan on x  (cost=17.73..33.45 rows=4 width=4)
  Recheck Cond: ((id = 1) OR (id = 1) OR (id = 1) OR (id = 2))
  ->  BitmapOr  (cost=17.73..17.73 rows=4 width=0)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on x_id  (cost=0.00..4.43 rows=1 width=0)
              Index Cond: (id = 2)

thanks
Marcos


--

Regards,

Alena Rybakina
From 56fba3befe4f6b041d097d8884815fe943fb21f9 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 26 Jun 2023 04:18:15 +0300
Subject: [PATCH] Replace clause (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on 
 the stage of the optimiser when we are still working with a tree expression.

Firstly, we do not try to transform "non-OR"
expressions or inequalities, and such OR clauses are handled exactly
as before;
secondly, we do not transform if there are fewer than 15 OR
expressions (another number could be used here, but during testing,
already starting with 3 expressions, the execution and planning
time with transformed ORs were faster);
thirdly, note that we consider OR expressions only at the current level.
The transformation works as follows: first we determine the groups by
the left-hand side, collecting the constants in a list for each group;
then, for each group, we coerce the collected constants to one type,
find a common type for the array, and use the make_scalar_array_op
function to form a ScalarArrayOpExpr, if possible. If it is not
possible, the constants remain in the same expr as before the function
call. All successful attempts (the resulting ScalarArrayOpExprs
together with the untransformed Exprs) are combined via the OR
operation.
---
 src/backend/parser/parse_expr.c  | 289 ++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 289 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..c5f58aee9ec 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,293 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 		 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+int const_transform_or_limit = 15;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
+{
+	List		   *or_list = NIL;
+	ListCell	   *lc_eargs,
+			   *lc_args;
+	List		   *groups_list = NIL;
+	bool			change_apply = false;
+	const char 	   *opname;
+	Node 		   *result;
+	bool			or_statement = false;
+	BoolExpr 	   *expr = (BoolExpr *)copyObject(expr_orig);
+
+	Assert(IsA(expr, BoolExpr));
+
+	/* If this is not an OR expression, then do it the old way. */
+	switch (expr->boolop)
+	{
+		case AND_EXPR:
+			opname = "AND";
+			break;
+		case OR_EXPR:
+			opname = "OR";
+			or_statement = true;
+			break;
+		case NOT_EXPR:
+			opname = "NOT";
+			break;
+		default:
+			elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
+			opname = NULL;		/* keep compiler quiet */
+			break;
+	}
+
+	if (!or_statement || list_length(expr->args) < const_transform_or_limit)
+		return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+
+	/*
+	 * NOTE:
+	 * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains
+	 * a list of sub-restrictinfo args, and rinfo->clause - which is the
+	 * same expression, made from bare clauses. To not break selectivity
+	 * caches and other optimizations, use both:
+	 * - use rinfos 
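
For illustration, here is a minimal sketch of the rewrite the commit
message above describes (the table, data, and the lowered threshold
are invented for brevity; the patch itself only transforms at
const_transform_or_limit = 15 or more OR arms):

    CREATE TABLE t (x int);

    -- What the user writes (assuming, for brevity, a transformation
    -- threshold of 3 rather than the patch's default of 15):
    SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 3;

    -- What the parser effectively produces after the transformation:
    SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 3]);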

Re: eqjoinsel_semi still sucks ...

2023-06-25 Thread Alena Rybakina
get to impact the 
join size estimate at all, since set_joinrel_size_estimates is not going 
to factor the inner rel size into what it multiplies the join 
selectivity against. In short, I was mistakenly extrapolating from 
the observation that it helped to hack the ndistinct estimate for a 
semijoin's inner rel, to the conclusion that we should do that for all 
join input rels.

...

Yeah, those are estimating that all the outer rows have join partners, 
because there are more distinct values in the sub-select than there are 
in the outer relation.

In addition, the answer to the question of why clamping is necessary is 
in the message of commit 97930cf5: the earlier fix accounted for 
restriction clauses that reduced the number of rows coming out of a table 
(and hence the number of possibly-distinct values of a join variable), 
but not for join restriction clauses that might have been applied at a 
lower level of join.


That patch accounted for baserel restriction clauses that reduced the 
number of rows coming out of a table (and hence the number of 
possibly-distinct values of a join variable), but not for join 
restriction clauses that might have been applied at a lower level of 
join.  To account for the latter, look up the sizes of the min_lefthand 
and min_righthand inputs of the current join, and clamp with those in 
the same way as for the base relations.
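
To make the clamping concrete, here is a hedged numeric sketch (all
values invented): suppose the outer side of a semijoin has nd1 = 1000
distinct values of the join column, the inner sub-select claims
nd2 = 5000 distinct values, but join clauses already applied below the
semijoin leave only 200 rows in min_righthand. Without clamping,
nd2 > nd1, so the estimator assumes every outer row has a join partner
and the selectivity is about 1 - nullfrac1. With clamping, nd2 becomes
min(5000, 200) = 200, and the no-MCV branch of eqjoinsel_semi then
estimates roughly

    selec = (nd2 / nd1) * (1 - nullfrac1) = (200 / 1000) * (1 - nullfrac1) ~ 0.2

so the semijoin is no longer assumed to pass essentially all outer rows.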



1. 
https://www.postgresql.org/message-id/flat/5156.1314829311%40sss.pgh.pa.us#86609b6ac6af405dec67316bfbe9868f


2. https://www.postgresql.org/message-id/raw/5156.1314829311%40sss.pgh.pa.us



Regards,

Alena Rybakina


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-05-21 Thread Alena Rybakina

I'm sorry I was unable to respond right away.

On 09.05.2023 17:23, torikoshia wrote:
You may already understand it, but these variable names are given in 
imitation of FREEZE and BINARY cases:


  --- a/src/include/commands/copy.h
  +++ b/src/include/commands/copy.h
  @@ -42,6 +42,7 @@ typedef struct CopyFormatOptions
   * -1 if not specified */
  bool    binary; /* binary format? */
  bool    freeze; /* freeze rows on loading? */
  +   bool    ignore_datatype_errors;  /* ignore rows with 
datatype errors */


  --- a/src/backend/commands/copy.c
  +++ b/src/backend/commands/copy.c
  @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
  bool    format_specified = false;
  bool    freeze_specified = false;
  bool    header_specified = false;
  +   bool    ignore_datatype_errors_specified = false;

I think it would be sane to align the names with the FREEZE and BINARY 
options.


I agree that the name is too long; we once used the name 
'ignore_errors'. However, the current implementation does not ignore 
all errors, only data type errors, so I renamed it.

There may be a better name, but I haven't come up with one.


Yes, you are right, I saw it.



From a quick look at the PostgreSQL source code, there are few variable 
names with "_counter"; it seems to be used mostly for function names.

Something like "ignored_errors_count" might be better.
I noticed that many variables are named with the "_counter" suffix, and 
most of them are used as counters; for example, the PgStat_StatTabEntry 
and JitInstrumentation structures contain many such variables. Despite 
this, I agree with your suggested name, because most of the similarly 
named counters I found are local variables within functions.






Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-05-06 Thread Alena Rybakina

Hi!

Thank you, Damir, for your patch. It is very interesting to review it!

It seems to me that the variable names are not used consistently everywhere.

I noticed that you used the /ignore_datatype_errors_specified/ variable in 
/copy.c/, but the GUC has the short name /ignore_datatype_errors/. You also 
used the short variable name in the /CopyFormatOptions/ structure.
The name /ignore_datatype_errors_specified/ seems very long to me; maybe 
use the short version (/ignore_datatype_errors/) in /copy.c/ too?


Besides, I noticed that you used the /ignored_errors/ variable in the 
/CopyFromStateData/ structure, and its name is strikingly similar to 
/ignore_datatype_errors/, although they have different meanings.

Maybe it would be better to rename it to /ignored_errors_counter/?


I tested the latest version, 
/v5-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch/, with the 
/bytea/ data type and in transaction scenarios, and I didn't find any 
problems there.

I describe my steps in detail below, in case I missed something.

*First of all, I ran COPY with the IGNORE_DATATYPE_ERRORS parameter 
inside a transaction block.*

/File t2.csv contents:/

id,b
769,\
1,\6e
2,\x5
5,\x

/Test:/
CREATE TABLE t (id INT, b BYTEA);
postgres=# BEGIN;
copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 'csv', 
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

SAVEPOINT my_savepoint;
BEGIN
WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
SAVEPOINT
postgres=*# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# ROLLBACK TO my_savepoint;
ROLLBACK
postgres=*# select * from t;
 id | b
----+----
  5 | \x
(1 row)

postgres=*# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=*# select * from t;
 id | b
----+----
  5 | \x
  5 | \x
(2 rows)

postgres=*# commit;
COMMIT

*I tried a similar test with the transaction block moved into a function:*
CREATE FUNCTION public.log2()
 RETURNS void
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN;
copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 'csv', 
IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

SAVEPOINT my_savepoint;
END;
$function$;

postgres=# delete from t;

postgres=# select 1 as t from log2();
WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
 t
---
 1
(1 row)

*Second, I checked COPY with the bytea data type.*
/t1.csv contains:/
id,b
769,\x2d
1,\x6e
2,\x5c
5,\x

/And I ran it:/

postgres=# delete from t;
DELETE 4
postgres=# copy t FROM '/home/alena/postgres/t2.csv'  WITH (format 
'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER);

WARNING:  invalid input syntax for type bytea
WARNING:  invalid input syntax for type bytea
WARNING:  invalid hexadecimal data: odd number of digits
WARNING:  3 rows were skipped due to data type incompatibility
COPY 1
postgres=# select * from t;
 id | b
----+----
  5 | \x
(1 row)

--
Alena Rybakina
Postgres Professional


Re: POC, WIP: OR-clause support for indexes

2023-01-14 Thread Alena Rybakina
ch I see for now:
1. Calculating cost. Right now it's just a simple transformation of the 
costs computed for the BitmapOr path. I'd like to hope that's possible, 
so the indexes' estimation functions could be left untouched and could 
keep believing that all clauses are implicitly ANDed.
2. I'd like to add such support to btree, but it seems that should be 
a separate patch. The btree search algorithm doesn't use any kind of 
stack of pages, and the algorithm for walking over the btree isn't 
clear to me yet.
3. I could have missed some places which still assume an implicitly-ANDed 
list of clauses, although the regression tests pass fine.
I support such a clever approach. But the specific case you demonstrated 
above could be optimized independently at an earlier stage. If we 
convert:

(F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2)
to
F(A) IN (ConstStableExpr_1, ConstStableExpr_2)

we can see a significant execution speedup. For example, using demo.sql 
to estimate the maximum positive effect, we see about a 40% execution 
and a 100% planning speedup.
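
As a hedged illustration of the kind of comparison discussed here (the
table, index, and data are invented, and this is not the demo.sql from
the thread):

    CREATE TABLE demo (a int, payload text);
    INSERT INTO demo SELECT g, 'x' FROM generate_series(1, 100000) g;
    CREATE INDEX demo_a_idx ON demo (a);
    ANALYZE demo;

    -- A chain of equality ORs may be planned as a BitmapOr over
    -- several bitmap index scans:
    EXPLAIN SELECT * FROM demo WHERE a = 1 OR a = 2 OR a = 3;

    -- The equivalent ScalarArrayOpExpr form can be handled by a
    -- single index scan:
    EXPLAIN SELECT * FROM demo WHERE a = ANY (ARRAY[1, 2, 3]);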


To avoid unnecessary overhead induced by the optimization, such a 
transformation could be made at the planning stage (when we have 
cardinality estimates and have pruned partitions) but before relation 
scan paths are created. That way we avoid both the planning overhead 
and a non-optimal BitmapOr in the case of many ORs, possibly aggravated 
by many indexes on the relation.
For example, such an operation could be performed in create_index_paths() 
before walking rel->indexlist.



--
Alena Rybakina
Postgres Professional


Re: RFC: Logging plan of the running query

2022-09-23 Thread Alena Rybakina

Sorry, I wrote that confusingly at the time.

No, I suggested adding a comment explaining HandleLogQueryPlanInterrupt() 
only in explain.h, not removing the one from explain.c.

It had seemed necessary to me to separate the declaration of the function 
for the 'explain' feature of an executed query from our interrupt 
function for logging the plan of a running query. But now I doubt it.


I'm not sure I understand your comment correctly; do you mean 
HandleLogQueryPlanInterrupt() should not be placed in explain.c? 



Thank you for reminding me about this function; I looked at the 
ProcessLogMemoryContextInterrupt() declaration. I noticed that the 
comments in memutils.h are missing too.


The description of this function is written only in mcxt.c.
However, given that ProcessLogMemoryContextInterrupt(), which similarly 
handles interrupts for pg_log_backend_memory_contexts(), is located in 
mcxt.c, I also think the current location might be acceptable. 



So I think you are right, and the comment explaining 
HandleLogQueryPlanInterrupt() in explain.h is redundant.


I feel this comment is unnecessary, since the explanation of 
HandleLogQueryPlanInterrupt() is written in explain.c and no functions 
in explain.h have comments. 


Regards,

--
Alena Rybakina
Postgres Professional





Re: RFC: Logging plan of the running query

2022-09-21 Thread Alena Rybakina

 Ok, I get it.

Since GetLockMethodLocalHash() is only used for assertions, it is only 
defined when USE_ASSERT_CHECKING is enabled. This patch uses 
GetLockMethodLocalHash() not only for assertion purposes, so I removed 
the "#ifdef USE_ASSERT_CHECKING" around this function. I believe it 
does not lead to skipped errors. 

Agree.
Since these processes are needed for nested queries, not only for 
AbortSubTransaction [1], I added comments on it.


I also noticed it. However, I also discovered that the function 
declarations above are applied to the EXPLAIN command and are used to 
print details of an executed query.

We have a similar task of printing the plan of an interrupted process 
when a request is made for a specific pid.

In short, I think this task is different, and I added this comment to 
separate these parts.


I feel this comment is unnecessary, since the explanation of 
HandleLogQueryPlanInterrupt() is written in explain.c and no functions 
in explain.h have comments. 


Yes, I was worried about that. I understand now; thank you for explaining.



AFAIU both of them are processed, since SendProcSignal sets 
pss_signalFlags for each signal.


```
SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
{
	volatile ProcSignalSlot *slot;
	...(snip)...
	if (slot->pss_pid == pid)
	{
		/* Atomically set the proper flag */
		slot->pss_signalFlags[reason] = true;
		/* Send signal */
		return kill(pid, SIGUSR1);
```

The comments on ProcSignalReason also say 'We can cope with concurrent 
signals for different reasons'.


```C
/*
 * Reasons for signaling a Postgres child process (a backend or an auxiliary
 * process, like checkpointer). We can cope with concurrent signals for
 * different reasons. However, if the same reason is signaled multiple times
 * in quick succession, the process is likely to observe only one
 * notification of it. This is okay for the present uses.
 ...
typedef enum
{
	PROCSIG_CATCHUP_INTERRUPT,		/* sinval catchup interrupt */
	PROCSIG_NOTIFY_INTERRUPT,		/* listen/notify interrupt */
	PROCSIG_PARALLEL_MESSAGE,		/* message from cooperating parallel backend */
	PROCSIG_WALSND_INIT_STOPPING,	/* ask walsenders to prepare for shutdown */
	PROCSIG_BARRIER,				/* global barrier interrupt */
	PROCSIG_LOG_MEMORY_CONTEXT,		/* ask backend to log the memory contexts */
	PROCSIG_LOG_QUERY_PLAN,			/* ask backend to log plan of the current query */
	...
} ProcSignalReason;
```


[1] 
https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com