Re: Vacuum statistics
On 30.05.2024 10:33, Alena Rybakina wrote:
> I suggest gathering information about vacuum resource consumption for processing indexes and tables and storing it in the table and index relations (for example, in the PgStat_StatTabEntry structure, as is done for regular statistics). It will allow us to determine how well the vacuum is configured and evaluate its overhead on the system at the strategic level. The vacuum already gathers this information, but this valuable information is not stored anywhere.

My colleagues and I have prepared a patch that can help to solve this problem. We are open to feedback.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From dfda656b35be2a73c076cb723fdeb917630e61e3 Mon Sep 17 00:00:00 2001
From: Alena Rybakina
Date: Thu, 30 May 2024 11:02:10 -0700
Subject: [PATCH] Machinery for gathering extended vacuum statistics on relations.

Note that statistics on heap and index relations differ slightly.

The values of total_blks_hit, total_blks_read, and total_blks_dirtied are the numbers of pages hit, missed, and dirtied in shared buffers during a vacuum operation, respectively. total_blks_dirtied counts only pages dirtied by this action: if a page was already dirty before the vacuum operation, it is not counted as dirtied.

The tuples_deleted parameter is the number of tuples cleaned up by the vacuum operation. The delay_time value is the total vacuum sleep time spent in vacuum delay points. The pages_removed value is the number of pages by which the physical data storage of the relation was reduced. The pages_deleted parameter is the number of pages freed within the table (the file size may not have changed).

interrupts is the number of times the (auto)vacuum process was interrupted during vacuuming of a relation. We report it from the vacuum_error_callback routine, so we can log all ERROR reports. In the case of autovacuum we can report SIGINT signals too.
It may be dangerous to perform such a complex task (sending a message) in an error callback - we can run into an ERROR-within-ERROR problem. But the chance of getting stuck in that situation looks small, so let's try it. This parameter relates to a problem covered by b19e4250.

Tracking of I/O during an (auto)vacuum operation. The introduced variables blk_read_time and blk_write_time track only access to buffer pages and flushing them to disk. The read measurement is trivial, but the write measurement technique is not obvious: during a vacuum the write time can remain zero simply because no flushing operations were performed.

System time and user time are parameters that describe how much time a vacuum operation has spent executing code in user space and kernel space, respectively. We also accumulate the total time of a vacuum, which is the difference between the timestamps at the start and finish points in the vacuum code. Remember about idle time, when the vacuum waited for I/O and locks: the total time is not equal to the sum of user and system time, but it is no less than that sum.

pages_frozen - the number of pages that are marked as frozen in the visibility map during vacuum. This parameter is incremented when a page is marked all-frozen. pages_all_visible - the number of pages that are marked as all-visible in the visibility map during vacuum.
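As an illustration only (a toy Python sketch, not the patch's C code; all names here are hypothetical), the counting rules described above can be modeled like this - in particular the rule that a page which was already dirty before the vacuum touched it must not be counted in total_blks_dirtied:

```python
class VacuumBufStats:
    """Toy model of the per-vacuum counters described in the commit message."""
    def __init__(self):
        self.total_blks_hit = 0      # page found in shared buffers
        self.total_blks_read = 0     # page had to be read from disk
        self.total_blks_dirtied = 0  # pages dirtied *by this vacuum* only

def touch_page(stats, buffer_pool, page, make_dirty):
    entry = buffer_pool.get(page)
    if entry is None:
        stats.total_blks_read += 1   # miss: simulate reading from disk
        entry = {"dirty": False}
        buffer_pool[page] = entry
    else:
        stats.total_blks_hit += 1    # hit in shared buffers
    if make_dirty:
        if not entry["dirty"]:       # an already-dirty page is not recounted
            stats.total_blks_dirtied += 1
        entry["dirty"] = True

pool = {1: {"dirty": True}, 2: {"dirty": False}}  # page 1 was dirty before vacuum
stats = VacuumBufStats()
touch_page(stats, pool, 1, make_dirty=True)   # hit, but NOT counted as dirtied
touch_page(stats, pool, 2, make_dirty=True)   # hit, counted as dirtied
touch_page(stats, pool, 3, make_dirty=False)  # miss: read from disk
print(stats.total_blks_hit, stats.total_blks_read, stats.total_blks_dirtied)  # -> 2 1 1
```

The same "count only the delta caused by this action" idea applies to the other counters as well.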
Authors: Alena Rybakina, Andrei Lepikhov, Andrei Zubkov
---
 src/backend/access/heap/vacuumlazy.c          | 245 +-
 src/backend/access/heap/visibilitymap.c       |  13 +
 src/backend/catalog/system_views.sql          | 123 +
 src/backend/commands/vacuum.c                 |   4 +
 src/backend/commands/vacuumparallel.c         |   1 +
 src/backend/utils/activity/pgstat.c           | 117 +++--
 src/backend/utils/activity/pgstat_database.c  |   1 +
 src/backend/utils/activity/pgstat_relation.c  |  52 ++-
 src/backend/utils/adt/pgstatfuncs.c           | 290
 src/backend/utils/error/elog.c                |  13 +
 src/include/catalog/pg_proc.dat               |  28 +-
 src/include/commands/vacuum.h                 |   1 +
 src/include/pgstat.h                          | 118 -
 src/include/utils/elog.h                      |   2 +-
 src/include/utils/pgstat_internal.h           |  36 +-
 .../expected/vacuum-extended-statistic.out    | 419 ++
 .../isolation/expected/vacuum-extending.out   |  68 +++
 src/test/isolation/isolation_schedule         |   2 +
 .../specs/vacuum-extended-statistic.spec      | 179
 .../isolation/specs/vacuum-extending.spec     |  58 +++
 src/test/regress/expected/opr_sanity.out      |   9 +-
 src/test/regress/expected/rules.out           |  79
 22 files changed, 1812 insertions(+), 46 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-extended-statistic.out
 create mode 100644 src/test/isolation/expected/vacuum-extending.out
 create mode 100644 src/test/isolation/specs/vacuum-extended-statistic.spec
 create mode 100644 src/test/isolation/specs/vacuum-extending.spec
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap
Vacuum statistics
Hello, everyone!

I think we don't have enough information to analyze vacuum functionality. Needless to say, the vacuum is one of the most important processes for a database system: it prevents problems like table and index bloat and performs emergency freezing when we approach wraparound, and it also keeps the visibility map up to date. On the other hand, with incorrectly adjusted aggressive autovacuum settings it can consume a lot of computing resources, which makes all queries in the system run longer.

Nowadays the vacuum gathers statistical information about tables, but that information is important not only for the optimizer. Because the vacuum is an automated process, there are a lot of settings that determine how aggressively it operates on the objects of the database. Besides, sometimes it is important to set a parameter correctly for a specific table because of its dynamic changes. A database administrator needs to tune autovacuum to find a balance between the vacuum's useful work in the database system on the one hand and the overhead of its workload on the other. However, it is not enough to reason about vacuum behavior from the statistics on the number of vacuum passes through tables and the operational data from progress_vacuum, because the latter is available only during a vacuum operation and does not provide a strategic overview over the considered period.

To sum up, autovacuum has strategic behavior: the frequency of its runs and its resource consumption depend on the workload of the database. Its load on the database is minimal for an append-only table and maximal for a table with high-frequency updates. Furthermore, the vacuum's load depends heavily on the number and volume of indexes.
Because indexes have no visibility map, the vacuum scans each index completely; the worst case is when it has to do so for a bloated index on a small table.

I suggest gathering information about vacuum resource consumption for processing indexes and tables and storing it in the table and index relations (for example, in the PgStat_StatTabEntry structure, as is done for regular statistics). It will allow us to determine how well the vacuum is configured and evaluate its overhead on the system at the strategic level. The vacuum already gathers this information, but this valuable information is not stored anywhere.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Sort operation displays more tuples than its subnode contains
Yes, I got it. Thank you very much for the explanation.

On 23.05.2024 00:17, Tom Lane wrote:
> "a.rybakina" writes:
>> I faced an issue where the sort node shows a larger number of tuples in the actual row counts than it really contains, and I cannot understand why.
> If I'm reading this correctly, the sort node you're worrying about feeds the inner side of a merge join. Merge join will rewind its inner side to the start of the current group of equal-keyed tuples whenever it sees that the next outer tuple must also be joined to that group. Since what EXPLAIN is counting is the number of tuples returned from the node, that causes it to double-count those tuples. The more duplicate-keyed tuples on the outer side, the bigger the effect. You can see the same thing happening at the Materialize a little further up, which is feeding the inside of the other merge join.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
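The rewind effect Tom describes can be sketched with a toy Python model (this is not PostgreSQL code, and it is simplified: it assumes every inner tuple is fetched at least once, while a real merge join may stop early). Each additional duplicate outer key re-reads its whole matching group of inner tuples, so the count of tuples returned by the inner node exceeds the number of rows it actually contains:

```python
from itertools import groupby

def inner_tuples_returned(outer, inner):
    """Count tuples *returned* by the inner (sorted) node of a merge join,
    in a simplified model where each extra duplicate outer key rewinds and
    re-fetches its whole matching inner group."""
    group_sizes = {k: len(list(g)) for k, g in groupby(inner)}
    returned = len(inner)                 # every inner tuple fetched once while advancing
    for k, g in groupby(outer):
        n_outer = len(list(g))
        if k in group_sizes and n_outer > 1:
            # each additional matching outer tuple re-reads the marked group
            returned += (n_outer - 1) * group_sizes[k]
    return returned

outer = [1, 1, 1, 2]   # duplicate outer keys force rewinds
inner = [1, 1, 2, 3]   # the inner input has only 4 rows
print(inner_tuples_returned(outer, inner))  # -> 8
```

With no duplicate outer keys the count matches the inner row count, which is why EXPLAIN's numbers look inflated only for duplicate-heavy joins.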
Re: Fix parallel vacuum buffer usage reporting
Hi!

I could try to check it with a test, but I want to ask you about the details, because I'm not sure that I completely understand the test case. Do you mean that we need two backends, where one of them deletes the tuples before vacuum is called by the other?

On 10.05.2024 13:25, Nazir Bilal Yavuz wrote:
> Hi,
>
> Thank you for working on this!
>
> On Wed, 1 May 2024 at 06:37, Masahiko Sawada wrote:
>> Thank you for further testing! I've pushed the patch.
> I realized a behaviour change while looking at the 'Use pgBufferUsage for block reporting in analyze' thread [1]. Since that change applies here as well, I thought it is better to mention it here. Before this commit, VacuumPageMiss did not count the blocks if their read was already completed by other backends [2]. Now, 'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read' counts every block attempted to be read; possibly double counting if someone else has already completed the read. I do not know which behaviour is correct but I wanted to mention this.
>
> [1] https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
> [2] In the WaitReadBuffers() function, see the comment:
> /*
>  * Skip this block if someone else has already completed it. If an
>  * I/O is already in progress in another backend, this will wait for
>  * the outcome: either done, or something went wrong and we will
>  * retry.
>  */

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
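The two counting behaviours Nazir describes can be sketched as a toy Python model (hypothetical names, not the server code): under the old behaviour a block whose read was already completed by another backend is skipped, while under the new pgBufferUsage-based behaviour every attempted read is counted:

```python
def run_reads(blocks, completed_elsewhere, count_attempts):
    """Count read 'misses' for a sequence of block numbers.

    completed_elsewhere - blocks whose I/O another backend already finished.
    count_attempts      - True models the new behaviour (count every attempt),
                          False models the old VacuumPageMiss behaviour.
    """
    misses = 0
    for blk in blocks:
        if blk in completed_elsewhere and not count_attempts:
            continue            # old behaviour: skip, someone else did the I/O
        misses += 1             # new behaviour: every attempted read counts
    return misses

blocks = [10, 11, 12]
done_by_other_backend = {11}
print(run_reads(blocks, done_by_other_backend, count_attempts=False))  # -> 2
print(run_reads(blocks, done_by_other_backend, count_attempts=True))   # -> 3
```

This is exactly the kind of scenario a two-backend test would need to reproduce: one backend must complete the read of a block just before the vacuuming backend attempts it.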
Re: Fix parallel vacuum buffer usage reporting
Hi!

On 30.04.2024 05:18, Masahiko Sawada wrote:
> On Fri, Apr 26, 2024 at 9:12 PM Alena Rybakina wrote:
>> Hi! The same script was run, but using vacuum verbose analyze, and I saw the difference again in the fifth step:
>> with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
>> master: buffer usage: 32346 hits, 573 misses, 1360 dirtied
> Isn't there a chance for the checkpointer to run during this time? That could make the conditions between the two runs slightly different and explain the change in buffer report.
> [0] https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831
> Looking at the script, you won't trigger the problem.

Thank you for the link; I accounted for it in my next experiments. I repeated the test without checkpoint processing and with a single index, and the number of pages in the buffer usage almost matched:

master branch: buffer usage: 32315 hits, 606 misses, 4486 dirtied
with applied patch v4 version: buffer usage: 32315 hits, 606 misses, 4489 dirtied

I think you are right: the problem was interference from the checkpoint process; by the way, I also checked the first version of the patch. To cut a long story short, everything is fine now with one index. Just in case, I'll explain: I considered this case because your patch could have influenced it too.

On 25.04.2024 10:17, Anthonin Bonnefoy wrote:
> On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina wrote:
>> I tested the main postgres branch with and without your fix using a script I wrote. It consists of five scenarios, and I compared the logs between the original master branch and the master branch with your patch:
> Hi! Thanks for the tests.
>> I have attached a test file (vacuum_check_logs.sql)
> The reporting issue will only happen if there's a parallel index vacuum, and it will only happen if there are at least 2 indexes [0]. You will need to create an additional index.
Speaking of the problem, I added another index, repeated the test, and found a significant difference:

* I found it when I committed the transaction (3):
master: 2964 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 33013 hits, 0 misses, 3 dirtied

* when I deleted all the data from the table and later started vacuum verbose again (4):
master: buffer usage: 51486 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 77924 hits, 0 misses, 0 dirtied

* when I inserted 1 million rows into the table and updated them (5):
master: buffer usage: 27904 hits, 5021 misses, 1777 dirtied
with applied patch v4 version: buffer usage: 41051 hits, 9973 misses, 2564 dirtied

As I see it, the number of pages is significantly higher than it was in the master branch and, frankly, I couldn't fully figure out whether that is a mistake or not.

> I think that the patch fixes the problem correctly. I've run pgindent and updated the commit message. I realized that parallel vacuum was introduced in pg13 but buffer usage reporting in the VACUUM command was implemented in pg15. Therefore, in pg13 and pg14, VACUUM (PARALLEL) is available but VACUUM (PARALLEL, VERBOSE) doesn't show the buffer usage report. Autovacuum does show the buffer usage report but parallel autovacuum is not supported. Therefore, we should backpatch it down to 15, not 13. I'm going to push the patch down to pg15, barring any objections.
> Regards,

I agree with you about the backporting, and I saw that the patch works correctly.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Fix parallel vacuum buffer usage reporting
Hi!

>> The same script was run, but using vacuum verbose analyze, and I saw the difference again in the fifth step:
>> with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
>> master: buffer usage: 32346 hits, 573 misses, 1360 dirtied
> Isn't there a chance for the checkpointer to run during this time? That could make the conditions between the two runs slightly different and explain the change in buffer report.
> [0] https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831
> Looking at the script, you won't trigger the problem.

Thank you for the link; I accounted for it in my next experiments. I repeated the test without checkpoint processing and with a single index, and the number of pages in the buffer usage almost matched:

master branch: buffer usage: 32315 hits, 606 misses, 4486 dirtied
with applied patch v4 version: buffer usage: 32315 hits, 606 misses, 4489 dirtied

I think you are right: the problem was interference from the checkpoint process; by the way, I also checked the first version of the patch. To cut a long story short, everything is fine now with one index. Just in case, I'll explain: I considered this case because your patch could have influenced it too.

On 25.04.2024 10:17, Anthonin Bonnefoy wrote:
> On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina wrote:
>> I tested the main postgres branch with and without your fix using a script I wrote. It consists of five scenarios, and I compared the logs between the original master branch and the master branch with your patch:
> Hi! Thanks for the tests.
>> I have attached a test file (vacuum_check_logs.sql)
> The reporting issue will only happen if there's a parallel index vacuum, and it will only happen if there are at least 2 indexes [0]. You will need to create an additional index.
Speaking of the problem, I added another index, repeated the test, and found a significant difference:

* I found it when I committed the transaction (3):
master: 2964 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 33013 hits, 0 misses, 3 dirtied

* when I deleted all the data from the table and later started vacuum verbose again (4):
master: buffer usage: 51486 hits, 0 misses, 0 dirtied
with applied patch v4 version: buffer usage: 77924 hits, 0 misses, 0 dirtied

* when I inserted 1 million rows into the table and updated them (5):
master: buffer usage: 27904 hits, 5021 misses, 1777 dirtied
with applied patch v4 version: buffer usage: 41051 hits, 9973 misses, 2564 dirtied

As I see it, the number of pages is significantly higher than it was in the master branch and, frankly, I couldn't fully figure out whether that is a mistake or not. I attached a test script (vacuum_check_logs.sql) with two indexes and no checkpoints, and I also attached log files: the first one (vacuum_test) is the result of testing on the master branch, the second file is with your patch applied (vacuum_test_v4).
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

vacuum_check_logs.sql
Description: application/sql

postgres=# \i ~/vacuum_check_logs.sql
psql:/home/alena/vacuum_check_logs.sql:2: ERROR: extension "dblink" does not exist
CREATE EXTENSION
 dblink_connect
 OK
(1 row)
psql:/home/alena/vacuum_check_logs.sql:5: ERROR: table "vestat" does not exist
SET
SET
CREATE TABLE
CREATE INDEX
CREATE INDEX
psql:/home/alena/vacuum_check_logs.sql:16: INFO: vacuuming "postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:16: INFO: finished vacuuming "postgres.public.vestat": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 742, which was 0 XIDs old when operation ended
new relfrozenxid: 742, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item identifiers removed
I/O timings: read: 0.049 ms, write: 0.000 ms
avg read rate: 40.584 MB/s, avg write rate: 0.000 MB/s
buffer usage: 23 hits, 2 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 237 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
 pg_sleep
--
(1 row)
INSERT 0 100
DELETE 10
psql:/home/alena/vacuum_check_logs.sql:25: INFO: vacuuming "postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:25: INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)
psql:/home/alena/vacuum_check_logs.sql:25: INFO: table "vestat": truncated 3922 to 3530 pages
psql:/home/alena/vacuum_check_logs.sql:25: INFO: finished vacuuming "postgres.public.vestat": index scans: 1
pages: 392 removed, 3530 remain, 3922 scanned (100.00% o
Re: Fix parallel vacuum buffer usage reporting
On 22.04.2024 11:07, Anthonin Bonnefoy wrote:
> On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina wrote:
>> Hi, thank you for your work on this subject. While I was reviewing your code, I noticed that your patch conflicts with another patch [0] that has been committed.
>> [0] https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
> I've rebased the patch and also split the changes:
> 1: Use pgBufferUsage in Vacuum and Analyze block reporting
> 2: Stop tracking buffer usage in the now unused Vacuum{Hit,Miss,Dirty} variables
> 3: Remove declarations of Vacuum{Hit,Miss,Dirty}

Hi! Thank you for your work; I have reviewed your corrections.

I tested the main postgres branch with and without your fix using a script I wrote. It consists of five scenarios, and I compared the logs between the original master branch and the master branch with your patch:

1. I added 1 million rows to the table and deleted 10% of them. I ran vacuum verbose and didn't see any differences.
buffer usage: 12585 hits, 0 misses, 4 dirtied

2. I opened another connection with a repeatable read transaction through the dblink extension and launched a query updating the records in the table under test. Later, I ran vacuum verbose again and also didn't see any differences.
buffer usage: 19424 hits, 0 misses, 6 dirtied

3. I committed the transaction and ran vacuum verbose again. Everything is fine in the logs.
buffer usage: 22960 hits, 0 misses, 11456 dirtied

4. I deleted all the data from the table and later started vacuum verbose again. The number of pages in the buffer matched with your patch too.
5. I inserted 1 million rows into the table and updated them, and I found a difference between the original master version and the version with your patch:
with your patch: buffer usage: 32315 hits, 606 misses, 1547 dirtied
original version: buffer usage: 32348 hits, 573 misses, 1456 dirtied
I suppose something wasn't counted.

The same script was run, but using vacuum verbose analyze, and I saw the difference again in the fifth step:
with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
master: buffer usage: 32346 hits, 573 misses, 1360 dirtied

I have attached a test file (vacuum_check_logs.sql) and four log files: two with the vacuum verbose command (the vacuum_log and vacuum_log_with_patch files) and two others with the vacuum verbose analyze command (the vacuum_analyze_test_with_patch and vacuum_analyze_test files). Both test approaches show logs with and without your changes.

>> 1: Use pgBufferUsage in Vacuum and Analyze block reporting
> I think that if the analyze command doesn't have the same issue, we don't need to change it. Making the vacuum and the analyze consistent is a good point but I'd like to avoid doing unnecessary changes in back branches. I think the patch set would contain:
> (a) make lazy vacuum use BufferUsage instead of VacuumPage{Hit,Miss,Dirty} (backpatched down to pg13).
> (b) make analyze use BufferUsage and remove the VacuumPage{Hit,Miss,Dirty} variables for consistency and simplicity (only for HEAD, if we agree).
> BTW I realized that VACUUM VERBOSE running on a temp table always shows the number of dirtied buffers being 0, which seems to be a bug. The patch (a) will resolve it as well.

I agree with that. I think we can leave the changes to the analyze command for master only, and I also doubt the need to backport those changes to the back branches.
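The accounting pattern under discussion - snapshot the process-wide pgBufferUsage-style counters before an operation and report the difference afterwards, so that work accumulated from parallel workers is included - can be sketched in plain Python (hypothetical names, not the actual server code):

```python
from dataclasses import dataclass

@dataclass
class BufferUsage:
    shared_blks_hit: int = 0
    shared_blks_read: int = 0
    shared_blks_dirtied: int = 0

    def diff(self, start):
        """Difference between a later snapshot (self) and an earlier one."""
        return BufferUsage(
            self.shared_blks_hit - start.shared_blks_hit,
            self.shared_blks_read - start.shared_blks_read,
            self.shared_blks_dirtied - start.shared_blks_dirtied,
        )

usage = BufferUsage()  # process-wide accumulator, like pgBufferUsage

def vacuum_with_report(do_work):
    # snapshot before the operation...
    start = BufferUsage(usage.shared_blks_hit, usage.shared_blks_read,
                        usage.shared_blks_dirtied)
    do_work()  # leader and worker contributions all land in `usage`
    # ...and report only the delta caused by this operation
    return usage.diff(start)

def fake_work():
    usage.shared_blks_hit += 100  # leader's own accesses
    usage.shared_blks_hit += 40   # parallel worker's contribution, accumulated back
    usage.shared_blks_read += 5

report = vacuum_with_report(fake_work)
print(report.shared_blks_hit, report.shared_blks_read)  # -> 140 5
```

The delta-based report naturally covers the parallel workers' buffer accesses, which is what the per-variable Vacuum{Hit,Miss,Dirty} counters were missing.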
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

postgres=# \i ~/vacuum_check_logs.sql
psql:/home/alena/vacuum_check_logs.sql:2: ERROR: extension "dblink" does not exist
CREATE EXTENSION
 dblink_connect
 OK
(1 row)
psql:/home/alena/vacuum_check_logs.sql:5: ERROR: table "vestat" does not exist
SET
SET
CREATE TABLE
CREATE INDEX
psql:/home/alena/vacuum_check_logs.sql:15: INFO: vacuuming "postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:15: INFO: finished vacuuming "postgres.public.vestat": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 741, which was 0 XIDs old when operation ended
new relfrozenxid: 741, which is 2 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item identifiers removed
I/O timings: read: 0.100 ms, write: 0.000 ms
avg read rate: 29.260 MB/s, avg write rate: 0.000 MB/s
buffer usage: 13 hits, 1 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 237 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
 pg_sleep
--
(1 row)
INSERT 0 100
DELETE 10
psql:/home/alena/vacuum_check_logs.sql:24: INFO: vacuuming &
Re: Fix parallel vacuum buffer usage reporting
Hi, thank you for your work on this subject.

While I was reviewing your code, I noticed that your patch conflicts with another patch [0] that has been committed.

[0] https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

On 28.03.2024 13:12, Anthonin Bonnefoy wrote:
> Hi,
>
> Thanks for the review.
>
> On Thu, Mar 28, 2024 at 4:07 AM Masahiko Sawada wrote:
>> As for the proposed patch, the following part should handle the temp tables too:
> True, I've missed the local blocks. I've updated the patch:
> - read_rate and write_rate now include local block usage
> - I've added a specific output for reporting local blocks instead of combining them in the same output.
>
> Regards,
> Anthonin

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: post-freeze damage control
Hi!

I also worked with this patch, and until your explanation I didn't fully understand why it was wrong to implement the removal of duplicate OR expressions this way. Thank you, now I understand it!

I agree with Andrei Lepikhov's explanation of why it was difficult to move the patch to another place, and with Alexander Korotkov's and Peter Geoghegan's explanations of why this transformation is needed. Let me just add that initially this patch tried to solve a problem where 50,000 OR expressions were generated and a query using a plan with a BitmapScan had trouble running. I wrote more about this here [0].

If the patch can be improved and you can tell me how - because, to be honest, I don't quite understand how to do it yet - then I'll be happy to work on it too.

[0] https://www.postgresql.org/message-id/052172e4-6d75-8069-3179-26de339dca03%40postgrespro.ru

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Hi! Thank you for your work on this issue!

Your patch required a little revision. I did this and attached the patch. Also, I think you should add some clarification to the comments about printing the 'exact' and 'lossy' pages in the show_tidbitmap_info function, which you now take from planstate->stats, whereas previously they were output directly from planstate. Perhaps it is enough to mention this in the commit message. I mean this place:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 926d70afaf..02251994c6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3467,26 +3467,57 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 static void
 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 {
+    Assert(es->analyze);
+
     if (es->format != EXPLAIN_FORMAT_TEXT)
     {
         ExplainPropertyInteger("Exact Heap Blocks", NULL,
-                               planstate->exact_pages, es);
+                               planstate->stats.exact_pages, es);
         ExplainPropertyInteger("Lossy Heap Blocks", NULL,
-                               planstate->lossy_pages, es);
+                               planstate->stats.lossy_pages, es);
     }
     else
     {
-        if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
+        if (planstate->stats.exact_pages > 0 || planstate->stats.lossy_pages > 0)
         {
             ExplainIndentText(es);
             appendStringInfoString(es->str, "Heap Blocks:");
-            if (planstate->exact_pages > 0)
-                appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
-            if (planstate->lossy_pages > 0)
-                appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
+            if (planstate->stats.exact_pages > 0)
+                appendStringInfo(es->str, " exact=%ld", planstate->stats.exact_pages);
+            if (planstate->stats.lossy_pages > 0)
+                appendStringInfo(es->str, " lossy=%ld", planstate->stats.lossy_pages);
             appendStringInfoChar(es->str, '\n');
         }
     }
+
+    if (planstate->pstate != NULL)
+    {
+        for (int n = 0; n < planstate->sinstrument->num_workers; n++)
+        {
+            BitmapHeapScanInstrumentation *si = &planstate->sinstrument->sinstrument[n];
+
+            if (si->exact_pages == 0 && si->lossy_pages == 0)
+                continue;
+
+            if (es->workers_state)
+                ExplainOpenWorker(n, es);
+
+            if (es->format == EXPLAIN_FORMAT_TEXT)
+            {
+                ExplainIndentText(es);
+                appendStringInfo(es->str, "Heap Blocks: exact=%ld lossy=%ld\n",
+                                 si->exact_pages, si->lossy_pages);
+            }
+            else
+            {
+                ExplainPropertyInteger("Exact Heap Blocks", NULL, si->exact_pages, es);
+                ExplainPropertyInteger("Lossy Heap Blocks", NULL, si->lossy_pages, es);
+            }
+
+            if (es->workers_state)
+                ExplainCloseWorker(n, es);
+        }
+    }
 }

I suggest some code refactoring (the diff.diff.no-cfbot file) that allows you to improve your code.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 973bf56022d..ca8d94ba09a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -634,8 +634,9 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
      */
     if (node->sinstrument != NULL && IsParallelWorker())
     {
+        BitmapHeapScanInstrumentation *si;
         Assert(ParallelWorkerNumber <= node->sinstrument->num_workers);
-        BitmapHeapScanInstrumentation *si = &node->sinstrument->sinstrument[ParallelWorkerNumber];
+        si = &node->sinstrument->sinstrument[ParallelWorkerNumber];
         si->exact_pages += node->stats.exact_pages;
         si->lossy_pages += node->stats.lossy_pages;
     }
@@ -864,7 +865,7 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
     ptr = shm_toc_allocate(pcxt->toc, size);
     pstate = (ParallelBitmapHeapState *) ptr;
-    ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+    ptr += size;
     if (node->ss.ps.instrument && pcxt->nworkers > 0)
         sinstrument = (SharedBitmapHeapInstrumentation *) ptr;

From 8dc939749a3f0cea9ba337c020b9ccd474e7c8e3 Mon Sep 17 00:00:00 2001
From: Alena Rybakina
Date: Mon, 8 Apr 2024 23:41:41 +0300
Subject: [P
Re: Support "Right Semi Join" plan shapes
On 06.03.2024 05:23, wenhui qiu wrote:
> Hi Alena Rybakina
> For merge join:
> +    /*
> +     * For now we do not support RIGHT_SEMI join in mergejoin.
> +     */
> +    if (jointype == JOIN_RIGHT_SEMI)
> +    {
> +        *mergejoin_allowed = false;
> +        return NIL;
> +    }
> +
> Thanks

Yes, I see it, thank you. Sorry for the noise.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
Hi!

On 07.03.2024 17:51, Alexander Korotkov wrote:
> Hi!
>
> On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov wrote:
>> On 5/3/2024 12:30, Andrei Lepikhov wrote:
>>> On 4/3/2024 09:26, jian he wrote:
>> ... and the new version of the patchset is attached.
> I made some revisions for the patchset.
> 1) Use hash_combine() to combine hash values.
> 2) Upper-limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
> 3) Better preserve the original order of clauses by putting hash entries and untransformable clauses into the same list. A lot of differences in regression test output have gone.

Thank you for your changes. I agree with them.

> One important issue I found.
>
> # create table t as (select i::int%100 i from generate_series(1,1) i);
> # analyze t;
> # explain select * from t where i = 1 or i = 1;
>                       QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on t  (cost=0.00..189.00 rows=200 width=4)
>    Filter: (i = ANY ('{1,1}'::integer[]))
> (2 rows)
>
> # set enable_or_transformation = false;
> SET
> # explain select * from t where i = 1 or i = 1;
>                       QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on t  (cost=0.00..189.00 rows=100 width=4)
>    Filter: (i = 1)
> (2 rows)
>
> We don't make array values unique. That might make query execution performance somewhat worse, and also makes selectivity estimation worse. I suggest Andrei and/or Alena should implement making array values unique.

I have corrected this and some spelling mistakes. The changes are in the unique_any_elements_change.no-cfbot file.
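The idea of the fix can be sketched in plain Python (this is not the patch's C code; the helper mirrors how a list_append_unique-style function behaves): when collecting the constants for the `i = ANY('{...}')` array, append each value only if it is not already present, so `'{1,1}'` collapses to `'{1}'` and the selectivity estimate is no longer inflated by duplicates:

```python
def append_unique(items, value):
    """Append value to items only if an equal element is not already there,
    preserving the original order of first occurrences."""
    if value not in items:
        items.append(value)
    return items

consts = []
# constants collected while transforming "i = 1 OR i = 1 OR i = 42 OR ..."
for c in [1, 1, 42, 1, 42]:
    append_unique(consts, c)
print(consts)  # -> [1, 42]
```

With duplicates removed, the array `{1,1}` from the example above would become `{1}`, matching the row estimate produced with enable_or_transformation = false.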
While I was correcting the test results caused by these changes, I noticed the same behavior when converting the IN expression, as can be seen in this regression test result:

EXPLAIN (COSTS OFF) SELECT unique2 FROM onek2
  WHERE stringu1 IN ('A', 'A') AND (stringu1 = 'A' OR stringu1 = 'A');
                                QUERY PLAN
---
 Bitmap Heap Scan on onek2
   Recheck Cond: (stringu1 < 'B'::name)
   Filter: ((stringu1 = ANY ('{A,A}'::name[])) AND (stringu1 = 'A'::name))
   ->  Bitmap Index Scan on onek2_u2_prtl
(4 rows)

-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index baeb83e58b..e9813ef6ab 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -312,8 +312,8 @@ TransformOrExprToANY(ParseState *pstate, List *args, int location)
 		if (unlikely(found))
 		{
-			entry->consts = lappend(entry->consts, const_expr);
-			entry->exprs = lappend(entry->exprs, orqual);
+			entry->consts = list_append_unique(entry->consts, const_expr);
+			entry->exprs = list_append_unique(entry->exprs, orqual);
 		}
 		else
 		{
@@ -352,7 +352,6 @@ TransformOrExprToANY(ParseState *pstate, List *args, int location)
 		entry = (OrClauseGroupEntry *) lfirst(lc);
 		Assert(list_length(entry->consts) > 0);
-		Assert(list_length(entry->exprs) == list_length(entry->consts));
 		if (list_length(entry->consts) == 1 ||
 			list_length(entry->consts) > MAX_SAOP_ARRAY_SIZE)
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 7b721bac71..606a4399f9 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1915,16 +1915,16 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand < 42 OR thousand < 99 OR 43 > thousand OR 42 > thousand);
-QUERY PLAN
---
+                              QUERY PLAN
+---
 Aggregate
   ->  Bitmap Heap Scan on tenk1
-        Recheck Cond: ((hundred = 42) AND (thousand < ANY ('{42,99,43,42}'::integer[])))
+        Recheck Cond: ((hundred = 42) AND (thousand < ANY ('{42,99,43}'::integer[])))
         ->  BitmapAnd
              ->  Bitmap Index Scan on tenk1_hundred
                    Index Cond: (hundred = 42)
              ->  Bitmap Index Scan on tenk1_thous_tenthous
-                   Index Cond: (thousand < ANY ('{42,99,43,42}'::integer[]))
+                   Index Cond: (thousand < ANY ('{42,99,43}'::integer[]))
 (8 rows)

 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/expected/se
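The dedup fix above swaps lappend() for list_append_unique() so that repeated constants such as '{42,99,43,42}' collapse before the ANY() array is built. As a rough illustration of the same idea (a toy over plain int arrays, not PostgreSQL's actual List implementation):

```c
#include <stddef.h>

/* Append value to arr (current length len) only if it is not already
 * present; returns the new length.  Mirrors the spirit of PostgreSQL's
 * list_append_unique(), which the patch above uses to drop duplicate
 * constants before forming the ANY() array. */
static size_t append_unique(int *arr, size_t len, int value)
{
    for (size_t i = 0; i < len; i++)
        if (arr[i] == value)
            return len;         /* duplicate: keep the list unchanged */
    arr[len] = value;
    return len + 1;
}

/* Build a deduplicated array from a list of constants. */
size_t dedup(const int *src, size_t n, int *dst)
{
    size_t len = 0;

    for (size_t i = 0; i < n; i++)
        len = append_unique(dst, len, src[i]);
    return len;
}
```

Feeding the constants of the regression test, {42, 99, 43, 42}, yields {42, 99, 43}, matching the corrected plan output.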
Re: Support "Right Semi Join" plan shapes
To be honest, I didn't see those guarantees in the code; could you tell me where they are, please? On 05.03.2024 05:44, Richard Guo wrote: On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina wrote: I have reviewed your patch and I think it is better to add an Assert for JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to prevent the use of RIGHT_SEMI for these join types (NestLoop and MergeJoin). Hmm, I don't see why this is necessary. The planner should already guarantee that we won't have nestloops/mergejoins with right-semi joins. Thanks Richard -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
Sorry, I found explanation - https://www.postgresql.org/message-id/CACJufxFS-xcjaWq2Du2OyJUjRAyqCk12Q_zGOPxv61sgrdpw9w%40mail.gmail.com On 03.03.2024 12:26, Alena Rybakina wrote: I found that it was mentioned here - https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com. To be honest, I couldn't find any explanation for that. On 01.03.2024 18:33, Alexander Korotkov wrote: Hi, Andrei, Hi, Alena! On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov wrote: On 28/2/2024 17:27, Alena Rybakina wrote: > Maybe like that: > > It also considers the way to generate a path using BitmapScan indexes, > converting the transformed expression into expressions separated by "OR" > operations, and if it turns out to be the best and finally selects the > best one. Thanks, I spent some time describing the feature with documentation. A condensed description of the GUC is in the runtime-config file. Feature description has spread between TransformOrExprToANY and generate_saop_pathlist routines. Also, I've made tiny changes in the code to look more smoothly. All modifications are integrated into the two new patches. Feel free to add, change or totally rewrite these changes. I'm going to review and revise the patch. One question I have yet. > /* > * Transformation only works with both side type is not > * { array | composite | domain | record }. Why do we limit transformation for these types? Also, it doesn't seem the current code restricts anything except composite/record. -- Regards, Alexander Korotkov -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
I found that it was mentioned here - https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com. To be honest, I couldn't find any explanation for that. On 01.03.2024 18:33, Alexander Korotkov wrote: Hi, Andrei, Hi, Alena! On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov wrote: On 28/2/2024 17:27, Alena Rybakina wrote: > Maybe like that: > > It also considers the way to generate a path using BitmapScan indexes, > converting the transformed expression into expressions separated by "OR" > operations, and if it turns out to be the best and finally selects the > best one. Thanks, I spent some time describing the feature with documentation. A condensed description of the GUC is in the runtime-config file. Feature description has spread between TransformOrExprToANY and generate_saop_pathlist routines. Also, I've made tiny changes in the code to look more smoothly. All modifications are integrated into the two new patches. Feel free to add, change or totally rewrite these changes. I'm going to review and revise the patch. One question I have yet. > /* > * Transformation only works with both side type is not > * { array | composite | domain | record }. Why do we limit transformation for these types? Also, it doesn't seem the current code restricts anything except composite/record. -- Regards, Alexander Korotkov -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
On 28.02.2024 13:07, jian he wrote: On Wed, Feb 28, 2024 at 12:19 PM Andrei Lepikhov wrote: On 26/2/2024 11:10, Alena Rybakina wrote: On 24.02.2024 14:28, jian he wrote: Hi. I wrote the first draft patch of the documentation. it's under the section: Planner Method Configuration (runtime-config-query.html) but this feature's main meat is in src/backend/parser/parse_expr.c so it may be slightly inconsistent, as mentioned by others. You can further furnish it. Thank you for your work. I found a few spelling mistakes - I fixed them and added some information about generating a partial index plan. I attached it. Thanks Jian and Alena, It is a good start for the documentation. But I think the runtime-config section needs only a condensed description of the feature underlying the GUC. The explanations in this section look a bit awkward. Having looked through the documentation for a better place for a detailed explanation, I found array.sgml as a candidate. Also, we have the parser's short overview section. I'm unsure about the best place, but it is better than the server config section. doc/src/sgml/array.sgml corresponds to https://www.postgresql.org/docs/current/arrays.html. this GUC is related to parser|optimizer. adding a GUC to array.sgml seems strange. (I think). I suggest describing our feature in array.sgml and mentioning the GUC there. We can describe the GUC in config.sgml. 2. We should describe the second part of the feature, where the optimiser can split an array to fit the optimal BitmapOr scan path. we can add a sentence explaining that: it may not do the expression transformation when the original expression can be utilized by the index mechanism. I am not sure how to rephrase it. Maybe like that: It also considers the way to generate a path using BitmapScan indexes, converting the transformed expression into expressions separated by "OR" operations, and if it turns out to be the best and finally selects the best one.
-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
On 24.02.2024 14:28, jian he wrote: Hi. I wrote the first draft patch of the documentation. it's under the section: Planner Method Configuration (runtime-config-query.html) but this feature's main meat is in src/backend/parser/parse_expr.c so it may be slightly inconsistent, as mentioned by others. You can further furnish it. Thank you for your work. I found a few spelling mistakes - I fixed them and added some information about generating a partial index plan. I attached it. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From e3a0e01c43a70099f6870a468d0cc3a8bdcb2775 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Mon, 26 Feb 2024 06:37:36 +0300 Subject: [PATCH] doc1 --- doc/src/sgml/config.sgml | 74 1 file changed, 74 insertions(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 36a2a5ce431..47f82ca2dc9 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5294,6 +5294,80 @@ ANY num_sync ( + enable_or_transformation (boolean) + +enable_or_transformation configuration parameter + + + + +Enables or disables the query planner's ability to transform multiple expressions in () + to an ANY expression (). +This transformation only applies under the following conditions: + + + +Each expression should be formed as: expression operator (expression). +The right-hand side of the operator should be just a plain constant. +The left-hand side of these expressions should remain unchanged. + + + + + + +At the stage of index formation, a check is made on the restructuring of the plan using partial indexes or the formation of expressions combined by the "OR" operator. + + + + + + + Each expression form should return a Boolean (true/false) result. + + + + + + +These expressions are logically linked in an OR condition. + + + + The default is on.
+   For example, without enable_or_transformation the following query applies
+   the three filtering conditions separately, but with enable_or_transformation
+   set, the three expressions are combined into a single one:
+   unique1 = ANY ('{1,2,3}'::integer[]).
+
+   EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1 or unique1 = 2 or unique1 = 3;
+                         QUERY PLAN
+   -
+    Seq Scan on tenk1  (cost=0.00..482.50 rows=3 width=244)
+      Filter: (unique1 = ANY ('{1,2,3}'::integer[]))
+
+   Another example is the following query with enable_or_transformation set,
+   for which a plan with partial indexes is generated:
+
+   EXPLAIN SELECT unique2, stringu1 FROM onek2 WHERE unique1 = 1 OR unique1 = PI()::integer;
+                         QUERY PLAN
+   --
+    Bitmap Heap Scan on onek2
+      Recheck Cond: ((unique1 = 3) OR (unique1 = 1))
+      ->  BitmapOr
+            ->  Bitmap Index Scan on onek2_u1_prtl
+                  Index Cond: (unique1 = 3)
+            ->  Bitmap Index Scan on onek2_u1_prtl
+                  Index Cond: (unique1 = 1)
+    (7 rows)
+
+
 enable_parallel_append (boolean)
-- 2.34.1
Re: Optimize planner memory consumption for huge arrays
Hi! On 20.02.2024 07:41, Andrei Lepikhov wrote: On 20/2/2024 04:51, Tom Lane wrote: Tomas Vondra writes: On 2/19/24 16:45, Tom Lane wrote: Tomas Vondra writes: For example, I don't think we expect selectivity functions to allocate long-lived objects, right? So maybe we could run them in a dedicated memory context, and reset it aggressively (after each call). Here's a quick and probably-incomplete implementation of that idea. I've not tried to study its effects on memory consumption, just made sure it passes check-world. Thanks for the sketch. The trick with the planner_tmp_cxt_depth especially looks interesting. I think we should design small memory contexts - one per scalable direction of memory utilization, like selectivity or partitioning (appending?). My coding experience shows that a short-lived GEQO memory context forces people to learn Postgres internals more precisely :). I think there was a problem in your patch when you freed the memory of a variable in the eqsel_internal function, because there is a case where that variable is referenced from the eval_const_expressions_mutator function (only for T_SubPlan and T_AlternativeSubPlan node types). This query causes an error in your case:

create table a (id bigint, c1 bigint, primary key(id));
create table b (a_id bigint, b_id bigint, b2 bigint, primary key(a_id, b_id));
explain select id from a, b where id = a_id and b2 = (select min(b2) from b where id = a_id);
drop table a;
drop table b;

We can return a copy of the variable or not release its memory. I attached two patches: the first one removes your memory cleanup, and the other one returns a copy of the variable. The author of the corrections is not only me, but also Daniil Anisimov.
-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index edc25d712e9..ac560b1605e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2915,7 +2915,7 @@ eval_const_expressions_mutator(Node *node,
 			 * XXX should we ereport() here instead?  Probably this routine
 			 * should never be invoked after SubPlan creation.
 			 */
-			return node;
+			return copyObject(node);
 		case T_RelabelType:
 			{
 				RelabelType *relabel = (RelabelType *) node;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d40..e5bad75ec1c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -281,6 +281,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
 		selec = var_eq_non_const(, operator, collation, other,
 								 varonleft, negate);
+	pfree(other);
 	ReleaseVariableStats(vardata);
 	return selec;
@@ -1961,15 +1962,15 @@ scalararraysel(PlannerInfo *root,
 		{
 			List	   *args;
 			Selectivity s2;
-
-			args = list_make2(leftop,
-							  makeConst(nominal_element_type,
-										-1,
-										nominal_element_collation,
-										elmlen,
-										elem_values[i],
-										elem_nulls[i],
-										elmbyval));
+			Const	   *c = makeConst(nominal_element_type,
+									  -1,
+									  nominal_element_collation,
+									  elmlen,
+									  elem_values[i],
+									  elem_nulls[i],
+									  elmbyval);
+
+			args = list_make2(leftop, c);
 			if (is_join_clause)
 				s2 = DatumGetFloat8(FunctionCall5Coll(, clause->inputcollid,
@@ -1985,7 +1986,8 @@ scalararraysel(PlannerInfo *r
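The crash fixed above is a classic shared-reference problem: the selectivity code freed a node that eval_const_expressions_mutator() had returned by reference for T_SubPlan nodes, leaving a dangling pointer. A minimal sketch of the two possible fixes (return a copy, or never free the shared object); the Node struct and helper names here are simplified stand-ins, not PostgreSQL's real API:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node { int tag; int value; } Node;

/* Stand-in for copyObject(): hand the caller its own allocation, so it
 * may free the result without touching the shared original. */
static Node *copy_node(const Node *src)
{
    Node *dst = malloc(sizeof(Node));

    memcpy(dst, src, sizeof(Node));
    return dst;
}

/* Unsafe variant: returns the shared node itself.  If the caller frees
 * the result, every other holder of the node is left dangling. */
Node *mutate_shared(Node *shared) { return shared; }

/* Safe variant, mirroring the patch's copyObject() fix. */
Node *mutate_copy(Node *shared) { return copy_node(shared); }
```

With the copy variant, the caller may free its result unconditionally, which is exactly what the unconditional pfree(other) added to eqsel_internal() relies on.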
Running the fdw test from the terminal crashes into the core-dump
Hi, hackers! After starting the server (initdb + pg_ctl start) I ran the regress test create_misc.sql ('\i src/test/regress/sql/create_misc.sql') and, after that, the fdw test ('\i contrib/postgres_fdw/sql/postgres_fdw.sql') in psql, and the backend crashed with a core dump due to a failed assertion. Notably, the failure occurred only if the fdw extension had not been installed earlier.

Script to reproduce the error:

./configure CFLAGS='-g -ggdb -O0' --enable-debug --enable-cassert --prefix=`pwd`/tmp_install && make -j8 -s install
export CDIR=$(pwd)
export PGDATA=$CDIR/postgres_data
rm -r $PGDATA
mkdir $PGDATA
${CDIR}/tmp_install/bin/initdb -D $PGDATA >> log.txt
${CDIR}/tmp_install/bin/pg_ctl -D $PGDATA -l logfile start
${CDIR}/tmp_install/bin/psql -d postgres -f src/test/regress/sql/create_misc.sql &&
${CDIR}/tmp_install/bin/psql -d postgres -f contrib/postgres_fdw/sql/postgres_fdw.sql

The query where the problem occurs:

-- MERGE ought to fail cleanly
merge into itrtest using (select 1, 'foo') as source on (true)
  when matched then do nothing;

Core dump:

#5  0x555d1451f483 in ExceptionalCondition (conditionName=0x555d146bba13 "requiredPerms != 0", fileName=0x555d146bb7b0 "execMain.c", lineNumber=654) at assert.c:66
#6  0x555d14064ebe in ExecCheckOneRelPerms (perminfo=0x555d1565ef90) at execMain.c:654
#7  0x555d14064d94 in ExecCheckPermissions (rangeTable=0x555d1565eef0, rteperminfos=0x555d1565efe0, ereport_on_violation=true) at execMain.c:623
#8  0x555d140652ca in InitPlan (queryDesc=0x555d156cde50, eflags=0) at execMain.c:850
#9  0x555d140644a8 in standard_ExecutorStart (queryDesc=0x555d156cde50, eflags=0) at execMain.c:266
#10 0x555d140641ec in ExecutorStart (queryDesc=0x555d156cde50, eflags=0) at execMain.c:145
#11 0x555d1432c025 in ProcessQuery (plan=0x555d1565f3e0, sourceText=0x555d1551b020 "merge into itrtest using (select 1, 'foo') as source on (true)\n when matched then do nothing;", params=0x0, queryEnv=0x0, dest=0x555d1565f540, qc=0x7fffc9454080) at pquery.c:155
#12 0x555d1432dbd8 in PortalRunMulti (portal=0x555d15597ef0, isTopLevel=true, setHoldSnapshot=false, dest=0x555d1565f540, altdest=0x555d1565f540, qc=0x7fffc9454080) at pquery.c:1277
#13 0x555d1432d0cf in PortalRun (portal=0x555d15597ef0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x555d1565f540, altdest=0x555d1565f540, qc=0x7fffc9454080) at pquery.c:791
#14 0x555d14325ec8 in exec_simple_query (query_string=0x555d1551b020 "merge into itrtest using (select 1, 'foo') as source on (true)\n when matched then do nothing;") at postgres.c:1273
#15 0x555d1432ae4c in PostgresMain (dbname=0x555d1ab0 "aaa", username=0x555d1a98 "alena") at postgres.c:4645
#16 0x555d14244a5d in BackendRun (port=0x555d1554b3b0) at postmaster.c:4440
#17 0x555d14244072 in BackendStartup (port=0x555d1554b3b0) at postmaster.c:4116
#18 0x555d142405c5 in ServerLoop () at postmaster.c:1768
#19 0x555d1423fec2 in PostmasterMain (argc=3, argv=0x555d1547fcf0) at postmaster.c:1467
#20 0x555d140f3122 in main (argc=3, argv=0x555d1547fcf0) at main.c:198

This error is consistently reproducible. To be honest, I wasn't able to fully figure out the reason, but it seems that this operation should not be allowed on partitioned tables at all?

alena@postgres=# SELECT relname, relkind FROM pg_class where relname='itrtest';
 relname | relkind
---------+---------
 itrtest | p
(1 row)

-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PoC] Reducing planning time when tables have many partitions
Hi! Sorry for my delayed reply too. On 17.01.2024 12:33, Yuya Watari wrote: Hello Alena, Thank you for your quick response, and I'm sorry for my delayed reply. On Sun, Dec 17, 2023 at 12:41 AM Alena Rybakina wrote: I thought about this earlier and was worried that the index links of the equivalence classes might not be referenced correctly for outer joins, so I decided to just overwrite them and reset the previous ones. Thank you for pointing this out. I have investigated this problem and found a potential source of the bug. The code quoted below modifies RestrictInfo's clause_relids. Here, our indexes, namely eclass_source_indexes and eclass_derive_indexes, are based on clause_relids, so they should be adjusted after the modification. However, my patch didn't do that, so it may have missed some references. The same problem occurs in places other than the quoted one.

=
/*
 * Walker function for replace_varno()
 */
static bool
replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
    ...
    else if (IsA(node, RestrictInfo))
    {
        RestrictInfo *rinfo = (RestrictInfo *) node;
        ...
        if (bms_is_member(ctx->from, rinfo->clause_relids))
        {
            replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
            replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
            rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
            ...
        }
        ...
    }
    ...
}
=

I have attached a new version of the patch, v23, to fix this problem. v23-0006 adds a helper function called update_clause_relids(). This function modifies RestrictInfo->clause_relids while adjusting its related indexes. I have also attached a sanity check patch (sanity-check.txt) to this email. This sanity check patch verifies that there are no missing references between RestrictInfos and our indexes. I don't intend to commit this patch, but it helps find potential bugs. v23 passes this sanity check, but the v21 you submitted before does not.
This means that the adjustment by update_clause_relids() is needed to prevent missing references after modifying clause_relids. I'd appreciate your letting me know if v23 doesn't solve your concern. One of the things I don't think is good about my approach is that it adds some complexity to the code. In my approach, all modifications to clause_relids must be done through the update_clause_relids() function, but enforcing this rule is not so easy. In this sense, my patch may need to be simplified more. This is due to what I explained before: we zeroed the values referenced by the indexes, so this check is not correct either, since a zeroed value referenced by an index is valid. That's why I removed this check. Thank you for letting me know. I fixed this in v23-0005 to adjust the indexes in update_eclasses(). With this change, the assertion check will be correct. Yes, it is working correctly now with the assertion check. I think it would be better to include this code with an additional comment recommending that other developers use it for verification whenever they manipulate the list of equivalence classes.
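The core of the v23-0006 fix is funneling every clause_relids modification through update_clause_relids() so the eclass indexes never go stale. The invariant generalizes: whenever a field is also a lookup key in a secondary index, mutate it only through a helper that updates both in one step. A toy sketch of that pattern (plain arrays instead of PostgreSQL's bitmapsets and RestrictInfos; all names here are illustrative, not the patch's code):

```c
#include <stdbool.h>

#define MAX_KEYS 16

typedef struct Item { int key; } Item;

/* A tiny "index": index_of[k] records which item currently has key k,
 * or -1 if none.  Analogous to eclass_source_indexes, which map relids
 * to the RestrictInfos that mention them. */
typedef struct KeyIndex { int index_of[MAX_KEYS]; } KeyIndex;

void index_init(KeyIndex *ix)
{
    for (int k = 0; k < MAX_KEYS; k++)
        ix->index_of[k] = -1;
}

/* The only sanctioned way to change an item's key: adjust the index in
 * the same step, as update_clause_relids() does for clause_relids. */
void update_key(Item *items, int item_no, int new_key, KeyIndex *ix)
{
    ix->index_of[items[item_no].key] = -1;  /* drop the stale entry */
    items[item_no].key = new_key;
    ix->index_of[new_key] = item_no;        /* register the new entry */
}

/* Consistency check in the spirit of the thread's sanity-check patch:
 * every item must be found via the index under its current key. */
bool index_consistent(const Item *items, int nitems, const KeyIndex *ix)
{
    for (int i = 0; i < nitems; i++)
        if (ix->index_of[items[i].key] != i)
            return false;
    return true;
}
```

Mutating items[n].key directly would leave index_consistent() false, which is exactly the class of missing reference the sanity-check patch catches.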
Re: POC, WIP: OR-clause support for indexes
On 12.02.2024 12:01, Andrei Lepikhov wrote: On 12/2/2024 15:55, Alena Rybakina wrote: Hi! I can't understand this part of the code: /* Time to generate index paths */ MemSet(, 0, sizeof(clauseset)); match_clauses_to_index(root, list_make1(rinfo1), index, ); As I understand it, match_clauses_to_index is necessary if you have a RestrictInfo (rinfo1) variable, so maybe we should run it after the make_restrictinfo procedure; otherwise calling it, I think, is useless. I think you must explain your note in more detail. Before this call, we already called make_restrictinfo() and built rinfo1, haven't we? I got it, I think; I was confused by the "else" block where we can process the index, otherwise we move on to the next element. I think the "else" block that creates the restrictinfo and builds the indexpaths list should perhaps be moved to a separate function, or the "else" simply removed? I think we need to check that rinfo->clause is not empty, because if it is, we can miss calling the build_paths_for_OR function. We should add that check here: restriction_is_saop_clause(RestrictInfo *restrictinfo) { if (IsA(restrictinfo->clause, ScalarArrayOpExpr)) ... By the way, I think we need to add a check that the clauseset is not empty (if (!clauseset.nonempty)), otherwise we could get an error. I noticed the same check in the build_paths_for_OR function. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
Hi! I can't understand this part of the code: /* Time to generate index paths */ MemSet(, 0, sizeof(clauseset)); match_clauses_to_index(root, list_make1(rinfo1), index, ); As I understand it, match_clauses_to_index is necessary if you have a RestrictInfo (rinfo1) variable, so maybe we should run it after the make_restrictinfo procedure; otherwise calling it, I think, is useless. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: POC, WIP: OR-clause support for indexes
On 01.02.2024 08:00, jian he wrote: On Wed, Jan 31, 2024 at 7:10 PM Alena Rybakina wrote: Hi, thank you for your review and interest in this subject. On 31.01.2024 13:15, jian he wrote: On Wed, Jan 31, 2024 at 10:55 AM jian he wrote: based on my understanding of https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR I think you need to move the commutator check right after the `if (get_op_rettype(opno) != BOOLOID)` branch I was wrong about this part. sorry for the noise. I have made some changes (attachment). * if the operator expression left or right side type category is {array | domain | composite}, then don't do the transformation. (I am not 100% sure about composite) To be honest, I'm not sure about this check, because we check the type of the variable there: if (!IsA(orqual, OpExpr)) { or_list = lappend(or_list, orqual); continue; } And below: if (IsA(leftop, Const)) { opno = get_commutator(opno); if (!OidIsValid(opno)) { /* Commutator doesn't exist, we can't reverse the order */ or_list = lappend(or_list, orqual); continue; } nconst_expr = get_rightop(orqual); const_expr = get_leftop(orqual); } else if (IsA(rightop, Const)) { const_expr = get_rightop(orqual); nconst_expr = get_leftop(orqual); } else { or_list = lappend(or_list, orqual); continue; } Isn't that enough? alter table tenk1 add column arr int[]; set enable_or_transformation to on; EXPLAIN (COSTS OFF) SELECT count(*) FROM tenk1 WHERE arr = '{1,2,3}' or arr = '{1,2}'; the above query will not do the OR transformation, because an array type doesn't itself have an array type: ` scalar_type = entry->key.exprtype; if (scalar_type != RECORDOID && OidIsValid(scalar_type)) array_type = get_array_type(scalar_type); else array_type = InvalidOid; ` If either side of the operator expression is an array or array-related type, we can be sure it cannot do the transformation (get_array_type will return InvalidOid for the anyarray type).
we can check it earlier, so hash-related code will not be invoked for array-related types. Agree. Besides, some examples (with ARRAY) work fine: postgres=# CREATE TABLE sal_emp ( pay_by_quarter integer[], pay_by_quater1 integer[] ); CREATE TABLE postgres=# INSERT INTO sal_emp VALUES ( '{1, 1, 1, 1}', '{1,2,3,4}'); INSERT 0 1 postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or pay_by_quarter[1]=2; pay_by_quarter | pay_by_quater1 ---+ {1,1,1,1} | {1,2,3,4} (1 row) postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 or pay_by_quarter[1]=2; QUERY PLAN -- Seq Scan on sal_emp (cost=0.00..21.00 rows=9 width=64) Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[])) (2 rows) * if the left side of the operator expression node contains volatile functions, then don't do the transformation. I'm also not sure about the volatility check function, because we perform such a conversion at the parsing stage, and at this stage we don't have a RelOptInfo variable, let alone a RestrictInfo or a PathTarget. see the example here: https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com set enable_or_transformation to on; create or replace function retint(int) returns int as $func$ begin raise notice 'hello'; return $1 + round(10 * random()); end $func$ LANGUAGE plpgsql; SELECT count(*) FROM tenk1 WHERE thousand = 42; will return 10 rows. SELECT count(*) FROM tenk1 WHERE thousand = 42 AND (retint(1) = 4 OR retint(1) = 3); this query should return 20 'hello' notices, but now returns only 10. EXPLAIN (COSTS OFF) SELECT count(*) FROM tenk1 WHERE thousand = 42 AND (retint(1) = 4 OR retint(1) = 3); QUERY PLAN -- Aggregate -> Seq Scan on tenk1 Filter: ((thousand = 42) AND (retint(1) = ANY ('{4,3}'::integer[]))) (3 rows) Agree. I added your code to the patch.
-- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company From 0fcad72153c9eaaf436e5b417c803a70fbeaf8ac Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Fri, 2 Feb 2024 22:01:09 +0300 Subject: [PATCH] Transform OR clause to ANY expressions. Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the preliminary stage of optimization when we are still working with the tree expression. Here C is a constant expression, 'expr' is non-constant e
Re: POC, WIP: OR-clause support for indexes
Hi, thank you for your review and interest in this subject. On 31.01.2024 13:15, jian he wrote: On Wed, Jan 31, 2024 at 10:55 AM jian he wrote: based on my understanding of https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR I think you need to move the commutator check right after the `if (get_op_rettype(opno) != BOOLOID)` branch I was wrong about this part. sorry for the noise. I have made some changes (attachment). * if the operator expression left or right side type category is {array | domain | composite}, then don't do the transformation. (I am not 100% sure about composite) To be honest, I'm not sure about this check, because we check the type of the variable there: if (!IsA(orqual, OpExpr)) { or_list = lappend(or_list, orqual); continue; } And below: if (IsA(leftop, Const)) { opno = get_commutator(opno); if (!OidIsValid(opno)) { /* Commutator doesn't exist, we can't reverse the order */ or_list = lappend(or_list, orqual); continue; } nconst_expr = get_rightop(orqual); const_expr = get_leftop(orqual); } else if (IsA(rightop, Const)) { const_expr = get_rightop(orqual); nconst_expr = get_leftop(orqual); } else { or_list = lappend(or_list, orqual); continue; } Isn't that enough? Besides, some examples (with ARRAY) work fine: postgres=# CREATE TABLE sal_emp ( pay_by_quarter integer[], pay_by_quater1 integer[] ); CREATE TABLE postgres=# INSERT INTO sal_emp VALUES ( '{1, 1, 1, 1}', '{1,2,3,4}'); INSERT 0 1 postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or pay_by_quarter[1]=2; pay_by_quarter | pay_by_quater1 ---+ {1,1,1,1} | {1,2,3,4} (1 row) postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 or pay_by_quarter[1]=2; QUERY PLAN -- Seq Scan on sal_emp (cost=0.00..21.00 rows=9 width=64) Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[])) (2 rows) * if the left side of the operator expression node contains volatile functions, then don't do the transformation.
I'm also not sure about the volatility check function, because we perform such a conversion at the parsing stage, and at this stage we don't have a RelOptInfo variable, let alone a RestrictInfo or a PathTarget. Speaking of NextValueExpr, I couldn't find any examples where the current patch wouldn't work. I wrote one of them below: postgres=# create table foo (f1 int, f2 int generated always as identity); CREATE TABLE postgres=# insert into foo values(1); INSERT 0 1 postgres=# explain verbose update foo set f1 = 2 where f1=1 or f1=2 ; QUERY PLAN --- Update on public.foo (cost=0.00..38.25 rows=0 width=0) -> Seq Scan on public.foo (cost=0.00..38.25 rows=23 width=10) Output: 2, ctid Filter: (foo.f1 = ANY ('{1,2}'::integer[])) (4 rows) Maybe I missed something. Do you have any examples? * some other minor cosmetic changes. Thank you, I agree with them. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
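The volatility concern above is concrete: `retint(1) = 4 OR retint(1) = 3` must call retint() twice per row when the first comparison fails, while the transformed `retint(1) = ANY ('{4,3}')` calls it once, changing the observable side effects (the RAISE NOTICE count in the thread's example). A language-neutral sketch of the same hazard, with a counter standing in for the NOTICE; the function here is an illustrative stand-in, not PostgreSQL code:

```c
#include <stdbool.h>

static int call_count = 0;

/* Stand-in for a volatile SQL function: every evaluation has a side
 * effect (here, bumping a counter; in the thread, RAISE NOTICE).  It
 * returns a fixed 3 only so the sketch stays deterministic. */
static int volatile_fn(void)
{
    call_count++;
    return 3;
}

/* Original OR form: the function is evaluated once per comparison,
 * so a failed first comparison forces a second call. */
bool or_form(void)
{
    return volatile_fn() == 4 || volatile_fn() == 3;
}

/* Transformed ANY form: a single evaluation is compared against the
 * whole array of constants. */
bool any_form(void)
{
    int v = volatile_fn();
    int targets[] = {4, 3};

    for (int i = 0; i < 2; i++)
        if (v == targets[i])
            return true;
    return false;
}
```

Both forms return the same truth value here, but the side-effect counts differ (two calls vs. one), which is exactly why the transformation must skip expressions containing volatile functions.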
Re: Support "Right Semi Join" plan shapes
Hi! Thank you for your work on this subject. I have reviewed your patch and I think it is better to add an Assert for JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to prevent the use of RIGHT_SEMI for these join types (NestLoop and MergeJoin). Mostly I'm suggesting this because of the set_join_pathlist_hook hook, which is called from the add_paths_to_joinrel function and allows you to create a custom node. What do you think? -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Evaluate arguments of correlated SubPlans in the referencing ExprState
On 26.01.2024 05:37, vignesh C wrote: On Tue, 24 Oct 2023 at 01:47, Alena Rybakina wrote: Hi! I looked through your patch and noticed that it was not applied to the current version of the master. I rebased it and attached a version. I didn't see any problems and, honestly, no big changes were needed, all regression tests were passed. I think it's better to add a test, but to be honest, I haven't been able to come up with something yet. The patch does not apply anymore as in CFBot at [1]: === Applying patches on top of PostgreSQL commit ID 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === === applying patch ./v2-0001-WIP-Evaluate-arguments-of-correlated-SubPlans-in-the.patch patching file src/include/executor/execExpr.h Hunk #1 succeeded at 160 (offset 1 line). Hunk #2 succeeded at 382 (offset 2 lines). Hunk #3 FAILED at 778. 1 out of 3 hunks FAILED -- saving rejects to file src/include/executor/execExpr.h.rej patching file src/include/nodes/execnodes.h Hunk #1 succeeded at 959 (offset 7 lines). Please have a look and post an updated version. [1] - http://cfbot.cputube.org/patch_46_4209.log Regards, Vignesh Thank you! I fixed it. The code remains the same. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From bf40b14c0cb63f47280299fd3f76a1711db6aada Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Sun, 28 Jan 2024 11:58:44 +0300 Subject: [PATCH] WIP: Evaluate arguments of correlated SubPlans in the referencing ExprState. 
--- src/backend/executor/execExpr.c | 93 +-- src/backend/executor/execExprInterp.c | 22 +++ src/backend/executor/execProcnode.c | 5 ++ src/backend/executor/nodeSubplan.c| 30 - src/backend/jit/llvm/llvmjit_expr.c | 6 ++ src/backend/jit/llvm/llvmjit_types.c | 1 + src/include/executor/execExpr.h | 6 +- src/include/nodes/execnodes.h | 1 - 8 files changed, 112 insertions(+), 52 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 3181b1136a2..d2e539e7b28 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -88,6 +88,9 @@ static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate, FunctionCallInfo fcinfo, AggStatePerTrans pertrans, int transno, int setno, int setoff, bool ishash, bool nullcheck); +static void ExecInitSubPlanExpr(SubPlan *subplan, +ExprState *state, +Datum *resv, bool *resnull); /* @@ -1386,7 +1389,6 @@ ExecInitExprRec(Expr *node, ExprState *state, case T_SubPlan: { SubPlan*subplan = (SubPlan *) node; -SubPlanState *sstate; /* * Real execution of a MULTIEXPR SubPlan has already been @@ -1403,19 +1405,7 @@ ExecInitExprRec(Expr *node, ExprState *state, break; } -if (!state->parent) - elog(ERROR, "SubPlan found with no parent plan"); - -sstate = ExecInitSubPlan(subplan, state->parent); - -/* add SubPlanState nodes to state->parent->subPlan */ -state->parent->subPlan = lappend(state->parent->subPlan, - sstate); - -scratch.opcode = EEOP_SUBPLAN; -scratch.d.subplan.sstate = sstate; - -ExprEvalPushStep(state, ); +ExecInitSubPlanExpr(subplan, state, resv, resnull); break; } @@ -2752,29 +2742,12 @@ ExecPushExprSetupSteps(ExprState *state, ExprSetupInfo *info) foreach(lc, info->multiexpr_subplans) { SubPlan*subplan = (SubPlan *) lfirst(lc); - SubPlanState *sstate; Assert(subplan->subLinkType == MULTIEXPR_SUBLINK); - /* This should match what ExecInitExprRec does for other SubPlans: */ - - if (!state->parent) - elog(ERROR, "SubPlan found with no parent plan"); - - sstate = 
ExecInitSubPlan(subplan, state->parent); - - /* add SubPlanState nodes to state->parent->subPlan */ - state->parent->subPlan = lappend(state->parent->subPlan, - sstate); - - scratch.opcode = EEOP_SUBPLAN; - scratch.d.subplan.sstate = sstate; - /* The result can be ignored, but we better put it somewhere */ - scratch.resvalue = >resvalue; - scratch.resnull = >resnull; - - ExprEvalPushStep(state, ); + ExecInitSubPlanExpr(subplan, state, + >resvalue, >resnull); } } @@ -4181,3 +4154,57 @@ ExecBuildParamSetEqual(TupleDesc desc, return state; } + +static void +ExecInitSubPlanExpr(SubPlan *subplan, + ExprState *state, + Datum *resv, bool *resnull) +{ + ExprEvalStep scratch = {0}; + SubPlanState *sstate; + ListCell *pvar; + ListCell *l; + + if (!state->parent) + elog(ERROR, "SubPlan found with no parent plan"); + + /* + * Generate steps to evaluate input arguments for the subplan. + * + * We evaluate the argument expressions into ExprState's resvalue/resnull, + * and then use PARAM_SET to update the parameter. We do that, instead of + * evaluating dire
Re: POC: GROUP BY optimization
On 15.01.2024 12:46, Andrei Lepikhov wrote: On 15/1/2024 13:42, Richard Guo wrote: On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov wrote: Thank you for providing the test case relevant for this code change. The revised patch incorporating this change is attached. Now the patchset looks good to me. I'm going to push it if there are no objections. Seems I'm late for the party. Can we hold for several more days? I'd like to have a review on this patch. Get on board! It looks like this feature needs as much review as possible (likewise SJE). Hi! Thank you for your work on this issue! I believe this will help the planner build a more optimal query plan here and therefore speed up query execution. I have reviewed the patches and noticed some opportunities for code refactoring. I have attached a diff file (group_by.diff) to this email. The changes involve spelling corrections, renaming variables, and factoring out some common code. In addition, I have a few questions, since some points in the code remained unclear to me. 1. I didn't understand why there is a question mark in the comment next to the enable_group_by_reordering variable in the src/backend/optimizer/path/pathkeys.c file; I assumed it was a typo and fixed it in the diff file. 2. Why do we restore the variable (path = path_save) here (in the add_paths_to_grouping_rel function) if we overwrite it immediately below? We could pass path_save as a parameter instead.
foreach(lc2, pathkey_orderings) { PathKeyInfo *info = (PathKeyInfo *) lfirst(lc2); /* restore the path (we replace it in the loop) */ path = path_save; path = make_ordered_path(root, grouped_rel, path, cheapest_path, info->pathkeys); if (path == NULL) continue; -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 5aac6d66776..8be58fa2b0e 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -29,7 +29,7 @@ #include "partitioning/partbounds.h" #include "utils/lsyscache.h" -/* Consider reordering of GROUP BY keys? */ +/* Consider reordering of GROUP BY keys */ bool enable_group_by_reordering = true; static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys); @@ -362,7 +362,7 @@ pathkeys_contained_in(List *keys1, List *keys2) * * Returns the number of GROUP BY keys with a matching pathkey. */ -static int +static PathKeyInfo * group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys, List **group_clauses, int num_groupby_pathkeys) @@ -421,7 +421,16 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys, *group_clauses = list_concat_unique_ptr(new_group_clauses, *group_clauses); - return n; + if (n > 0 && + (enable_incremental_sort || n == list_length(*group_pathkeys))) + { + PathKeyInfo *info = makeNode(PathKeyInfo); + info->pathkeys = *group_pathkeys; + info->clauses = *group_clauses; + return info; + } + + return NULL; } /* @@ -436,7 +445,7 @@ group_keys_reorder_by_pathkeys(List *pathkeys, List **group_pathkeys, * * - the original ordering, as specified by the GROUP BY clause, * - GROUP BY keys reordered to match 'path' ordering (as much as possible), - * - GROUP BY keys to match target ORDER BY clause (as much as possible). + * - GROUP BY keys should match the target ORDER BY clause (as much as possible). 
*/ List * get_useful_group_keys_orderings(PlannerInfo *root, Path *path) @@ -475,20 +484,11 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) */ if (path->pathkeys) { - int n; - - n = group_keys_reorder_by_pathkeys(path->pathkeys, , , + info = group_keys_reorder_by_pathkeys(path->pathkeys, , , root->num_groupby_pathkeys); - if (n > 0 && - (enable_incremental_sort || n == list_length(path->pathkeys))) - { - info = makeNode(PathKeyInfo); - info->pathkeys = pathkeys; - info->clauses = clauses; - + if (info) infos = lappend(infos, info); - } } /* @@ -497,21 +497,12 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) */ if (root->sort_pathkeys) { - int n; - - n = group_keys_reorder_by_pathkeys(root->sort_pathkeys, , + info = group_keys_reorder_by_pathkeys(root->sort_pathkeys, , , root->num_groupby_pathkeys); - if (n > 0 && - (enable_incremental_sort || n == list_length(path->pathkeys))) - { - info = makeNode(PathKeyInfo); - info->pathkeys = pathkeys; - info->clauses = clauses; - + if (info) infos = lappend(infos, info); - } } return infos; @@ -2163
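The refactoring above centers on group_keys_reorder_by_pathkeys, which reorders the GROUP BY keys so that a prefix of them matches the input path's ordering. Stripped of planner plumbing, the matching logic can be sketched in Python (names and shapes are hypothetical, not the PostgreSQL API):

```python
def reorder_group_keys(path_keys, group_keys):
    """Reorder group_keys so that the keys shared with path_keys come
    first, in path_keys order; return (reordered_keys, n_matched)."""
    matched = []
    for pk in path_keys:
        if pk not in group_keys:
            break  # only a leading prefix of the path ordering helps
        matched.append(pk)
    rest = [k for k in group_keys if k not in matched]
    return matched + rest, len(matched)

# GROUP BY (b, a) over input sorted by (a, b, c): both keys match,
# so no extra sort is needed for grouping.
keys, n = reorder_group_keys(["a", "b", "c"], ["b", "a"])

# GROUP BY (a, b) over input sorted by (x, a): no usable prefix.
keys2, n2 = reorder_group_keys(["x", "a"], ["a", "b"])
```

As in the patch, a partial match (0 < n_matched < len(group_keys)) is only worth keeping when incremental sort is enabled, which is what the n > 0 && (enable_incremental_sort || ...) condition in the diff expresses.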
Re: [PoC] Reducing planning time when tables have many partitions
rte = root->simple_rte_array[i]; + RelOptInfo *rel = root->simple_rel_array[i]; - rte->eclass_source_indexes = bms_add_member(rte->eclass_source_indexes, + /* +* If the corresponding RelOptInfo does not exist, we create a 'dummy' +* RelOptInfo for storing EquivalenceClass indexes. +*/ + if (rel == NULL) + { + rel = root->simple_rel_array[i] = makeNode(RelOptInfo); + rel->eclass_source_indexes = NULL; + rel->eclass_derive_indexes = NULL; + } + rel->eclass_source_indexes = bms_add_member(rel->eclass_source_indexes, source_idx); } = At this point, I'm not sure if this approach is correct. It seems to pass the regression tests, but we should doubt its correctness. I will continue to experiment with this idea. [1]https://www.postgresql.org/message-id/CAPpHfdseB13zJJPZuBORevRnZ0vcFyUaaJeSGfAysX7S5er%2BEQ%40mail.gmail.com Yes, I also thought in this direction before and I agree that this is the best way to develop the patch. -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company
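The quoted change moves the EquivalenceClass index sets from RangeTblEntry to RelOptInfo, creating a 'dummy' RelOptInfo when none exists yet. A toy Python model of that bookkeeping (hypothetical names; the real code uses Bitmapsets and simple_rel_array):

```python
class DummyRel:
    """Stand-in for RelOptInfo, carrying only the EC index sets the
    quoted code initializes."""
    def __init__(self):
        self.eclass_source_indexes = set()
        self.eclass_derive_indexes = set()

def note_ec_source(simple_rel_array, relid, source_idx):
    """Record that equivalence class #source_idx mentions relation
    relid, creating a 'dummy' rel on demand, as the patch does when
    simple_rel_array[i] is NULL."""
    rel = simple_rel_array.get(relid)
    if rel is None:
        rel = simple_rel_array[relid] = DummyRel()
    rel.eclass_source_indexes.add(source_idx)

rels = {}
note_ec_source(rels, 3, 0)
note_ec_source(rels, 3, 5)
note_ec_source(rels, 7, 5)
# Finding the ECs that mention relation 3 is now a per-relation lookup
# rather than a scan over every equivalence class in the query.
ecs_for_3 = sorted(rels[3].eclass_source_indexes)
```

This is the point of the patch series: with many partitions, scanning all equivalence classes for every relation dominates planning time, while a per-relation index keeps each lookup small.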
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 12.12.2023 16:04, jian he wrote: On Mon, Dec 11, 2023 at 10:05 PM Alena Rybakina wrote: Hi! Thank you for your work. Your patch looks better! Yes, thank you! It works fine, and I see that the regression tests have been passed. However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below): postgres=# create table test (x int primary key, y int not null); postgres=# create table test1 (x int, z int, CONSTRAINT fk_x FOREIGN KEY(x) REFERENCES test(x)); I did not find a table with saved errors after operation, although I received a log about it: postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (x)=(2) already exists. CONTEXT: COPY test, line 3 postgres=# select * from public.test_error; ERROR: relation "public.test_error" does not exist LINE 1: select * from public.test_error; postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error ERROR: insert or update on table "test1" violates foreign key constraint "fk_x" DETAIL: Key (x)=(2) is not present in table "test". postgres=# select * from public.test1_error; ERROR: relation "public.test1_error" does not exist LINE 1: select * from public.test1_error; Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1. If I leave only the correct rows, everything works fine and the rows are added to the tables. 
in copy_test.csv: 2,0 1,1 in copy_test1.csv: 2,0 2,1 1,1 postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV COPY 2 postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error NOTICE: No error happened.Error holding table public.test1_error will be droped COPY 3 Maybe I'm launching it the wrong way. If so, let me know about it. looks like the above is about constraints violation while copying. constraints violation while copying not in the scope of this patch. Since COPY FROM is very like the INSERT command, you do want all the valid constraints to check all the copied rows? No, I think it will be too much. but the notice raised by the patch is not right. So I place the drop error saving table or raise notice logic above `ExecResetTupleTable(estate->es_tupleTable, false)` in the function CopyFrom. Yes, I see it and agree with you. I also notice interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation, I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation. I think you should add information about this in the documentation, since this seems to be normal behavior to me. doc changed. you may check it. Yes, I saw it. Thank you. -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company
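To summarize the behavior discussed in this thread: soft errors (rows that fail parsing) are diverted to the error table and the load continues, while constraint violations still abort the whole COPY, which is why no error table survives in the failing examples. A toy Python model of those semantics (not the patch itself):

```python
def parse_row(raw):
    """Parse a CSV line of two integers; raises ValueError on bad input."""
    return tuple(int(f) for f in raw.split(","))

def copy_rows(rows, check_constraint):
    """Toy model of COPY ... save_error: rows that fail parsing are
    saved aside; a constraint violation aborts the whole operation."""
    loaded, saved_errors = [], []
    for lineno, raw in enumerate(rows, start=1):
        try:
            tup = parse_row(raw)
        except ValueError as e:              # soft error: divert the row
            saved_errors.append((lineno, raw, str(e)))
            continue
        if not check_constraint(tup):        # hard error: abort everything
            raise RuntimeError("constraint violated at line %d" % lineno)
        loaded.append(tup)
    return loaded, saved_errors

loaded, errors = copy_rows(["1,10", "z,20", "2,30"], lambda t: True)

aborted = False
try:  # a duplicate-key-style failure aborts instead of being saved
    copy_rows(["1,10"], lambda t: False)
except RuntimeError:
    aborted = True
```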
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi! Thank you for your work. Your patch looks better! On 10.12.2023 13:32, jian he wrote: On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina wrote: Thank you for your work. Unfortunately, your code produced build errors during make: 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1 make[1]: *** [Makefile:131: parser/gram.h] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [src/Makefile.global:383: submake-generated-headers] Error 2 My operating system is Ubuntu 22.04. On 06.12.2023 13:47, jian he wrote: On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina wrote: Hi! Thank you for your contribution to this thread. I reviewed it and have a few questions. 1. I have seen that you drop the table that is meant to collect the errors from a failed "copy from" operation before creating it. I think this is wrong because this table can hold useful data for the user. At a minimum, we should warn the user about this, but I think we could just append a number to the name, such as name_table1, name_table_2. Sorry. I don't understand this part. Currently, if the error table name already exists, then the copy will fail and an error will be reported. I try to create the table first; if no error happened, the error table will be dropped. To be honest, at first I misunderstood this part of the code. Now I see that it works the way you mentioned. However, I didn't see whether you handle the case where a table with the same name as the error table already exists. I mean, can this happen on the first attempt to create it, or can we never run into this problem? Can you demo the expected behavior? Unfortunately, I was unable to launch it due to a build issue. Hopefully the attached will work. Yes, thank you! It works fine, and I see that the regression tests pass.
However, when I ran 'copy from with save_error' operation with simple csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I created it, I described below): postgres=# create table test (x int primary key, y int not null); postgres=# create table test1 (x int, z int, CONSTRAINT fk_x FOREIGN KEY(x) REFERENCES test(x)); I did not find a table with saved errors after operation, although I received a log about it: postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test_error ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (x)=(2) already exists. CONTEXT: COPY test, line 3 postgres=# select * from public.test_error; ERROR: relation "public.test_error" does not exist LINE 1: select * from public.test_error; postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error NOTICE: 2 rows were skipped because of error. skipped row saved to table public.test1_error ERROR: insert or update on table "test1" violates foreign key constraint "fk_x" DETAIL: Key (x)=(2) is not present in table "test". postgres=# select * from public.test1_error; ERROR: relation "public.test1_error" does not exist LINE 1: select * from public.test1_error; Two lines were written correctly in the csv files, therefore they should have been added to the tables, but they were not added to the tables test and test1. If I leave only the correct rows, everything works fine and the rows are added to the tables. in copy_test.csv: 2,0 1,1 in copy_test1.csv: 2,0 2,1 1,1 postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV COPY 2 postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ',' CSV save_error NOTICE: No error happened.Error holding table public.test1_error will be droped COPY 3 Maybe I'm launching it the wrong way. If so, let me know about it. 
I also notice interesting behavior if the table was previously created by the user. When I was creating an error_table before the 'copy from' operation, I received a message saying that it is impossible to create a table with the same name (it is shown below) during the 'copy from' operation. I think you should add information about this in the documentation, since this seems to be normal behavior to me. postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT, FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT, ERR_DETAIL TEXT, ERRORCODE TEXT); CREATE TABLE postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV save_error ERROR: Error save table public.test_error already exists. Cannot use it for COPY FROM error saving 2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation. I think it would be bet
Re: Oversight in reparameterize_path_by_child leading to executor crash
On 06.12.2023 10:30, Richard Guo wrote: I've self-reviewed this patch again and I think it's now in a committable state. I'm wondering if we can mark it as 'Ready for Committer' now, or we need more review comments/feedbacks. To recap, this patch postpones reparameterization of paths until createplan.c, which would help avoid building the reparameterized expressions we might not use. More importantly, it makes it possible to modify the expressions in RTEs (e.g. tablesample) and in RelOptInfos (e.g. baserestrictinfo) for reparameterization. Failing to do that can lead to executor crashes, wrong results, or planning errors, as we have already seen. I marked it as 'Ready for Committer'. I think it is ready. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Thank you for your work. Unfortunately, your code produced build errors during make: 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1 make[1]: *** [Makefile:131: parser/gram.h] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [src/Makefile.global:383: submake-generated-headers] Error 2 My operating system is Ubuntu 22.04. On 06.12.2023 13:47, jian he wrote: On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina wrote: Hi! Thank you for your contribution to this thread. I reviewed it and have a few questions. 1. I have seen that you drop the table that is meant to collect the errors from a failed "copy from" operation before creating it. I think this is wrong because this table can hold useful data for the user. At a minimum, we should warn the user about this, but I think we could just append a number to the name, such as name_table1, name_table_2. Sorry. I don't understand this part. Currently, if the error table name already exists, then the copy will fail and an error will be reported. I try to create the table first; if no error happened, the error table will be dropped. To be honest, at first I misunderstood this part of the code. Now I see that it works the way you mentioned. However, I didn't see whether you handle the case where a table with the same name as the error table already exists. I mean, can this happen on the first attempt to create it, or can we never run into this problem? Can you demo the expected behavior? Unfortunately, I was unable to launch it due to a build issue. 2. I noticed that you are forming the table name from the type of errors that prevented rows from being added during the 'copy from' operation. I think it would be better to use the name of the source file that was used while 'copy from' was running.
In addition, there may be several such files, it is also worth considering. Another column added. now it looks like: SELECT * FROM save_error_csv_error; filename | lineno |line | field | source | err_message | err_detail | errorcode --+++---++-++--- STDIN| 1 | 2002232 40 50 60 70 80 | NULL | NULL | extra data after last expected column | NULL | 22P04 STDIN| 1 | 2000230 23 | d | NULL | missing data for column "d" | NULL | 22P04 STDIN| 1 | z,,"" | a | z | invalid input syntax for type integer: "z" | NULL | 22P02 STDIN| 2 | \0,, | a | \0 | invalid input syntax for type integer: "\0" | NULL | 22P02 Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me. 3. I found spelling: /* no err_nsp.error_rel table then crete one. for holding error. */ fixed. 4. Maybe rewrite this comment these info need, no error will drop err_nsp.error_rel table to: this information is necessary, no error will lead to the deletion of the err_sp.error_rel table. fixed. Thank you. 5. Is this part of the comment needed? I think it duplicates the information below when we form the query. * . column list(order by attnum, begin from ctid) = *{ctid, lineno,line,field,source,err_message,err_detail,errorcode} * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text} I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear. Simplified the comment. "order by attnum" is to make sure that if there is a table already existing, and the column name is like X and the data type like Y, then we consider this table is good for holding potential error info. COPY FROM, main entry point is NextCopyFrom. Now for non-binary mode, if you specified save_error then it will not fail at NextCopyFrom. all these three errors will be tolerated: extra data after last expected column, missing data for column, data type conversion. It looks clearer and better, thanks! 
Comments phrased as questions are unusual for me; I read them as something to think about, as for example here (contrib/bloom/blinsert.c:312): /* * Didn't find place to insert in notFullPage array. Allocate new page. * (XXX is it good to do this while holding ex-lock on the metapage??) */ Maybe we can rewrite it like this: /* Check whether the err_nsp.error_rel table already exists * and, if it does, check its column names and data types. */ -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
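As the quoted comment describes the intent, a pre-existing table is considered suitable for error saving only if its column names and data types (taken in attnum order) match the expected layout. A simplified sketch of that check in Python, assuming the seven-column layout shown earlier in the thread (the real patch also involves ctid and, in a later revision, a filename column):

```python
# Expected error-table layout, ordered by attnum (simplified).
EXPECTED_COLUMNS = [
    ("lineno", "int8"), ("line", "text"), ("field", "text"),
    ("source", "text"), ("err_message", "text"),
    ("err_detail", "text"), ("errorcode", "text"),
]

def table_usable_for_errors(columns):
    """columns: list of (name, type) pairs ordered by attnum.
    An existing table is reused only on an exact match."""
    normalized = [(n.lower(), t.lower()) for n, t in columns]
    return normalized == EXPECTED_COLUMNS

ok = table_usable_for_errors(
    [("LINENO", "int8"), ("LINE", "text"), ("FIELD", "text"),
     ("SOURCE", "text"), ("ERR_MESSAGE", "text"),
     ("ERR_DETAIL", "text"), ("ERRORCODE", "text")])
bad = table_usable_for_errors([("lineno", "int8"), ("msg", "text")])
```

Ordering by attnum makes the comparison deterministic, which is what the "order by attnum" remark in the quoted query comment is for.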
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Implement missing join selectivity estimation for range types
Hi! Thank you for your work on the subject; I think it's a really useful feature, and it allows the optimizer to estimate clauses with such operators more accurately. I reviewed your patch and noticed that some comments are repeated in the multirangejoinsel function; I suggest combining them. The proposed changes are in the attached patch. Since this topic is about calculating selectivity, have you thought about adding cardinality estimation tests for queries with these operators? For example, you could form queries similar to those you use in src/test/regress/sql/multirangetypes.sql and src/test/regress/sql/rangetypes.sql. I added a few in the attached patch. -- Regards, Alena Rybakina diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c index c670d225a0c..7708768b89f 100644 --- a/src/backend/utils/adt/multirangetypes_selfuncs.c +++ b/src/backend/utils/adt/multirangetypes_selfuncs.c @@ -1620,14 +1620,15 @@ multirangejoinsel(PG_FUNCTION_ARGS) hist1_lower, nhist1); break; + /* +* Start by comparing lower bounds and if they are equal +* compare upper bounds for comparison operators +*/ case OID_MULTIRANGE_LESS_EQUAL_OP: /* * A <= B * -* Start by comparing lower bounds and if they are equal -* compare upper bounds -* * Negation of OID_RANGE_GREATER_OP. * * Overestimate by comparing only the lower bounds. Higher @@ -1644,9 +1645,6 @@ multirangejoinsel(PG_FUNCTION_ARGS) /* * A < B * -* Start by comparing lower bounds and if they are equal -* compare upper bounds -* * Underestimate by comparing only the lower bounds. Higher * accuracy would require us to add P(lower1 = lower2) * * P(upper1 < upper2) @@ -1661,9 +1659,6 @@ multirangejoinsel(PG_FUNCTION_ARGS) /* * A >= B * -* Start by comparing lower bounds and if they are equal -* compare upper bounds -* * Negation of OID_RANGE_LESS_OP. * * Overestimate by comparing only the lower bounds.
Higher @@ -1680,9 +1675,6 @@ multirangejoinsel(PG_FUNCTION_ARGS) /* * A > B == B < A * -* Start by comparing lower bounds and if they are equal -* compare upper bounds -* * Underestimate by comparing only the lower bounds. Higher * accuracy would require us to add P(lower1 = lower2) * * P(upper1 > upper2) @@ -1773,18 +1765,16 @@ multirangejoinsel(PG_FUNCTION_ARGS) case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP: case OID_MULTIRANGE_ADJACENT_RANGE_OP: case OID_RANGE_ADJACENT_MULTIRANGE_OP: - - /* -* just punt for now, estimation would require equality -* selectivity for bounds -*/ case OID_MULTIRANGE_CONTAINS_ELEM_OP: case OID_MULTIRANGE_ELEM_CONTAINED_OP: - /* -* just punt for now, estimation would require extraction of -* histograms for the anyelement -*/ + /* + * Just punt for now: estimating the adjacent operators would + * require equality selectivity for the bounds, and estimating + * the element containment operators would require extraction + * of histograms for the anyelement. + */
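The estimates edited above all reduce to quantities such as P(lower1 < lower2), computed from histograms of range bounds. As a rough illustration of the idea (not the code in multirangetypes_selfuncs.c), the fraction of pairs with lower1 < lower2 can be estimated from two sorted samples of lower bounds:

```python
import bisect

def frac_lower_less(bounds1, bounds2):
    """Estimate P(lower1 < lower2) from two sorted samples of lower
    bounds: for each value in bounds1, count the values in bounds2
    strictly greater than it, then average over all pairs."""
    greater = sum(len(bounds2) - bisect.bisect_right(bounds2, v)
                  for v in bounds1)
    return greater / (len(bounds1) * len(bounds2))

# Using only lower bounds underestimates P(A < B): exact accuracy
# would also need P(lower1 = lower2) * P(upper1 < upper2), as the
# comments in the patch note.
sel = frac_lower_less([1, 2, 3, 4], [2, 3, 4, 5])
```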
Re: POC, WIP: OR-clause support for indexes
On 30.11.2023 11:00, Alena Rybakina wrote: Hi! Honestly, it seems very hard to avoid the conclusion that this transformation is being done at too early a stage. Parse analysis is not the time to try to do query optimization. I can't really believe that there's a way to produce a committable patch along these lines. Ideally, a transformation like this should be done after we know what plan shape we're using (or considering using), so that we can make cost-based decisions about whether to transform or not. But at the very least it should happen somewhere in the planner. There's really no justification for parse analysis rewriting the SQL that the user entered. Here, we assume that array operation is generally better than many ORs. As a result, it should be more effective to make OR->ANY transformation in the parser (it is a relatively lightweight operation here) and, as a second phase, decompose that in the optimizer. We implemented earlier prototypes in different places of the optimizer, and I'm convinced that only this approach resolves the issues we found. Does this approach look weird? Maybe. We can debate it in this thread. I think this is incorrect, and the example from A. Korotkov confirms it. If we perform the conversion at the parsing stage, we will miss the more important transformations that need the OR expressions. I'll show this in the example below. First of all, I will describe my idea of combining the two approaches, obtaining plans with both the OR-to-ANY transformation and the ANY-to-OR transformation. I think they are both good, and we can't work with just one of them; we should consider both the OR form and the ANY form. Just in case, I have attached a patch with the OR->ANY transformation performed at the index creation stage. I get a diff file during make check, but judging by the changes, the transformation works as intended. I have attached that as well.
-- Regards, Alena Rybakina Postgres Professional From 9f75b58525b4892db2c360401771f69599ed21c8 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Wed, 29 Nov 2023 18:58:55 +0300 Subject: [PATCH] Transform OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the preliminary stage of optimization when we are still working with a tree expression. Sometimes it can lead to not optimal plan. But we think it is better to have array of elements instead of a lot of OR clauses. Here is a room for further optimizations on decomposing that array into more optimal parts. --- src/backend/nodes/queryjumblefuncs.c | 30 ++ src/backend/optimizer/path/indxpath.c | 394 +- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/queryjumble.h | 1 + src/include/optimizer/paths.h | 1 + src/test/regress/expected/create_index.out| 120 ++ src/test/regress/expected/guc.out | 3 +- src/test/regress/expected/join.out| 48 +++ src/test/regress/expected/partition_prune.out | 179 src/test/regress/expected/sysviews.out| 3 +- src/test/regress/expected/tidscan.out | 17 + src/test/regress/sql/create_index.sql | 32 ++ src/test/regress/sql/join.sql | 10 + src/test/regress/sql/partition_prune.sql | 22 + src/test/regress/sql/tidscan.sql | 6 + src/tools/pgindent/typedefs.list | 1 + 17 files changed, 875 insertions(+), 3 deletions(-) diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index 281907a4d83..c98fda73d17 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -283,6 +283,36 @@ _jumbleNode(JumbleState *jstate, Node *node) } } +JumbleState * +JumbleExpr(Expr *expr, uint64 *queryId) +{ + JumbleState *jstate = NULL; + + Assert(queryId != NULL); + + jstate = (JumbleState *) palloc(sizeof(JumbleState)); + + /* Set up workspace for query jumbling */ + jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE); + jstate->jumble_len = 0; + 
jstate->clocations_buf_size = 32; + jstate->clocations = (LocationLen *) + palloc(jstate->clocations_buf_size * sizeof(LocationLen)); + jstate->clocations_count = 0; + jstate->highest_extern_param_id = 0; + + /* Compute query ID */ + _jumbleNode(jstate, (Node *) expr); + *queryId = DatumGetUInt64(hash_any_extended(jstate->jumble, +jstate->jumble_len, +0)); + + if (*queryId == UINT64CONST(0)) + *queryId = UINT64CONST(1); + + return jstate; +} + static void _jumbleList(JumbleState *jstate, Node *node) { diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 03a5fbdc6dc..acf4937db01 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -34,7 +34,12 @@ #include "optimizer/restrictinfo.h&qu
Re: POC, WIP: OR-clause support for indexes
On 30.11.2023 11:30, Andrei Lepikhov wrote: On 30/11/2023 15:00, Alena Rybakina wrote: 2. The second patch is my patch version when I moved the OR transformation in the s index formation stage: So, I got the best query plan despite the possible OR to ANY transformation: If the user uses a clause like "x IN (1,2) AND y=100", it will break your 'good' solution. No, unfortunately I still see the plan with Seq scan node: postgres=# explain analyze select * from test where x in (1,2) and y = 100; QUERY PLAN Gather (cost=1000.00..12690.10 rows=1 width=12) (actual time=72.985..74.832 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) (actual time=68.573..68.573 rows=0 loops=3) Filter: ((x = ANY ('{1,2}'::integer[])) AND (y = '100'::double precision)) Rows Removed by Filter: 33 Planning Time: 0.264 ms Execution Time: 74.887 ms (8 rows) In my opinion, the general approach here is to stay with OR->ANY transformation at the parsing stage and invent one more way for picking an index by looking into the array and attempting to find a compound index. Having a shorter list of expressions, where uniform ORs are grouped into arrays, the optimizer will do such work with less overhead. Looking at the current index generation code, implementing this approach will require a lot of refactoring so that functions starting with get_indexes do not rely on the current baserestrictinfo, but use only the indexrestrictinfo, which is a copy of baserestrictinfo. And I think, potentially, there may be complexity also with the equivalences that we can get from OR expressions. All interesting transformations are available only for OR expressions, not for ANY, that is, it makes sense to try the last chance to find a suitable plan with the available OR expressions and if that plan turns out to be better, use it. -- Regards, Alena Rybakina Postgres Professional
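For reference, the transformation being debated groups OR'ed equality clauses on the same column into a single = ANY(...) clause. A toy Python model of that grouping (clauses modeled as (column, operator, constant) triples; not the patch's implementation):

```python
def or_to_any(or_clauses):
    """Group (column, '=', constant) arms of an OR into column = ANY
    lists; other arms are kept as-is."""
    groups, rest = {}, []
    for col, op, const in or_clauses:
        if op == "=":
            groups.setdefault(col, []).append(const)
        else:
            rest.append((col, op, const))
    arms = []
    for col, consts in groups.items():
        if len(consts) > 1:
            arms.append((col, "= ANY", consts))
        else:  # a single equality is left untouched
            arms.append((col, "=", consts[0]))
    return arms + rest

# (x = 1 OR x = 2 OR y > 7)  ->  (x = ANY('{1,2}') OR y > 7)
result = or_to_any([("x", "=", 1), ("x", "=", 2), ("y", ">", 7)])
```

The counter-example in this thread is exactly why the grouping can lose: after rewriting to x = ANY('{1,2}'), the partial indexes on (y) WHERE x = 1 and WHERE x = 2 no longer match, so keeping the OR form available for index matching remains valuable.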
Re: POC, WIP: OR-clause support for indexes
Sorry, I forgot to apply my patches. For the first experiment was 0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff and for the second experiment was 0002-OR-to-ANY-in-index.diff. On 30.11.2023 11:00, Alena Rybakina wrote: Hi! Honestly, it seems very hard to avoid the conclusion that this transformation is being done at too early a stage. Parse analysis is not the time to try to do query optimization. I can't really believe that there's a way to produce a committable patch along these lines. Ideally, a transformation like this should be done after we know what plan shape we're using (or considering using), so that we can make cost-based decisions about whether to transform or not. But at the very least it should happen somewhere in the planner. There's really no justification for parse analysis rewriting the SQL that the user entered. Here, we assume that array operation is generally better than many ORs. As a result, it should be more effective to make OR->ANY transformation in the parser (it is a relatively lightweight operation here) and, as a second phase, decompose that in the optimizer. We implemented earlier prototypes in different places of the optimizer, and I'm convinced that only this approach resolves the issues we found. Does this approach look weird? Maybe. We can debate it in this thread. I think this is incorrect, and the example of A. Korotkov confirms this. If we perform the conversion at the parsing stage, we will skip the more important conversion using OR expressions. I'll show you in the example below. First of all, I will describe my idea to combine two approaches to obtaining plans with OR to ANY transformation and ANY to OR transformation. I think they are both good, and we can't work with just one of them, we should consider both the option of OR expressions, and with ANY. I did this by creating a RelOptInfo with which has references from the original RelOptInfo, for which conversion is possible either from ANY->OR, or vice versa. 
After obtaining the necessary transformation, I started the procedure for obtaining the seq and index paths for both relations and then calculated their cost. The relation with the lowest cost is considered the best. I'm not sure if this is the best approach, but it's less complicated. I noticed that I got a lower cost for not the best plan, but I think this corresponds to another topic related to the wrong estimate calculation. 1. The first patch is a mixture of the original patch (when we perform the conversion of OR to ANY at the parsing stage), and when we perform the conversion at the index creation stage with the conversion to an OR expression. We can see that the query proposed by A.Korotkov did not have the best plan with ANY expression at all, and even despite receiving a query with OR expressions, we cannot get anything better than SeqScan, due to the lack of effective logical transformations that would have been performed if we had left the OR expressions. So, I got query plans using enable_or_transformation if it is enabled: postgres=# create table test as (select (random()*10)::int x, (random()*1000) y from generate_series(1,100) i); create index test_x_1_y on test (y) where x = 1; create index test_x_2_y on test (y) where x = 2; vacuum analyze test; SELECT 100 CREATE INDEX CREATE INDEX VACUUM postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; WARNING: cost with original approach: - 20440.00 WARNING: cost with OR to ANY applied transfomation: - 15440.00 QUERY PLAN -- Gather (cost=1000.00..12690.10 rows=1 width=12) Workers Planned: 2 -> Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (4 rows) and if it is off: postgres=# set enable_or_transformation =off; SET postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; QUERY PLAN -- Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck Cond: (((y = '100'::double precision) 
AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60 rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) (7 rows) 2. The second patch is my patch version where I moved the OR transformation to the index path formation stage. So, I got the best query plan despite the possible OR to ANY transformation: postgres=# create table test as (s
REATE INDEX VACUUM postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; WARNING: cost with original approach: - 12.618000 WARNING: cost with OR to ANY applied transfomation: - 15440.00 QUERY PLAN -- Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60 rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) (7 rows) -- Regards, Alena Rybakina Postgres Professional
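The OR->ANY grouping these patches perform can be sketched as follows. This is a hedged Python model of the idea only (the clause triples and the or_to_any name are invented for illustration; the real code works on parse-tree nodes in C):

```python
from collections import defaultdict

def or_to_any(or_args):
    """Group simple equality clauses (var, '=', const) by their
    left-hand side; fold any group with several constants into a
    single var = ANY(consts) clause. Non-equalities pass through."""
    groups = defaultdict(list)          # var -> list of constants
    leftovers = []
    for var, op, const in or_args:
        if op == '=':
            groups[var].append(const)
        else:
            leftovers.append((var, op, const))
    result = list(leftovers)
    for var, consts in groups.items():
        if len(consts) > 1:
            result.append((var, 'ANY', consts))
        else:
            result.append((var, '=', consts[0]))
    return result

# the OR arms of (x = 1 OR x = 2):
print(or_to_any([('x', '=', 1), ('x', '=', 2)]))
# -> [('x', 'ANY', [1, 2])]
```

The whole debate in the thread is about where this fold should run (parser vs. planner), not about the fold itself.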
Re: POC, WIP: OR-clause support for indexes
On 25.11.2023 19:13, Alexander Korotkov wrote: On Sat, Nov 25, 2023 at 1:10 PM Alena Rybakina wrote: On 25.11.2023 04:13, Alexander Korotkov wrote: It seems to me there is a confusion. I didn't mean we need to move conversion of OR-expressions to ANY into choose_bitmap_and() function or anything like this. My idea was to avoid degradation of plans, which I've seen in [1]. Current code for generation of bitmap paths considers the possibility to split OR-expressions into distinct bitmap index scans. But it doesn't consider this possibility for ANY-expressions. So, my idea was to enhance our bitmap scan generation to consider split values of ANY-expressions into distinct bitmap index scans. So, in the example [1] and similar queries conversion of OR-expressions to ANY wouldn't affect the generation of bitmap paths. Thanks for the explanation, yes, I did not understand the idea correctly at first. I will try to implement something similar. Alena, great, thank you. I'm looking forward to the updated patch. I wrote a patch (any_to_or.diff.txt); although it is still under development (not all regression tests pass yet), it is already clear that, for a query where a bad plan was chosen before, it now chooses a more optimal plan.

postgres=# set enable_or_transformation = on;
SET
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2)))
   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
         ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
         ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
(7 rows)

Now I'm thinking about how to combine the two.
BTW, while I was figuring out how create_index_paths works and creating bitmap index scan paths, I think I found a bug with unallocated memory (the fix is in bugfix.diff.txt). Without a fix here, it crashes (core dump) at the stage of assigning a value to either of the variables, specifically skip_lower_saop and skip_nonnative_saop. I discovered it when I forced formation of a bitmap index plan for ANY (any_to_or.diff.txt); the problem is triggered by my OR->ANY conversion patch. -- Regards, Alena Rybakina Postgres Professional

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 03a5fbdc6dc..c242eec34d6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -34,6 +34,7 @@
 #include "optimizer/restrictinfo.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
+#include "parser/parse_expr.h"

 /* XXX see PartCollMatchesExprColl */

@@ -715,8 +716,8 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				List **bitindexpaths)
 {
 	List	   *indexpaths;
-	bool		skip_nonnative_saop = false;
-	bool		skip_lower_saop = false;
+	bool		skip_nonnative_saop;
+	bool		skip_lower_saop;
 	ListCell   *lc;

 	/*
@@ -737,7 +738,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * that supports them, then try again including those clauses. This will
 	 * produce paths with more selectivity but no ordering.
 	 */
-	if (skip_lower_saop)
+	if (skip_lower_saop || enable_or_transformation)
 	{
 		indexpaths = list_concat(indexpaths,
 								 build_index_paths(root, rel,
@@ -778,7 +779,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * natively, generate bitmap scan paths relying on executor-managed
 	 * ScalarArrayOpExpr.
 	 */
-	if (skip_nonnative_saop)
+	if (skip_nonnative_saop || enable_or_transformation)
 	{
 		indexpaths = build_index_paths(root, rel,
 									   index, clauses,
@@ -908,7 +912,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		if (!index->amsearcharray)
 		{
-			if (skip_nonnative_saop)
+			if (skip_nonnative_saop || enable_or_transformation)
 			{
 				/* Ignore because not supported by index */
 				*skip_nonnative_saop = true;
Re: POC, WIP: OR-clause support for indexes
On 25.11.2023 04:13, Alexander Korotkov wrote: It seems to me there is a confusion. I didn't mean we need to move conversion of OR-expressions to ANY into choose_bitmap_and() function or anything like this. My idea was to avoid degradation of plans, which I've seen in [1]. Current code for generation of bitmap paths considers the possibility to split OR-expressions into distinct bitmap index scans. But it doesn't consider this possibility for ANY-expressions. So, my idea was to enhance our bitmap scan generation to consider split values of ANY-expressions into distinct bitmap index scans. So, in the example [1] and similar queries conversion of OR-expressions to ANY wouldn't affect the generation of bitmap paths. Thanks for the explanation, yes, I did not understand the idea correctly at first. I will try to implement something similar. -- Regards, Alena Rybakina Postgres Professional
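The enhancement Alexander describes — letting bitmap scan generation split an ANY clause into per-value scans, as it already does for explicit OR lists — is the inverse expansion of the parser-side fold. A minimal sketch, assuming a toy clause representation (not the planner's real data structures):

```python
def expand_any_for_bitmap(clause):
    """Expand var = ANY(values) into the per-value equality clauses
    that bitmap OR path generation could consider one at a time,
    mirroring what is already done for explicit OR lists."""
    var, op, operand = clause
    if op == 'ANY':
        return [(var, '=', v) for v in operand]
    return [clause]

# each arm can then be matched against a partial index such as test_x_1_y
print(expand_any_for_bitmap(('x', 'ANY', [1, 2])))
# -> [('x', '=', 1), ('x', '=', 2)]
```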
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On 24.11.2023 13:20, Alena Rybakina wrote: Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions. First of all, have you thought about creating a guc parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging. Secondly, I noticed that for the child_rinfo_hash key you use a counter (int); can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0]. [0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru Sorry, my first question was not clear; I meant: have you thought about creating a guc parameter to display memory planning information? -- Regards, Alena Rybakina
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions. First of all, have you thought about creating a guc parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging. Secondly, I noticed that for the child_rinfo_hash key you use a counter (int); can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0]. [0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru -- Regards, Alena Rybakina Postgres Professional
Re: POC, WIP: OR-clause support for indexes
On 23.11.2023 12:23, Andrei Lepikhov wrote: I think the usage of nodeToString for the generation of clause hash is too expensive and buggy. Also, in the code, you didn't resolve hash collisions. So, I've rewritten the patch a bit (see the attachment). One more thing: I propose to enable transformation by default at least for quick detection of possible issues. This code changes tests in many places. But, as I see it, it mostly demonstrates the positive effect of the transformation. On 24.11.2023 06:30, Andrei Lepikhov wrote: On 23/11/2023 16:23, Andrei Lepikhov wrote: This code changes tests in many places. But, as I see it, it mostly demonstrates the positive effect of the transformation. I found out that the postgres_fdw tests were impacted by the feature. Fix it, because the patch is on the commitfest and passes buildfarm. Taking advantage of this, I suppressed the expression evaluation procedure to make regression test changes more clear. Thank you for your work. You are right, the patch with the current changes looks better and works more correctly. To be honest, I didn't think we could use JumbleExpr in this way. -- Regards, Alena Rybakina Postgres Professional
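The collision concern raised here — a hash key alone must never decide expression equality; buckets need a structural equal() check, which is what the JumbleExpr-based rewrite addresses — can be illustrated with a small Python sketch (names are hypothetical; explicit buckets are used because Python dicts hide this handling):

```python
def bucket_insert(table, key_hash, expr, const, equal):
    """Insert into a hash table keyed by an expression hash; within a
    bucket, membership is decided by structural equality, never by the
    hash value alone."""
    bucket = table.setdefault(key_hash, [])
    for entry in bucket:
        if equal(entry['expr'], expr):      # genuine match: merge consts
            entry['consts'].append(const)
            return entry
    entry = {'expr': expr, 'consts': [const]}   # new key or collision
    bucket.append(entry)
    return entry

table = {}
eq = lambda x, y: x == y
bucket_insert(table, 7, ('x',), 1, eq)
bucket_insert(table, 7, ('x',), 2, eq)      # same expression: merged
bucket_insert(table, 7, ('y',), 3, eq)      # hash collision: kept apart
assert table[7][0]['consts'] == [1, 2] and len(table[7]) == 2
```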
Re: POC, WIP: OR-clause support for indexes
On 21.11.2023 03:50, Alena Rybakina wrote: On 20.11.2023 11:52, Andrei Lepikhov wrote: Looking into the patch, I found some trivial improvements (see attachment). Also, it is not obvious that using a string representation of the clause as a hash table key is needed here. Also, by making a copy of the node in the get_key_nconst_node(), you replace the location field, but in the case of complex expression, you don't do the same with other nodes. I propose to generate expression hash instead + prove the equality of two expressions by calling equal(). I was thinking about your last email and a possible error where the location field may not be cleared in complex expressions. Unfortunately, I didn't come up with any examples either, but I think I was able to handle this with a special function that removes location-related patterns. The alternative to this is to bypass this expression, but I think it will be more invasive. In addition, I have added changes related to the hash table: now the key is of type int. All changes are displayed in the attached v9-0001-Replace-OR-clause-to_ANY.diff.txt file. I haven't measured it yet. But what do you think about these changes? Sorry, I lost your changes during the revision process; I have restored them. I also bumped the patch version so that CI runs successfully. -- Regards, Alena Rybakina Postgres Professional From 66d1c479347d80ea45fc7aebaed873d45f9c4351 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Tue, 21 Nov 2023 14:20:56 +0300 Subject: [PATCH] [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than set or_transform_limit. 
Thirdly, it is worth considering that we consider "or" expressions only at the current level. --- src/backend/parser/parse_expr.c | 280 +- src/backend/utils/misc/guc_tables.c | 10 + src/include/parser/parse_expr.h | 1 + src/test/regress/expected/create_index.out| 115 +++ src/test/regress/expected/guc.out | 3 +- src/test/regress/expected/join.out| 50 src/test/regress/expected/partition_prune.out | 156 ++ src/test/regress/expected/sysviews.out| 3 +- src/test/regress/expected/tidscan.out | 17 ++ src/test/regress/sql/create_index.sql | 32 ++ src/test/regress/sql/join.sql | 10 + src/test/regress/sql/partition_prune.sql | 22 ++ src/test/regress/sql/tidscan.sql | 6 + src/tools/pgindent/typedefs.list | 1 + 14 files changed, 703 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 64c582c344c..7b76c9f29a1 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -18,6 +18,7 @@ #include "catalog/pg_aggregate.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" +#include "common/hashfn.h" #include "commands/dbcommands.h" #include "miscadmin.h" #include "nodes/makefuncs.h" @@ -43,6 +44,7 @@ /* GUC parameters */ bool Transform_null_equals = false; +bool enable_or_transformation = false; static Node *transformExprRecurse(ParseState *pstate, Node *expr); @@ -98,7 +100,283 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree, int location); static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + int32 hash_leftvar_key; + + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Node *expr; +} OrClauseGroupEntry; + +/* + * TODO: consider different algorithm to manage complexity + * in the case of many different clauses, + * like Rabin-Karp or Boyer–Moore algorithms. 
+ */
+static char *
+clear_patterns(const char *str, const char *start_pattern)
+{
+	int			i = 0;
+	int			j = 0;
+	int			start_pattern_len = strlen(start_pattern);
+	char	   *res = palloc0(sizeof(*res) * (strlen(str) + 1));
+
+	for (i = 0; str[i];)
+	{
+		if (i >= start_pattern_len && strncmp(&str[i - start_pattern_len],
+											  start_pattern,
+											  start_pattern_len) == 0)
+		{
+			while (str[i] && !(str[i] == '{' || str[i] == '}'))
+				i++;
+		}
+		if (str[i])
+			res[j++] = str[i++];
+	}
+
+	return res;
+}
+
+static int32
+get_key_nconst_node(Node *nconst_node)
+{
+	char	   *str = nodeToString(nconst_node);
+
+	str = clear_patterns(str, " :location");
+
+	if (str == NULL)
+		return 0;
+
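The clear_patterns/get_key_nconst_node pair above normalizes the serialized node by dropping location fields before hashing, so that equal expressions that merely appear at different offsets in the query text get the same key. The same idea in a small, self-contained Python sketch (the serialized format and names here are illustrative, not PostgreSQL's actual nodeToString output):

```python
import re
import zlib

def node_hash(serialized_node):
    """Drop ':location N' fields, then hash the normalized text."""
    normalized = re.sub(r' :location -?\d+', '', serialized_node)
    return zlib.crc32(normalized.encode())

# same expression, different offsets in the query text:
a = node_hash('{OPEXPR :opno 96 :location 15}')
b = node_hash('{OPEXPR :opno 96 :location 42}')
assert a == b
```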
Re: POC, WIP: OR-clause support for indexes
On 20.11.2023 11:52, Andrei Lepikhov wrote: Looking into the patch, I found some trivial improvements (see attachment). Also, it is not obvious that using a string representation of the clause as a hash table key is needed here. Also, by making a copy of the node in the get_key_nconst_node(), you replace the location field, but in the case of complex expression, you don't do the same with other nodes. I propose to generate expression hash instead + prove the equality of two expressions by calling equal(). I was thinking about your last email and a possible error where the location field may not be cleared in complex expressions. Unfortunately, I didn't come up with any examples either, but I think I was able to handle this with a special function that removes location-related patterns. The alternative to this is to bypass this expression, but I think it will be more invasive. In addition, I have added changes related to the hash table: now the key is of type int. All changes are displayed in the attached v9-0001-Replace-OR-clause-to_ANY.diff.txt file. I haven't measured it yet. But what do you think about these changes? -- Regards, Alena Rybakina Postgres Professional From 8e579b059ffd415fc477e0e8930e718084e58683 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Tue, 21 Nov 2023 03:22:13 +0300 Subject: [PATCH] [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than set or_transform_limit. Thirdly, it is worth considering that we consider "or" expressions only at the current level. 
Authors: Alena Rybakina , Andrey Lepikhov Reviewed-by: Peter Geoghegan , Ranier Vilela Reviewed-by: Alexander Korotkov , Robert Haas --- src/backend/parser/parse_expr.c | 280 +- src/backend/utils/misc/guc_tables.c | 10 + src/include/parser/parse_expr.h | 1 + src/test/regress/expected/create_index.out| 115 +++ src/test/regress/expected/guc.out | 3 +- src/test/regress/expected/join.out| 50 src/test/regress/expected/partition_prune.out | 156 ++ src/test/regress/expected/sysviews.out| 3 +- src/test/regress/expected/tidscan.out | 17 ++ src/test/regress/sql/create_index.sql | 32 ++ src/test/regress/sql/join.sql | 10 + src/test/regress/sql/partition_prune.sql | 22 ++ src/test/regress/sql/tidscan.sql | 6 + src/tools/pgindent/typedefs.list | 1 + 14 files changed, 703 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 64c582c344c..290422005a3 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -18,6 +18,7 @@ #include "catalog/pg_aggregate.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" +#include "common/hashfn.h" #include "commands/dbcommands.h" #include "miscadmin.h" #include "nodes/makefuncs.h" @@ -43,6 +44,7 @@ /* GUC parameters */ bool Transform_null_equals = false; +bool enable_or_transformation = false; static Node *transformExprRecurse(ParseState *pstate, Node *expr); @@ -98,7 +100,283 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree, int location); static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + int32 hash_leftvar_key; + + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Node *expr; +} OrClauseGroupEntry; + +/* + * TODO: consider different algorithm to manage complexity + * in the case of many different clauses, + * like Rabin-Karp or Boyer–Moore algorithms. 
+ */
+static char *
+clear_patterns(const char *str, const char *start_pattern)
+{
+	int			i = 0;
+	int			j = 0;
+	int			start_pattern_len = strlen(start_pattern);
+	char	   *res = palloc0(sizeof(*res) * (strlen(str) + 1));
+
+	for (i = 0; str[i];)
+	{
+		if (i >= start_pattern_len && strncmp(&str[i - start_pattern_len],
+											  start_pattern,
+											  start_pattern_len) == 0)
+		{
+			while (str[i] && !(str[i] == '{' || str[i] == '}'))
+				i++;
+		}
+		if (str[i])
+			res[j++] = str[i++];
+	}
+
+	return res;
+}
+
+static int32
+get_key_nconst_node(Node *nconst_node)
+{
+	char	   *str = nodeToString(nconst_node);
+
+	str = clear_patterns(str, " :location");
+
+	if (str == NULL)
+		return 0;
+
+	return DatumGetInt32(hash_any((unsigned char *)s
Re: Wrong results with grouping sets
Hi! Thank you for your work on the subject. On 25.09.2023 10:11, Richard Guo wrote: I think I've come across a wrong result issue with grouping sets, as shown by the query below.

-- result is correct with only grouping sets
select a, b from (values (1, 1), (2, 2)) as t (a, b) where a = b group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 1 |
 2 | 2
 2 |
(4 rows)

-- result is NOT correct with grouping sets and distinct on
select distinct on (a, b) a, b from (values (1, 1), (2, 2)) as t (a, b) where a = b group by grouping sets((a, b), (a));
 a | b
---+---
 1 | 1
 2 | 2
(2 rows)

The distinct on expressions include both 'a' and 'b', so rows (1, 1) and (1, NULL) should not be considered equal. (The same for rows (2, 2) and (2, NULL).) I noticed that this query worked correctly in the main branch with the inequality operator:

postgres=# select distinct on (a, b) a, b from (values (3, 1), (2, 2)) as t (a, b) where a > b group by grouping sets((a, b), (a));
 a | b
---+---
 3 | 1
 3 |
(2 rows)

So, I think you are right) I think the root cause is that when we generate distinct_pathkeys, we failed to realize that Var 'b' might be nullable by the grouping sets, so it's no longer always equal to Var 'a'. It's not correct to deem that the PathKey for 'b' is redundant and thus remove it from the pathkeys list. We have the same issue when generating sort_pathkeys. As a result, we may have the final output in the wrong order. There were several reports about this issue before, such as [1][2]. To fix this issue, I'm thinking that we mark the grouping expressions nullable by grouping sets with a dummy RTE for grouping sets, something like attached. In practice we do not need to create a real RTE for that, what we need is just a RT index. In the patch I use 0, because it's not a valid outer join relid, so it would not conflict with the outer-join-aware-Var infrastructure. If the grouping expression is a Var or PHV, we can just set its nullingrels, very straightforward. 
For an expression that is neither a Var nor a PHV, I'm not quite sure how to set the nullingrels. I tried the idea of wrapping it in a new PHV to carry the nullingrels, but that may cause some unnecessary plan diffs. In the patch for such an expression I just set the nullingrels of Vars or PHVs that are contained in it. This is not really 'correct' in theory, because it is the whole expression that can be nullable by grouping sets, not its individual vars. But it works in practice, because what we need is that the expression can be somehow distinguished from the same expression in ECs, and marking its vars is sufficient for this purpose. But what if the expression is variable-free? This is the point that needs more work. For now the patch just handles variable-free expressions of type Const, by wrapping it in a new PHV. There are some places where we need to artificially remove the RT index of grouping sets from the nullingrels, such as when we generate PathTarget for initial input to grouping nodes, or when we generate PathKeys for the grouping clauses, because the expressions there are logically below the grouping sets. We also need to do that in set_upper_references when we update the targetlist of an Agg node, so that we can perform exact match for nullingrels, rather than superset match. Since the fix depends on the nullingrels stuff, it seems not easy for back-patching. I'm not sure what we should do in back branches. Any thoughts? [1] https://www.postgresql.org/message-id/cambws48atqtqgk37msydk_eagdo3y0ia_lzvuvgq2uo_wh2...@mail.gmail.com [2] https://www.postgresql.org/message-id/17071-24dc13fbfa296...@postgresql.org I looked at your patch and noticed a few things: 1. 
I think you should add a test with the cube operator, because I noticed that the row order of the query result has also changed:

master:

postgres=# select a,b from (values(1,1),(2,2)) as t (a,b) where a=b group by cube (a, (a,b)) order by b,a;
 a | b
---+---
 1 | 1
 1 | 1
 1 |
 2 | 2
 2 | 2
 2 |
   |
(7 rows)

with patch:

postgres=# select a, b from (values (1, 1), (2, 2)) as t (a, b) where a = b group by cube(a, (a, b)) order by b,a;
 a | b
---+---
 1 | 1
 1 | 1
 2 | 2
 2 | 2
 1 |
 2 |
   |
(7 rows)

-- Regards, Alena Rybakina
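The failure mode Richard describes can be modeled outside the planner: grouping sets ((a, b), (a)) emit NULL-extended rows for b, so treating b as redundant (because of the a = b qual) and deduplicating on a alone collapses rows that DISTINCT ON (a, b) must keep apart. A hedged Python model (grouping_sets_rows is an invented helper, not PostgreSQL code):

```python
def grouping_sets_rows(rows, sets, columns=('a', 'b')):
    """Emit one output row per input row and grouping set, replacing
    columns outside the set with None (SQL NULL)."""
    out = []
    for row in rows:
        for s in sets:
            out.append(tuple(v if c in s else None
                             for c, v in zip(columns, row)))
    return out

rows = [(1, 1), (2, 2)]                    # already satisfy a = b
produced = grouping_sets_rows(rows, [('a', 'b'), ('a',)])
# DISTINCT ON (a, b): (1, 1) differs from (1, None), so 4 rows survive
assert len(set(produced)) == 4
# deduplicating on a alone (the buggy "b is redundant" shortcut): 2 rows
assert len({r[0] for r in produced}) == 2
```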
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi! On 14.11.2023 13:10, Damir Belyalov wrote: Here is a very straw-man-level sketch of what I think might work. The option to COPY FROM looks something like ERRORS TO other_table_name (item [, item [, ...]]) I tried to implement the patch using a table and came across a number of questions. Which kind of table should we use for this feature: a system catalog table, a table stored as a file, or a newly created regular table? In each of these cases, security and user-rights management issues arise. Other users should preferably not see error lines from another user. It is also not clear how access rights to this table would be inherited and granted. Maybe we can add a GUC or a COPY parameter to output such errors during execution of the COPY command, and check whether the user has sufficient rights to set such a parameter? That is, I propose to give the user the choice of running COPY with or without saving errors, and at the same time to check up front whether the error-output option is even possible for that user. -- Regards, Alena Rybakina
Re: POC, WIP: OR-clause support for indexes
On 06.11.2023 16:51, Alena Rybakina wrote: I also support this approach. I have almost finished writing a patch that fixes the first problem related to the quadratic complexity of processing expressions by adding a hash table. I also added a check: if the number of groups is equal to the number of OR expressions, we assume that no expressions need to be converted and interrupt further execution. Now I am trying to fix the last problem in this patch: three tests have indicated a problem related to incorrect conversion. I don't think it can be serious, but I haven't figured out where the mistake is yet. I was getting an error message like this in the log: ERROR: unrecognized node type: 0. I fixed this issue and added some cosmetic refactoring. The changes are presented in the or_patch_changes.diff file. -- Regards, Alena Rybakina Postgres Professional diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 25a4235dbd9..46212a77c64 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -44,7 +44,7 @@ /* GUC parameters */ bool Transform_null_equals = false; -bool or_transform_limit = false; +bool enable_or_transformation = false; static Node *transformExprRecurse(ParseState *pstate, Node *expr); @@ -108,7 +108,7 @@ typedef struct OrClauseGroupEntry List *consts; Oid scalar_type; Oid opno; - Expr *expr; + Node *expr; } OrClauseGroupEntry; static int @@ -189,16 +189,16 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) HASH_ELEM | HASH_FUNCTION | HASH_COMPARE); /* If this is not an 'OR' expression, skip the transformation */ - if (expr_orig->boolop != OR_EXPR || !or_transform_limit || len_ors == 1 || !or_group_htab) + if (expr_orig->boolop != OR_EXPR || !enable_or_transformation || len_ors == 1 || !or_group_htab) return transformBoolExpr(pstate, (BoolExpr *) expr_orig); foreach(lc, expr_orig->args) { Node *arg = lfirst(lc); - Node *orqual; - Node *const_expr; - Node *nconst_expr; - OrClauseGroupEntry *gentry; + Node *orqual = 
NULL; + Node *const_expr = NULL; + Node *nconst_expr = NULL; + OrClauseGroupEntry *gentry = NULL; bool found; char *str; @@ -270,7 +270,7 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) gentry = hash_search(or_group_htab, &str, HASH_ENTER, &found); gentry->node = nconst_expr; gentry->consts = list_make1(const_expr); - gentry->expr = (Expr *) orqual; + gentry->expr = orqual; gentry->hash_leftvar_key = str; } @@ -283,7 +283,6 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) * transformed bool expression. */ hash_destroy(or_group_htab); - list_free(or_list); return (Node *) makeBoolExpr(OR_EXPR, or_list, expr_orig->location); } else @@ -345,9 +344,9 @@ transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) * OK: coerce all the right-hand non-Var inputs to the common * type and build an ArrayExpr for them. */ - List *aexprs; - ArrayExpr *newa; - ScalarArrayOpExpr *saopexpr; + List *aexprs = NIL; + ArrayExpr *newa = NULL; + ScalarArrayOpExpr *saopexpr = NULL; ListCell *l; aexprs = NIL; diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 54fd09abde7..3411f023df8 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -1049,12 +1049,12 @@ struct config_bool ConfigureNamesBool[] = NULL, NULL, NULL }, { - {"or_transform_limit", PGC_USERSET, QUERY_TUNING_OTHER, + {"enable_or_transformation", PGC_USERSET, QUERY_TUNING_OTHER, gettext_noop("Transform a sequence of OR clauses to an IN expression."), gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .." 
"to the clause 'x IN (c1,c2,...)'") }, - &or_transform_limit, + &enable_or_transformation, false, NULL, NULL, NULL }, diff --git a/src/include/parser/parse_expr.h b/src/include/parser/parse_expr.h index 7a6943c116c..3a87de02859 100644 --- a/src/include/parser/parse_expr.h +++ b/src/include/parser/parse_expr.h @@ -17,7 +17,7 @@ /* GUC parameters */ extern PGDLLIMPORT bool Transform_null_equals; -extern PGDLLIMPORT bool or_transform_limit; +extern PGDLLIMPORT bool enable_or_transformation; extern Node *transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind); diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 29c2bc6a2b2..dbc8bc3bed0 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -1883,7 +1883,7 @@ SELECT count(*) FROM tenk1 10 (1 row) -SET or_transform_limit = on; +SET enable_or_transformation = on; EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR ten
Re: POC, WIP: OR-clause support for indexes
On 30.10.2023 17:06, Alexander Korotkov wrote: On Mon, Oct 30, 2023 at 3:40 PM Robert Haas wrote: On Thu, Oct 26, 2023 at 5:05 PM Peter Geoghegan wrote: On Thu, Oct 26, 2023 at 12:59 PM Robert Haas wrote: Alexander's example seems to show that it's not that simple. If I'm reading his example correctly, with things like aid = 1, the transformation usually wins even if the number of things in the OR expression is large, but with things like aid + 1 * bid = 1, the transformation seems to lose at least with larger numbers of items. So it's not JUST the number of OR elements but also what they contain, unless I'm misunderstanding his point. Alexander said "Generally, I don't see why ANY could be executed slower than the equivalent OR clause". I understood that this was his way of expressing the following idea: "In principle, there is no reason to expect execution of ANY() to be slower than execution of an equivalent OR clause (except for noise-level differences). While it might not actually look that way for every single type of plan you can imagine right now, that doesn't argue for making a cost-based decision. It actually argues for fixing the underlying issue, which can't possibly be due to some kind of fundamental advantage enjoyed by expression evaluation with ORs". This is also what I think of all this. I agree with that, with some caveats, mainly that the reverse is to some extent also true. Maybe not completely, because arguably the ANY() formulation should just be straight-up easier to deal with, but in principle, the two are equivalent and it shouldn't matter which representation we pick. But practically, it may, and we need to be sure that we don't put in place a translation that is theoretically a win but in practice leads to large regressions. Avoiding regressions here is more important than capturing all the possible gains. 
A patch that wins in some scenarios and does nothing in others can be committed; a patch that wins in even more scenarios but causes serious regressions in some cases probably can't. +1 Sure, I've identified two cases where the patch shows a regression [1]. The first one (quadratic complexity of expression processing) should already be addressed by the use of a hash table. The second one (planning regression with Bitmap OR) is not yet addressed. Links 1. https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com I also support this approach. I have almost finished writing a patch that fixes the first problem, related to the quadratic complexity of processing expressions, by adding a hash table. I also added a check: if the number of groups is equal to the number of OR expressions, we assume that no expressions need to be converted and stop further processing. Now I am trying to fix the last problem in this patch: three tests have indicated a problem related to incorrect conversion. I don't think it can be serious, but I haven't figured out where the mistake is yet. The log shows an error like this: ERROR: unrecognized node type: 0. -- Regards, Alena Rybakina Postgres Professional From 3062a2408af258b9e327219020221b75501e8530 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Mon, 6 Nov 2023 16:48:00 +0300 Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) at the optimizer stage, while we are still working with the expression tree. First, we do not attempt the transformation for non-OR expressions or inequalities, and the creation of a relation with OR expressions proceeds according to the same scenario as before. Second, we do not perform the transformation if there are fewer OR clauses than or_transform_limit. Third, note that we consider OR expressions only at the current level. 
Authors: Alena Rybakina , Andrey Lepikhov Reviewed-by: Peter Geoghegan , Ranier Vilela Reviewed-by: Alexander Korotkov , Robert Haas --- src/backend/parser/parse_expr.c | 299 +- src/backend/utils/misc/guc_tables.c | 10 + src/include/parser/parse_expr.h | 1 + src/test/regress/expected/create_index.out| 115 +++ src/test/regress/expected/guc.out | 3 +- src/test/regress/expected/join.out| 50 +++ src/test/regress/expected/partition_prune.out | 179 +++ src/test/regress/expected/tidscan.out | 17 + src/test/regress/sql/create_index.sql | 32 ++ src/test/regress/sql/join.sql | 10 + src/test/regress/sql/partition_prune.sql | 22 ++ src/test/regress/sql/tidscan.sql | 6 + src/tools/pgindent/typedefs.list | 1 + 13 files changed, 743 insertions(+), 2 deletions(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 64c582c344c..25a4235dbd9 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser
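The hash-table idea discussed in the message above can be illustrated outside PostgreSQL. The following is a hypothetical standalone sketch, not the patch's actual code: each OR arm, keyed by an identifier for its non-constant side, is inserted into a hash table in O(1) expected time, and an early-exit check detects when every arm landed in its own group (so nothing would be converted).

```c
#include <assert.h>
#include <stdio.h>

#define NBUCKETS 64

/* Hypothetical model: each OR arm is "lhs = const"; arms sharing the same
 * lhs key collapse into one group, which would become one lhs = ANY(...) */
typedef struct Group
{
    int lhs_key;            /* identifies the non-constant side of the arm */
    int nconsts;            /* how many constants were collected for it */
    struct Group *next;     /* hash-bucket chain */
} Group;

typedef struct GroupTable
{
    Group  *buckets[NBUCKETS];
    Group   storage[256];
    int     ngroups;
} GroupTable;

/* Insert one OR arm; O(1) expected, so n arms cost O(n) overall instead of
 * the O(n^2) pairwise comparison the thread complains about. */
void add_arm(GroupTable *t, int lhs_key)
{
    unsigned h = (unsigned) lhs_key % NBUCKETS;
    for (Group *g = t->buckets[h]; g != NULL; g = g->next)
    {
        if (g->lhs_key == lhs_key)
        {
            g->nconsts++;
            return;
        }
    }
    Group *g = &t->storage[t->ngroups++];
    g->lhs_key = lhs_key;
    g->nconsts = 1;
    g->next = t->buckets[h];
    t->buckets[h] = g;
}

/* Returns 1 when every arm landed in its own group, i.e. there is nothing
 * to transform -- the early-exit check described in the mail. */
int nothing_to_transform(const GroupTable *t, int narms)
{
    return t->ngroups == narms;
}
```

For "x=1 OR x=2 OR y=3" this yields two groups out of three arms, so the transformation proceeds; for "x=1 OR y=2" the group count equals the arm count and processing stops early.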
Not deleted mentions of the cleared path
Hi, hackers! I have already written about the problem of InvalidPath [0] appearing. I investigated this and found an error in the add_path() function: when we reject a path, we free its memory but do not delete the various references to it (for example, in the parent relation, as in the example below). Thus, we may face the problem of accessing freed memory. I demonstrated this below using gdb by running a query after the test in src/test/regress/sql/create_misc.sql: *Query:* -- That ALTER TABLE should have added TOAST tables. SELECT relname, reltoastrelid <> 0 AS has_toast_table FROM pg_class WHERE oid::regclass IN ('a_star', 'c_star') ORDER BY 1; --UPDATE b_star* -- SET a = text 'gazpacho' -- WHERE aa > 4; SELECT class, aa, a FROM a_star*; *gdb: * 0x7ff3f7325fda in epoll_wait (epfd=5, events=0x55bf9ee75c38, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory. (gdb) b /home/alena/postgrespro_3/src/backend/optimizer/util/pathnode.c:620 Breakpoint 1 at 0x55bf9cfe4c65: file pathnode.c, line 621. (gdb) c Continuing. 
Breakpoint 1, add_path (parent_rel=0x55bf9ef7f5c0, new_path=0x55bf9ef7f4e0) at pathnode.c:621 621 if (!IsA(new_path, IndexPath)) (gdb) n 622 pfree(new_path); (gdb) n 624 } (gdb) p *new_path $1 = {type = T_Invalid, pathtype = T_Invalid, parent = 0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe = 127, parallel_workers = 2139062143, rows = 1.3824172084878715e+306, startup_cost = 1.3824172084878715e+306, total_cost = 1.3824172084878715e+306, pathkeys = 0x7f7f7f7f7f7f7f7f} *(gdb) p new_path $2 = (Path *) 0x55bf9ef7f4e0* (gdb) p ((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements [0].ptr_value)->subpath)->path->parent->cheapest_startup_path *$20 = (struct Path *) 0x55bf9ef7f4e0* (gdb) p *((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements [0].ptr_value)->subpath)->path->parent->cheapest_startup_path $17 = {type = T_Invalid, pathtype = T_Invalid, parent = 0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe = 127, parallel_workers = 2139062143, rows = 1.3824172084878715e+306, startup_cost = 1.3824172084878715e+306, total_cost = 1.3824172084878715e+306, pathkeys = 0x7f7f7f7f7f7f7f7f} (gdb) p (Path*)(((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements [0].ptr_value)->subpath)->path->parent->pathlist->elements[1].ptr_value) *$21 = (Path *) 0x55bf9ef7f4e0* (gdb) p *(Path*)(((ProjectionPath *)((SortPath*)parent_rel->pathlist->elements [0].ptr_value)->subpath)->path->parent->pathlist->elements[1].ptr_value) $19 = {type = T_Invalid, pathtype = T_Invalid, parent = 0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe = 127, parallel_workers = 2139062143, rows = 1.3824172084878715e+306, startup_cost = 1.3824172084878715e+306, total_cost = 1.3824172084878715e+306, pathkeys = 0x7f7f7f7f7f7f7f7f} (gdb) The same problem may be in the 
add_partial_path() function. Unfortunately, I have not yet been able to find a problematic query with the described case, but I have prepared a patch to fix this problem. What do you think? 0. https://www.postgresql.org/message-id/flat/01aa76cc-93e1-4de9-ab34-b3a890bf7...@postgrespro.ru -- Regards, Alena Rybakina Postgres Professional From b9483800eb9d8dc9bafb114c602c63d5cf596ee7 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Mon, 30 Oct 2023 14:22:21 +0300 Subject: [PATCH] Fix rejecting a released path: we should not only free the path, but also delete all references to it; otherwise we will face accesses to released memory. --- src/backend/optimizer/util/pathnode.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 0b1d17b9d33..4df9d34c9f1 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -619,7 +619,19 @@ add_path(RelOptInfo *parent_rel, Path *new_path) { /* Reject and recycle the new path */ if (!IsA(new_path, IndexPath)) - pfree(new_path); + { + if(new_path == new_path->parent->cheapest_startup_path) +new_path->parent->cheapest_startup_path = NULL; + if(new_path == new_path->parent->cheapest_total_path) +new_path->parent->cheapest_total_path = NULL; + foreach(p1, new_path->parent->pathlist) + { +Path *path = (Path *) lfirst(p1); + +if (path == new_path) + new_path->parent->pathlist = foreach_delete_current(new_path->parent->pathlist, p1); + } + } } } @@ -849,7 +861,17
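The essence of the fix can be modeled in a self-contained sketch. These are hypothetical miniature structs, not PostgreSQL's real RelOptInfo/Path: before a rejected path is freed, every reference the parent relation still holds to it must be cleared, otherwise a later traversal dereferences freed memory.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Path Path;

/* Toy stand-in for a RelOptInfo that keeps several references to paths */
typedef struct Rel
{
    Path   *cheapest_startup_path;
    Path   *cheapest_total_path;
    Path   *pathlist[8];
    int     npaths;
} Rel;

struct Path
{
    Rel    *parent;
    double  total_cost;
};

/* Mirrors the patch's idea: clear every mention of the rejected path held
 * by its parent rel before freeing it, so no dangling pointers remain. */
void reject_path(Rel *rel, Path *p)
{
    if (rel->cheapest_startup_path == p)
        rel->cheapest_startup_path = NULL;
    if (rel->cheapest_total_path == p)
        rel->cheapest_total_path = NULL;
    for (int i = 0; i < rel->npaths; i++)
    {
        if (rel->pathlist[i] == p)
        {
            /* swap-remove keeps the list compact */
            rel->pathlist[i] = rel->pathlist[--rel->npaths];
            i--;
        }
    }
    free(p);
}
```

Without the clearing steps, `rel->cheapest_startup_path` would keep pointing at the freed chunk, which is exactly the access-after-free shown in the gdb session.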
Re: POC, WIP: OR-clause support for indexes
Hi! On 27.10.2023 00:04, Peter Geoghegan wrote: On Thu, Oct 26, 2023 at 12:59 PM Robert Haas wrote: Alexander's example seems to show that it's not that simple. If I'm reading his example correctly, with things like aid = 1, the transformation usually wins even if the number of things in the OR expression is large, but with things like aid + 1 * bid = 1, the transformation seems to lose at least with larger numbers of items. So it's not JUST the number of OR elements but also what they contain, unless I'm misunderstanding his point. Alexander said "Generally, I don't see why ANY could be executed slower than the equivalent OR clause". I understood that this was his way of expressing the following idea: "In principle, there is no reason to expect execution of ANY() to be slower than execution of an equivalent OR clause (except for noise-level differences). While it might not actually look that way for every single type of plan you can imagine right now, that doesn't argue for making a cost-based decision. It actually argues for fixing the underlying issue, which can't possibly be due to some kind of fundamental advantage enjoyed by expression evaluation with ORs". This is also what I think of all this. Alexander's partial index example had this quality to it. Obviously, the planner *could* be taught to do the right thing with such a case, with a little more work. The fact that it doesn't right now is definitely a problem, and should probably be treated as a blocker for this patch. But that doesn't really argue against the general idea behind the patch -- it just argues for fixing that one problem. There may also be a separate problem that comes from the added planner cycles required to do the transformation -- particularly in extreme or adversarial cases. We should worry about that, too. 
But, again, it doesn't change the basic fact, which is that having a standard/normalized representation of OR lists/DNF transformation is extremely useful in general, and *shouldn't* result in any real slowdowns at execution time if done well. I think it would be more correct to finalize the current approach to converting "OR" expressions to "ANY". Quite a few problems related to this patch have already been found here; I think we can solve them first, and then move on. To me, this sort of output suggests that perhaps the transformation is being done in the wrong place. I expect that we have to decide whether to convert from OR to = ANY(...) at a very early stage of the planner, before we have any idea what the selected path will ultimately be. But this output suggests that we want the answer to depend on which kind of path is going to be faster, which would seem to argue for doing this sort of transformation as part of path generation for only those paths that will benefit from it, rather than during earlier phases of expression processing/simplification. I don't think that that's the right direction. They're semantically equivalent things. But a SAOP-based plan can be fundamentally better, since SAOPs enable passing down useful context to index AMs (at least nbtree). And because we can use a hash table for SAOP expression evaluation. It's a higher level, standardized, well optimized way of expressing exactly the same concept. I can come up with a case that'll be orders of magnitude more efficient with this patch, despite the transformation process only affecting a small OR list of 3 or 5 elements -- a 100x reduction in heap page accesses is quite possible. This is particularly likely to come up if you assume that the nbtree patch that I'm currently working on is also available. In general, I think that we totally over-rely on bitmap index scans, especially BitmapOrs. 
Regarding the application of the transformation at an early stage, the patch is almost ready, except for handling the cases of queries that run slower. I haven't yet figured out how to exclude such queries without comparing costs or using a threshold on the number of OR elements. The simplest option is not to process the Expr types (already mentioned earlier) in the queries that Alexander gave as an example, but as I already said, I don't like that approach very much. -- Regards, Alena Rybakina Postgres Professional
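As background for Peter's point above that a hash table can be used for SAOP expression evaluation, here is a standalone sketch (assuming integer constants; not PostgreSQL code): the OR chain costs one comparison per arm for every row, while the constants of a ScalarArrayOp can be preloaded into a hash set, making each row an O(1) probe regardless of the array length.

```c
#include <assert.h>
#include <stdbool.h>

#define SETSIZE 64              /* small fixed table, power of two */

/* Hash set of the ANY(...) constants, built once before execution */
typedef struct ConstSet
{
    int  vals[SETSIZE];
    bool used[SETSIZE];
} ConstSet;

static unsigned slot(int v)
{
    return ((unsigned) v * 2654435761u) % SETSIZE;  /* Knuth hash */
}

void constset_add(ConstSet *s, int v)
{
    unsigned i = slot(v);
    while (s->used[i] && s->vals[i] != v)
        i = (i + 1) % SETSIZE;  /* linear probing */
    s->vals[i] = v;
    s->used[i] = true;
}

/* Per-row cost of "x = ANY(...)": one probe, independent of array size */
bool saop_match(const ConstSet *s, int v)
{
    unsigned i = slot(v);
    while (s->used[i])
    {
        if (s->vals[i] == v)
            return true;
        i = (i + 1) % SETSIZE;
    }
    return false;
}

/* Per-row cost of the equivalent OR chain: one comparison per arm */
bool or_match(const int *consts, int n, int v)
{
    for (int i = 0; i < n; i++)
        if (consts[i] == v)
            return true;
    return false;
}
```

Both functions return the same answers; only the cost model differs, which is part of why an execution-time regression from the OR-to-ANY rewrite should not be inherent.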
Invalid Path with UpperRel
Hi, hackers! I recently encountered strange behavior when, after running the create_misc.sql test, I ran the last query from this test. In general, the reproduction looks like this: \i src/test/regress/sql/create_misc.sql I added Assert(0) in create_sort_plan() before calling create_plan_recurse and restarted postgres. After that I ran the query: SELECT relname, reltoastrelid <> 0 AS has_toast_table FROM pg_class WHERE oid::regclass IN ('a_star', 'c_star') ORDER BY 1; I found an invalid path in cheapest_startup_path: (gdb) p *(IndexPath *)((SortPath *)best_path)->subpath->parent->cheapest_startup_path $12 = {path = {type = T_Invalid, pathtype = T_Invalid, parent = 0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 0x7f7f7f7f7f7f7f7f, parallel_aware = 127, parallel_safe = 127, parallel_workers = 2139062143 , rows = 1.3824172084878715e+306, startup_cost = 1.3824172084878715e+306, total_cost = 1.3824172084878715e+306, pathkeys = 0x7f7f7f7f7f7f7f7f}, indexinfo = 0x7f7f7f7f7f7f7f7f, indexclauses = 0x7f7f7f7f7f7f7f7f, indexorderbys = 0x7f7f7f7f7f7f7f7f, indexorderbycols = 0x7f7f7f7f7f7f7f7f, indexscandir = 2139062143 , indextotalcost = 1.3824172084878715e+306, indexselectivity = 1.3824172084878715e+306} (gdb) p (IndexPath *)((SortPath *)best_path)->subpath->parent->cheapest_startup_path $11 = (IndexPath *) 0x555febc66160 I found that this begins with the creation of the upper rel (the fetch_upper_rel function): /* primary planning entry point (may recurse for subqueries) */ root = subquery_planner(glob, parse, NULL, false, tuple_fraction); /* Select best Path and turn it into a Plan */ * final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);* best_path = get_cheapest_fractional_path(final_rel, tuple_fraction); (gdb) p *(IndexPath *)((SortPath *)final_rel->cheapest_total_path )->subpath->parent->cheapest_startup_path $15 = {path = {type = T_Invalid, pathtype = T_Invalid, parent = 0x7f7f7f7f7f7f7f7f, pathtarget = 0x7f7f7f7f7f7f7f7f, param_info = 0x7f7f7f7f7f7f7f7f, 
parallel_aware = 127, parallel_safe = 127, parallel_workers = 2139062143 , rows = 1.3824172084878715e+306, startup_cost = 1.3824172084878715e+306, total_cost = 1.3824172084878715e+306, pathkeys = 0x7f7f7f7f7f7f7f7f}, indexinfo = 0x7f7f7f7f7f7f7f7f, indexclauses = 0x7f7f7f7f7f7f7f7f, indexorderbys = 0x7f7f7f7f7f7f7f7f, indexorderbycols = 0x7f7f7f7f7f7f7f7f, indexscandir = 2139062143 , indextotalcost = 1.3824172084878715e+306, indexselectivity = 1.3824172084878715e+306} (gdb) p (IndexPath *)((SortPath *)final_rel->cheapest_total_path )->subpath->parent->cheapest_startup_path $16 = (IndexPath *) 0x555febc66160 I know it doesn't cause a crash anywhere, but can anybody explain to me what's going on here and why the invalid path appears? -- Regards, Alena Rybakina
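The repeated 0x7f7f7f7f7f7f7f7f pointers and T_Invalid tags in the dumps above are the signature of a memory-clobbering debug build: pfree() overwrites a freed chunk with 0x7F bytes (PostgreSQL's CLOBBER_FREED_MEMORY behavior), so a dangling read afterwards shows exactly this pattern. A minimal sketch of the technique follows, a hypothetical illustration rather than PostgreSQL's actual wipe code:

```c
#include <assert.h>
#include <string.h>

/* Poison a chunk with 0x7F bytes before it would be released, so any
 * dangling read afterwards yields the unmistakable 0x7f7f... pattern. */
void wipe_mem_sketch(void *ptr, size_t size)
{
    memset(ptr, 0x7F, size);
    /* a real pfree() would return the chunk to the allocator here */
}

/* Detector: does this memory carry the poison pattern? */
int looks_wiped(const void *ptr, size_t size)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < size; i++)
        if (p[i] != 0x7F)
            return 0;
    return 1;
}
```

So the output is not corruption of the invalid-path object itself: it shows that the path was freed while something still pointed at it, and the clobbering made the stale reference visible.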
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On 25.10.2023 18:35, Andrei Zubkov wrote: Hi Alena, On Wed, 2023-10-25 at 16:25 +0300, Alena Rybakina wrote: Hi! Thank you for your work on the subject. 1. I didn't understand why we first try to find pgssEntry with a false top_level value, and later find it with a true top_level value. In case of pg_stat_statements the top_level field is the bool field of the pgssHashKey struct used as the key for pgss_hash hashtable. When we are performing a reset, only userid, dbid and queryid values are provided. We need to reset both top-level and non-top-level entries. We have only one way to get them all from the hashtable - search for entries having top_level=true and with top_level=false in their keys. Thank you for the explanation, I got it. 2. And honestly, I think you should change "Remove the entry if it exists, starting with the non-top-level entry." to "Remove the entry or reset min/max time statistic information and the timestamp if it exists, starting with the non-top-level entry." And the same with the comment below: "Also reset the top-level entry if it exists." "Also remove the entry or reset min/max time statistic information and the timestamp if it exists." There are four such places actually - every time when the SINGLE_ENTRY_RESET macro is used. The logic of the reset performed is described a bit in this macro definition. It seems quite redundant to repeat this description four times. But I've noted that "remove" terms should be replaced by "reset". I've attached v24 with commits updated. Yes, I agree with the changes. -- Regards, Alena Rybakina
Re: POC, WIP: OR-clause support for indexes
On 26.10.2023 22:58, Robert Haas wrote: On Thu, Oct 26, 2023 at 3:47 PM Alena Rybakina wrote: With small numbers of "OR" elements the cost of the OR expression is lower than with "ANY"; with larger numbers it is higher. Alexander's example seems to show that it's not that simple. If I'm reading his example correctly, with things like aid = 1, the transformation usually wins even if the number of things in the OR expression is large, but with things like aid + 1 * bid = 1, the transformation seems to lose at least with larger numbers of items. So it's not JUST the number of OR elements but also what they contain, unless I'm misunderstanding his point. Yes, I agree: with Alexander's example this option will not help, and here I need to look inside the Expr itself. But I noticed that such a complex non-constant expression is always an OpExpr type, whereas if the non-constant part contains only one variable, it is a Var type. We can add a constraint that we will transform expressions with simple variables like x=1 or x=2 or x=3, etc., but ignore expressions like x*1+y=1 or x*2+y=2... But then we would not handle expressions whose non-constant part is the same in every arm. For example, we could transform x*1+y=1 or x*1+y=2... to x*1+y = ANY([1,2,...]). But I think it's not so critical, because such cases are rare. Index Scan using pg_class_oid_index on pg_class (cost=0.27..2859.42 rows=414 width=68) (actual time=1.504..34.183 rows=260 loops=1) Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, '4'::oid, '5'::oid, '6'::oid, '7'::oid, Bitmap Heap Scan on pg_class (cost=43835.00..54202.14 rows=414 width=68) (actual time=39.958..41.293 rows=260 loops=1) Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = '3'::oid) OR (oid = '4'::oid) OR (oid = I think we could see which cost is lower: if it is lower with the expressions converted to ANY, we work with that form further; otherwise we work with the original "OR" expressions. 
But we still need to make this conversion to find out its cost. To me, this sort of output suggests that perhaps the transformation is being done in the wrong place. I expect that we have to decide whether to convert from OR to = ANY(...) at a very early stage of the planner, before we have any idea what the selected path will ultimately be. But this output suggests that we want the answer to depend on which kind of path is going to be faster, which would seem to argue for doing this sort of transformation as part of path generation for only those paths that will benefit from it, rather than during earlier phases of expression processing/simplification. I'm not sure I have the full picture here, though, so I might have this all wrong. This would be the ideal option, and to be honest, I also like the conversion at an early stage because there are no problems with selectivity or link updates if we change the structure of the relation's RestrictInfo. But as for calculating which option is better to use, transformed or original, I think this solution might be complicated: we would need not only to identify the cases in which the transformation wins in principle, but also the types of data it works best with; there is a risk of missing some cases, and we might need our own evaluation model. Right now it's hard for me to come up with something simple. The cost option seems simpler and clearer to me, but yes, it is difficult to decide when it is better to do the conversion to get the most correct estimate. -- Regards, Alena Rybakina Postgres Professional
Re: POC, WIP: OR-clause support for indexes
Hi! Thank you for your feedback! On 25.10.2023 22:54, Robert Haas wrote: On Sat, Oct 14, 2023 at 6:37 PM Alexander Korotkov wrote: Regarding the GUC parameter, I don't see we need a limit. It's not yet clear whether a small number or a large number of OR clauses are more favorable for transformation. I propose to have just a boolean enable_or_transformation GUC. That's a poor solution. So is the GUC the patch currently has (or_transform_limit). What you need is a heuristic that figures out fairly reliably whether the transformation is going to be better or worse. Or else, do the whole thing in some other way that is always same-or-better. In general, adding GUCs makes sense when the user knows something that we can't know. For example, shared_buffers makes some sense because, even if we discovered how much memory the machine has, we can't know how much of it the user wants to devote to PostgreSQL as opposed to anything else. And track_io_timing makes sense because we can't know whether the user wants to pay the price of gathering that additional data. But GUCs are a poor way of handling cases where the real problem is that we don't know what code to write. In this case, some queries will be better with enable_or_transformation=on, and some will be better with enable_or_transformation=off. Since we don't know which will work out better, we make the user figure it out and set the GUC, possibly differently for each query. That's terrible. It's the query optimizer's whole job to figure out which transformations will speed up the query. It shouldn't turn around and punt the decision back to the user. Notice that superficially-similar GUCs like enable_seqscan aren't really the same thing at all. That's just for developer testing and debugging. Nobody expects that you have to adjust that GUC on a production system - ever. I noticed that the costs of the expressions differ, and this can help to assess when the conversion is worth keeping and when it is not. 
With small numbers of "OR" elements the cost of the OR expression is lower than with "ANY"; with larger numbers it is higher. postgres=# SET or_transform_limit = 500; EXPLAIN (analyze) SELECT oid,relname FROM pg_class WHERE oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5) ; SET QUERY PLAN -- Index Scan using pg_class_oid_index on pg_class (*cost=0.27..8.30* rows=1 width=68) (actual time=0.105..0.106 rows=0 loops=1) Index Cond: (oid = '13779'::oid) Filter: ((oid = '2'::oid) OR (oid = '4'::oid) OR (oid = '5'::oid)) Planning Time: 0.323 ms Execution Time: 0.160 ms (5 rows) postgres=# SET or_transform_limit = 0; EXPLAIN (analyze) SELECT oid,relname FROM pg_class WHERE oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5) ; SET QUERY PLAN --- Index Scan using pg_class_oid_index on pg_class (*cost=0.27..16.86* rows=1 width=68) (actual time=0.160..0.161 rows=0 loops=1) Index Cond: ((oid = ANY (ARRAY['2'::oid, '4'::oid, '5'::oid])) AND (oid = '13779'::oid)) Planning Time: 4.515 ms Execution Time: 0.313 ms (4 rows) Index Scan using pg_class_oid_index on pg_class (*cost=0.27..2859.42* rows=414 width=68) (actual time=1.504..34.183 rows=260 loops=1) Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, '4'::oid, '5'::oid, '6'::oid, '7'::oid, Bitmap Heap Scan on pg_class (*cost=43835.00..54202.14* rows=414 width=68) (actual time=39.958..41.293 rows=260 loops=1) Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = '3'::oid) OR (oid = '4'::oid) OR (oid = I think we could see which cost is lower: if it is lower with the expressions converted to ANY, we work with that form further; otherwise we work with the original "OR" expressions. But we still need to make this conversion to find out its cost. In addition, I will definitely have to postpone the transformation of "OR" to "ANY" to the stage of creating indexes (?) or maybe a little earlier, so that I only have to compute the cost of the transformed expression.
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On 19.10.2023 15:40, Andrei Zubkov wrote: Hi hackers, New version 23 attached. It contains a rebase onto the current master. Noted that v1.11 adds new fields to the pg_stat_statements view, but leaves the PGSS_FILE_HEADER constant unchanged. Is this correct? Hi! Thank you for your work on the subject. 1. I didn't understand why we first try to find pgssEntry with a false top_level value, and later find it with a true top_level value. /* * Remove the entry if it exists, starting with the non-top-level entry. */ *key.toplevel = false;* entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL); SINGLE_ENTRY_RESET(entry); */* Also reset the top-level entry if it exists. */ key.toplevel = true;* entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL); SINGLE_ENTRY_RESET(entry); I looked through this topic and found some explanation in this email [0], but I didn't understand it. Can you explain it to me? 2. And honestly, I think you should change "Remove the entry if it exists, starting with the non-top-level entry." to "Remove the entry or reset min/max time statistic information and the timestamp if it exists, starting with the non-top-level entry." And the same with the comment below: "Also reset the top-level entry if it exists." "Also remove the entry or reset min/max time statistic information and the timestamp if it exists." In my opinion, this is necessary because the minmax_only parameter is set by the user, so both ways are possible. 0 - https://www.postgresql.org/message-id/62d16845-e74e-a6f9-9661-022e44f48922%40inbox.ru -- Regards, Alena Rybakina
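The double lookup questioned in point 1 follows from the key layout: the toplevel flag is part of the hash key itself, so a reset given only (userid, dbid, queryid) must probe once per flag value. A hypothetical miniature of the idea, not the real pg_stat_statements code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy key: the toplevel flag is part of the lookup key, as in pgssHashKey */
typedef struct Key
{
    unsigned            userid;
    unsigned            dbid;
    unsigned long long  queryid;
    int                 toplevel;
} Key;

typedef struct Entry
{
    Key     key;
    double  min_time;
    double  max_time;
    int     used;
} Entry;

/* Linear scan stands in for the hash-table lookup */
Entry *find_entry(Entry *tab, int n, const Key *key)
{
    for (int i = 0; i < n; i++)
    {
        if (tab[i].used &&
            tab[i].key.userid == key->userid &&
            tab[i].key.dbid == key->dbid &&
            tab[i].key.queryid == key->queryid &&
            tab[i].key.toplevel == key->toplevel)
            return &tab[i];
    }
    return NULL;
}

/* Mirrors the double application of SINGLE_ENTRY_RESET: probe with
 * toplevel = false first, then toplevel = true, resetting each hit. */
void reset_minmax_both(Entry *tab, int n, unsigned userid, unsigned dbid,
                       unsigned long long queryid)
{
    for (int toplevel = 0; toplevel <= 1; toplevel++)
    {
        Key key = { userid, dbid, queryid, toplevel };
        Entry *e = find_entry(tab, n, &key);

        if (e)
        {
            e->min_time = 0.0;
            e->max_time = 0.0;
        }
    }
}
```

Because the flag lives inside the key, there is no single lookup that covers both variants, hence the two probes in the patch.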
Re: Simplify create_merge_append_path a bit for clarity
Hi! On 11.08.2023 05:31, Richard Guo wrote: As explained in the comments for generate_orderedappend_paths, we don't currently support parameterized MergeAppend paths, and it doesn't seem like that is going to change anytime soon. Based on that, we could simplify create_merge_append_path a bit, such as set param_info to NULL directly rather than call get_appendrel_parampathinfo() for it. We already have an Assert on that in create_merge_append_plan. I understand that the change would not make any difference for performance, it's just for clarity's sake. I agree with you, and we can indeed directly set the param_info value to NULL; there are enough comments here to explain it. I didn't find anything else to add to your patch. -- Regards, Alena Rybakina
Re: Evaluate arguments of correlated SubPlans in the referencing ExprState
Hi! I looked through your patch and noticed that it was not applied to the current version of the master. I rebased it and attached a version. I didn't see any problems and, honestly, no big changes were needed, all regression tests were passed. I think it's better to add a test, but to be honest, I haven't been able to come up with something yet. -- Regards, Alena Rybakina From f7a8ca7f3263fa5f82056f90231cf937133622c9 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 23 Oct 2023 22:54:04 +0300 Subject: [PATCH] WIP: Evaluate arguments of correlated SubPlans in the referencing ExprState --- src/backend/executor/execExpr.c | 93 +-- src/backend/executor/execExprInterp.c | 23 +++ src/backend/executor/execProcnode.c | 5 ++ src/backend/executor/nodeSubplan.c| 30 - src/backend/jit/llvm/llvmjit_expr.c | 6 ++ src/backend/jit/llvm/llvmjit_types.c | 1 + src/include/executor/execExpr.h | 6 +- src/include/nodes/execnodes.h | 1 - 8 files changed, 113 insertions(+), 52 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 2c62b0c9c84..14cd56a2db3 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -88,6 +88,9 @@ static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate, FunctionCallInfo fcinfo, AggStatePerTrans pertrans, int transno, int setno, int setoff, bool ishash, bool nullcheck); +static void ExecInitSubPlanExpr(SubPlan *subplan, +ExprState *state, +Datum *resv, bool *resnull); /* @@ -1389,7 +1392,6 @@ ExecInitExprRec(Expr *node, ExprState *state, case T_SubPlan: { SubPlan*subplan = (SubPlan *) node; -SubPlanState *sstate; /* * Real execution of a MULTIEXPR SubPlan has already been @@ -1406,19 +1408,7 @@ ExecInitExprRec(Expr *node, ExprState *state, break; } -if (!state->parent) - elog(ERROR, "SubPlan found with no parent plan"); - -sstate = ExecInitSubPlan(subplan, state->parent); - -/* add SubPlanState nodes to state->parent->subPlan */ -state->parent->subPlan = 
lappend(state->parent->subPlan, - sstate); - -scratch.opcode = EEOP_SUBPLAN; -scratch.d.subplan.sstate = sstate; - -ExprEvalPushStep(state, ); +ExecInitSubPlanExpr(subplan, state, resv, resnull); break; } @@ -2750,29 +2740,12 @@ ExecPushExprSetupSteps(ExprState *state, ExprSetupInfo *info) foreach(lc, info->multiexpr_subplans) { SubPlan*subplan = (SubPlan *) lfirst(lc); - SubPlanState *sstate; Assert(subplan->subLinkType == MULTIEXPR_SUBLINK); - /* This should match what ExecInitExprRec does for other SubPlans: */ - - if (!state->parent) - elog(ERROR, "SubPlan found with no parent plan"); - - sstate = ExecInitSubPlan(subplan, state->parent); - - /* add SubPlanState nodes to state->parent->subPlan */ - state->parent->subPlan = lappend(state->parent->subPlan, - sstate); - - scratch.opcode = EEOP_SUBPLAN; - scratch.d.subplan.sstate = sstate; - /* The result can be ignored, but we better put it somewhere */ - scratch.resvalue = >resvalue; - scratch.resnull = >resnull; - - ExprEvalPushStep(state, ); + ExecInitSubPlanExpr(subplan, state, + >resvalue, >resnull); } } @@ -4178,3 +4151,57 @@ ExecBuildParamSetEqual(TupleDesc desc, return state; } + +static void +ExecInitSubPlanExpr(SubPlan *subplan, + ExprState *state, + Datum *resv, bool *resnull) +{ + ExprEvalStep scratch = {0}; + SubPlanState *sstate; + ListCell *pvar; + ListCell *l; + + if (!state->parent) + elog(ERROR, "SubPlan found with no parent plan"); + + /* + * Generate steps to evaluate input arguments for the subplan. + * + * We evaluate the argument expressions into ExprState's resvalue/resnull, + * and then use PARAM_SET to update the parameter. We do that, instead of + * evaluating directly into the param, to avoid depending on the pointer + * value remaining stable / being included in the generated expression. No + * danger of conflicts with other uses of resvalue/resnull as storing and + * using the value always is in subsequent steps. 
+ * + * Any calculation we have to do can be done in the parent econtext, since + * the Param values don't need to have per-query lifetime. + */ + forboth(l, subplan->parParam, pvar, subplan->args) + { + int paramid = lfirst_int(l); + + ExecInitExprRec(lfirst(pvar), state, + >resvalue, >resnull); + + scratch.opcode = EEOP_PARAM_SET; + scratch.d.param.paramid = paramid; + /* type isn't needed, but an old value could be confusing */ + scratch.d.param.paramtype = InvalidOid; + ExprEvalPushStep(state, ); + } + + sstate = ExecInitSubPlan(subplan, state->parent); + + /* add SubPlanState nodes to state->parent->sub
Re: Oversight in reparameterize_path_by_child leading to executor crash
Hi! Thank you for your work on the subject. While reviewing your patch, I didn't understand why you check path rather than new_path for the T_SampleScan pathtype (I highlighted it): Path * reparameterize_path_by_child(PlannerInfo *root, Path *path, RelOptInfo *child_rel) ... switch (nodeTag(path)) { case T_Path: new_path = path; ADJUST_CHILD_ATTRS(new_path->parent->baserestrictinfo); if (*path*->pathtype == T_SampleScan) { Is it a typo, and should it be new_path? Besides, it may not be important, but while reading your code I kept stumbling over the wording of this comment (I highlighted it too). This: * path_is_reparameterizable_by_child * Given a path parameterized by the parent of the given child relation, * see if it can be translated to be parameterized by the child relation. * * This must return true if *and only if *reparameterize_path_by_child() * would succeed on this path. Maybe it is better to rewrite it more simply: * This must return true *only if *reparameterize_path_by_child() * would succeed on this path. And can we add an assert in the reparameterize_pathlist_by_child function that pathlist is not NIL, because according to the comment it should hold there: Returns NIL to indicate failure, so pathlist had better not be NIL. -- Regards, Alena Rybakina
Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Hi! Thank you for your work on the subject. I reviewed your patch and found that your commit message does not fully explain your code; in addition, I found several spelling mistakes. I think it's better to change it to: "With a parallel seqscan, we should consider materializing the cheapest inner path in the case of a nested loop if it doesn't contain a unique node or if it is unsafe to use it in a subquery." Besides, I couldn't understand why we check again that the materialized path is parallel-safe:

if (matpath != NULL && matpath->parallel_safe)
	try_partial_nestloop_path(root, joinrel, outerpath, matpath,
							  pathkeys, jointype, extra);

--
Regards,
Alena Rybakina
Re: A new strategy for pull-up correlated ANY_SUBLINK
On 13.10.2023 10:04, Andy Fan wrote: It seems to me that the expressions "=" and "IN" are equivalent here due to the fact that the aggregated subquery returns only one value, and the result of the "IN" operation can be considered as the intersection of the elements on the left and right. In this query, we have some kind of set on the left, among which only one element on the right will or will not be found.

Yes, they are equivalent in the final result, but there are some differences at the execution level. The '=' case will be transformed to a SubPlan whose subLinkType is EXPR_SUBLINK, so if more than one row is returned by the subplan, an error will be raised:

select * from tenk1 where ten = (select ten from tenk1 i where i.two = tenk1.two);
ERROR:  more than one row returned by a subquery used as an expression

However, the IN case would not:

select * from tenk1 where ten in (select ten from tenk1 i where i.two = tenk1.two);

is OK.

I think the test case you added is not related to this feature; the difference is there even without the patch. So I kept the code you changed, but not the test case.

Yes, I understand and agree with you that we should delete the last queries except one. The query below has a different result compared to master, and it is correct.
Without your patch:

explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Seq Scan on tenk2 b
               Filter: (SubPlan 2)
               SubPlan 2
                 ->  Result
                       InitPlan 1 (returns $1)
                         ->  Limit
                               ->  Index Scan using tenk2_hundred on tenk2 c
                                     Index Cond: (hundred IS NOT NULL)
                                     Filter: (odd = b.odd)
(12 rows)

After your patch:

postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                        QUERY PLAN
-----------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on tenk2 b
              *->  Subquery Scan on "ANY_subquery"
                     Filter: (b.hundred = "ANY_subquery".min)*
                     ->  Aggregate
                           ->  Seq Scan on tenk2 c
                                 Filter: (odd = b.odd)
(10 rows)

I took the liberty of adding this to your patch and added myself as reviewer, if you don't mind. Sure, the patch after your modification looks better than the original. I'm not sure how the test case around "because of got one row" is relevant to the current changes. After we reach some agreement on the above discussion, I think v4 is good for a committer to review! Thank you!) I am ready to discuss it. Actually I meant to discuss the "Unfortunately, I found a request.." part; looks like we have reached an agreement there :) Yes, we have)

--
Regards,
Alena Rybakina

diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..e41b728df83 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,27 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
    ->  Seq Scan on tenk2 b
 (11 rows)
 
+-- we can pull up the aggregate sublink into RHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on tenk2 b
+               ->  Memoize
+                     Cache Key: b.hundred, b.odd
+                     Cache Mode: binary
+                     ->  Subquery Scan on "ANY_subquery"
+                           Filter: (b.hundred = "ANY_subquery".min)
+                           ->  Result
+                                 InitPlan 1 (returns $1)
+                                   ->  Limit
+                                         ->  Index Scan using tenk2_hundred on tenk2 c
+                                               Index Cond: (hundred IS NOT NULL)
+
Re: A new strategy for pull-up correlated ANY_SUBLINK
   ->  Seq Scan on tenk2 c
(9 rows)

I took the liberty of adding this to your patch and added myself as reviewer, if you don't mind. Sure, the patch after your modification looks better than the original. I'm not sure how the test case around "because of got one row" is relevant to the current changes. After we reach some agreement on the above discussion, I think v4 is good for a committer to review! Thank you!) I am ready to discuss it.

--
Regards,
Alena Rybakina

diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index b70b346696d..0b8f28b157f 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,7 +2028,7 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
    ->  Seq Scan on tenk2 b
 (11 rows)
 
--- we can pull up the aggregate sublink into the subquery scan because of got one row.
+-- we can pull up the aggregate sublink into RHS of a left join.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
@@ -2052,7 +2052,6 @@ ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
          Filter: (odd = b.odd)
 (16 rows)
 
--- we can pull up the aggregate sublink into the JoinExpr.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
@@ -2071,3 +2070,39 @@ ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
    ->  Seq Scan on tenk2 c
 (11 rows)
 
+-- we can't pull up the aggregate sublink into LHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (SubPlan 2)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Seq Scan on tenk2 b
+   SubPlan 2
+     ->  Result
+           InitPlan 1 (returns $1)
+             ->  Limit
+                   ->  Index Scan using tenk2_hundred on tenk2 c
+                         Index Cond: (hundred IS NOT NULL)
+                         Filter: (odd = b.odd)
+(12 rows)
+
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+                 QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (hashed SubPlan 1)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Seq Scan on tenk2 b
+   SubPlan 1
+     ->  HashAggregate
+           Group Key: c.odd
+           ->  Seq Scan on tenk2 c
+(9 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 5d33eb39baa..c02513c4989 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1001,12 +1001,20 @@ explain (costs off)
 SELECT * FROM tenk1 A INNER JOIN tenk2 B
 ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
 
--- we can pull up the aggregate sublink into the subquery scan because of got one row.
+-- we can pull up the aggregate sublink into RHS of a left join.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
 ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
 
--- we can pull up the aggregate sublink into the JoinExpr.
 explain (costs off)
 SELECT * FROM tenk1 A LEFT JOIN tenk2 B
-ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+
+-- we can't pull up the aggregate sublink into LHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
Re: A new strategy for pull-up correlated ANY_SUBLINK
   ->  HashAggregate
         Group Key: c.odd
         ->  Seq Scan on tenk2 c
(11 rows)

Unfortunately, I found a query where the sublink is not pulled up, unlike in the examples above. I couldn't quite figure out why.

create table a (x int, y int, z int, t int);
create table b (x int, t int);
create unique index on a (t, x);
create index on b (t,x);
insert into a select id, id, id, id FROM generate_series(1,10) As id;
insert into b select id, id FROM generate_series(1,1000) As id;

explain (analyze, costs off, buffers)
select b.x, b.x, a.y from b left join a on b.x=a.x and *b.t in (select max(a0.t)* from a a0 where a0.x = b.x and a0.t = b.t);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Hash Right Join (actual time=1.150..58.512 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
   *Join Filter: (SubPlan 2)*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.023..15.798 rows=10 loops=1)
         Buffers: shared hit=541
   ->  Hash (actual time=1.038..1.042 rows=1000 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 72kB
         Buffers: shared hit=5
         ->  Seq Scan on b (actual time=0.047..0.399 rows=1000 loops=1)
               Buffers: shared hit=5
   SubPlan 2
     ->  Result (actual time=0.018..0.018 rows=1 loops=1000)
           Buffers: shared hit=3000
           InitPlan 1 (returns $2)
             ->  Limit (actual time=0.015..0.016 rows=1 loops=1000)
                   Buffers: shared hit=3000
                   ->  Index Only Scan using a_t_x_idx on a a0 (actual time=0.014..0.014 rows=1 loops=1000)
                         Index Cond: ((t IS NOT NULL) AND (t = b.t) AND (x = b.x))
                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.630 ms
 Execution Time: 58.941 ms
(23 rows)

I thought it would be:

explain (analyze, costs off, buffers)
select b.x, b.x, a.y from b left join a on b.x=a.x and *b.t = (select max(a0.t)* from a a0 where a0.x = b.x and a0.t <= b.t);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
   *Join Filter: (b.t = (SubPlan 2))*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.022..17.109 rows=10 loops=1)
         Buffers: shared hit=541
   ->  Hash (actual time=1.065..1.068 rows=1000 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 72kB
         Buffers: shared hit=5
         ->  Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1)
               Buffers: shared hit=5
   SubPlan 2
     ->  Result (actual time=0.025..0.025 rows=1 loops=1000)
           Buffers: shared hit=3000
           InitPlan 1 (returns $2)
             ->  Limit (actual time=0.024..0.024 rows=1 loops=1000)
                   Buffers: shared hit=3000
                   ->  Index Only Scan Backward using a_t_x_idx on a a0 (actual time=0.023..0.023 rows=1 loops=1000)
                         Index Cond: ((t IS NOT NULL) AND (t <= b.t) AND (x = b.x))
                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.689 ms
 Execution Time: 68.220 ms
(23 rows)

If you noticed, it became possible after replacing the "in" operator with "=". I took the liberty of adding this to your patch and added myself as reviewer, if you don't mind.

--
Regards,
Alena Rybakina

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6af4a3183ac..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1284,18 +1284,16 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * If the sub-select refers to any Vars of the parent query, we have to
-	 * treat it as LATERAL.  (Vars of higher levels don't matter here.)
+	 * If the sub-select refers to any Vars of the parent query, let's
+	 * consider it as LATERAL.  (Vars of higher levels don't matter here.)
 	 */
 	sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
-	if (!bms_is_empty(sub_ref_outer_relids))
-	{
-		if (bms_is_subset(sub_ref_outer_relids, available_rels))
-			use_lateral = true;
-		else
-			return NULL;
-	}
+	use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+		bms_is_subset(sub_ref_outer_relids, available_rels);
+
+	if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
+		return NULL;
 
 	/*
 	 * The test expression must contain some Vars of the pare
Re: Removing unneeded self joins
Hi! I have reviewed your patch and I noticed a few things. First of all, I think I found a bug in your latest patch version, and this query shows it:

EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN
( SELECT e1.oid FROM pg_extension e1, pg_extension e2
  WHERE e1.oid=e2.oid) AS e
ON c.oid=e.oid;

In the current version we get such a query plan:

              QUERY PLAN
---------------------------------------
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
         ->  Seq Scan on pg_extension e2
(5 rows)

But I think it should be:

              QUERY PLAN
---------------------------------------
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
         ->  Seq Scan on pg_extension e2
               *Filter: (oid IS NOT NULL)*
(6 rows)

I have looked at the latest version of the code, and I assume that the error lies in the replace_varno_walker function, especially in the place where we check the node by type Var, and no NullTest node is formed:

if (OidIsValid(reloid) && get_attnotnull(reloid, attnum))	/* <- this condition works */
{
	rinfo->clause = (Expr *) makeBoolConst(true, false);
}
else
{
	NullTest   *ntest = makeNode(NullTest);

	ntest->arg = leftOp;
	ntest->nulltesttype = IS_NOT_NULL;
	ntest->argisrow = false;
	ntest->location = -1;
	rinfo->clause = (Expr *) ntest;
}

Secondly, I added some code in a few places to catch erroneous cases, and added a condition under which we should not try to apply the self-join-removal transformation: when the list of self-join quals turns out to be empty after searching for it, and, in general, when there are no joins in the query. Besides, I added a query for testing and wrote about it above. I have attached my diff file. In addition, I found a comment that was not clear to me. I would be glad if you could explain it to me. You mentioned a "superior outer join" in the comment; unfortunately, I didn't find anything about it in the PostgreSQL code, and this explanation remained unclear to me. Could you explain in more detail what you meant?

/*
 * At this stage joininfo lists of inner and outer can contain
 * only clauses, required for *a superior outer join* that can't
 * influence on this optimization. So, we can avoid to call the
 * build_joinrel_restrictlist() routine.
 */
restrictlist = generate_join_implied_equalities(root, joinrelids,
												inner->relids,
												outer, NULL);

--
Regards,
Alena Rybakina

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3e10083905c..5ba5ca693f1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1704,6 +1704,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 	List	   *binfo_candidates = NIL;
 	ReplaceVarnoContext ctx = {.from = toRemove->relid, .to = toKeep->relid};
 
+	Assert(toKeep->relid != -1);
+
 	/*
 	 * Replace index of removing table with the keeping one. The technique of
 	 * removing/distributing restrictinfo is used here to attach just appeared
@@ -2007,6 +2009,8 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
 			/* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
 			continue;
 
+		Assert(is_opclause(orinfo->clause));
+
 		oclause = bms_is_empty(orinfo->left_relids) ?
 			get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
 		c2 = (bms_is_empty(orinfo->left_relids) ?
@@ -2150,6 +2154,18 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			split_selfjoin_quals(root, restrictlist, &selfjoinquals,
 								 &otherjoinquals, inner->relid, outer->relid);
 
+			if (list_length(selfjoinquals) == 0)
+			{
+				/*
+				 * XXX:
+				 * we would detect a self-join without quals like 'x==x' if we
+				 * had a foreign key constraint on some of the other quals and
+				 * this join had no columns from the outer in the target list.
+				 * But it is still a complex task.
+				 */
+				continue;
+			}
+
 			/*
 			 * To enable SJE for the only degenerate case without any self join
 			 * clauses at all, add baserestrictinfo to this list.
@@ -2332,7 +2348,7 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
 	Relids		ToRemove = NULL;
 	int			relid = -1;
 
-	if (!enable_self_join_removal)
+	if ((list_length(joinlist) <= 1 && !IsA(linitial(joinlist), List)) || !enable_self_join_removal)
 		return joinlist;
 
 	/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 10b23944feb..800410d6b18 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,13 +5807,11 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k))
Re: POC, WIP: OR-clause support for indexes
the focus was on expression evaluation overhead. But that's only one of the benefits that we now expect from the patch, right? So it seems like something that should be revisited soon. I'm not suggesting that there is no need for some kind of limit. But it seems like a set of heuristics might be a better approach. Although I would like to get a better sense of the costs of the transformation to be able to say much more.

Yes, this may be revised in the future after some further transformations. Initially, I was solving the problem described here [0]. So, after testing [1], I came to the conclusion that 500 is the ideal value for or_transform_limit.

[0] https://www.postgresql.org/message-id/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru
[1] https://www.postgresql.org/message-id/6b97b517-f36a-f0c6-3b3a-0cf8cfba220c%40yandex.ru

--
Regards,
Alena Rybakina
Postgres Professional
Re: POC, WIP: OR-clause support for indexes
On 02.08.2023 21:58, Peter Geoghegan wrote: On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina wrote: No, I haven't thought about it yet. I studied the example and it would really be nice to add optimization here. I didn't notice any problems with its implementation. I also have an obvious example with the "or" operator, for example , select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1 ) ...; Although I think such a case will be used less often. Right. As I said, I don't particularly care about the row constructor syntax -- it's not essential. In my experience patches like this one that ultimately don't succeed usually *don't* have specific problems that cannot be fixed. The real problem tends to be ambiguity about the general high level design. So more than anything else, ambiguity is the thing that you need to minimize to be successful here. This is the #1 practical problem, by far. This may be the only thing about your patch that I feel 100% sure of. In my experience it can actually be easier to expand the scope of a project, and to come up with a more general solution: https://en.wikipedia.org/wiki/Inventor%27s_paradox I'm not trying to make your work more difficult by expanding its scope. I'm actually trying to make your work *easier* by expanding its scope. I don't claim to know what the specific scope of your patch should be at all. Just that it might be possible to get a much clearer picture of what the ideal scope really is by *trying* to generalize it further -- that understanding is what we lack right now. Even if this exercise fails in some way, it won't really have been a failure. The reasons why it fails will still be interesting and practically relevant. As I explained to Jim, I am trying to put things in this area on a more rigorous footing. For example, I have said that the way that the nbtree code executes SAOP quals is equivalent to DNF. 
That is basically true, but it's also my own slightly optimistic interpretation of history and of the design. That's a good start, but it's not enough on its own. My interpretation might still be wrong in some subtle way, that I have yet to discover. That's really what I'm concerned about with your patch, too. I'm currently trying to solve a problem that I don't yet fully understand, so for me "getting a properly working flow of information" seems like a good practical exercise. I'm trying to generalize the design of my own patch as far as I can, to see what breaks, and why it breaks. My intuition is that this will help me with my own patch by forcing me to gain a truly rigorous understanding of the problem. My suggestion about generalizing your approach to cover RowCompareExpr cases is what I would do, if I were you, and this was my patch. That's almost exactly what I'm doing with my own patch already, in fact.

It's all right, I understand your position) I also agree to try to find other optimization cases and generalize them. I read the wiki article, and as I understand it, the difficult problem here is finding the expressions that need to be converted into a logically equivalent form that simplifies execution for the executor. For example, take the ROW type: it can have a simpler representation with AND and OR operations, and besides, we can exclude duplicates. But some of these transformations may be incorrect, or may end up with a more complex representation. We can try to find the problematic expressions, combine them into groups, and finally find a solution for each group or, conversely, discover that an existing transformation is incorrect. If I understand correctly, we should first start searching for "ROW" expressions (define a group for them) and think about a solution for that group. It seems to me the most difficult thing is to notice the problematic cases where the transformations are incorrect, but I think it can be done.
Right. To be clear, I am sure that it won't be practical to come up with a 100% theoretically pure approach. If for no other reason than this: normalizing to CNF in all cases will run into problems with very complex predicates. It might even be computationally intractable (could just be very slow). So there is a clear need to keep theoretical purity in balance with practical considerations. There is a need for some kind of negotiation between those two things. Probably some set of heuristics will ultimately be required to keep costs and benefits in balance. I don't expect you to determine what set of heuristics will ultimately be required to determine when and how to perform CNF conversions, in the general case. But having at least some vague idea seems like it might increase confidence in your design. I agree, but I think this will be the second step after solutions are found. Do you think that this problem is just an accidental side-effect? It isn't necessarily the responsibility of your patch to fix such things. If it's
Re: POC, WIP: OR-clause support for indexes
Hi! I think it really helps to speed up the search with similar deep filtering compared to cluster indexes, but do we have cases where we don't use this algorithm because it takes longer than an usual index? I thought about the situation with wide indexes (with a lot of multiple columns) and having a lot of filtering predicates for them. I think that it should be possible for the optimizer to only use multi-column SAOP index paths when there is at least likely to be some small advantage -- that's definitely my goal. Importantly, we may not really need to accurately model the costs where the new approach turns out to be much faster. The only essential thing is that we avoid cases where the new approach is much slower than the old approach. Which is possible (in at least some cases) by making the runtime behavior adaptive. The best decision that the planner can make may be no decision at all. Better to wait until runtime where at all possible, since that gives us the latest and most accurate picture of things. But I'm not sure about this, so it seems to me that this is a problem of improper use of indexes rather. It's hard to say how true that is. Certainly, workloads similar to the TPC-DS benchmark kinda need something like MDAM. It's just not practical to have enough indexes to support every possible query -- the benchmark is deliberately designed to have unpredictable, hard-to-optimize access paths. It seems to require having fewer, more general indexes that can support multi-dimensional access reasonably efficiently. Of course, with OLTP it's much more likely that the workload will have predictable access patterns. That makes having exactly the right indexes much more practical. So maybe you're right there. But, I still see a lot of value in a design that is as forgiving as possible. Users really like that kind of thing in my experience. 
I tend to agree with you, but a runtime estimate cannot give us an accurate picture when using indexes correctly or any other optimizations due to the unstable state of the environment in which the query is executed. I believe that a more complex analysis is needed here. Currently, the optimizer doesn't recognize multi-column indexes with SAOPs on every column as having a valid sort order, except on the first column. It seems possible that that has consequences for your patch. (I'm really only guessing, though; don't trust anything that I say about the optimizer too much.) Honestly, I couldn't understand your concerns very well, could you describe it in more detail? Well, I'm not sure if there is any possible scenario where the transformation from your patch makes it possible to go from an access path that has a valid sort order (usually because there is an underlying index scan) into an access path that doesn't. In fact, the opposite situation seems more likely (which is good news) -- especially if you assume that my own patch is also present. Going from a bitmap OR (which cannot return sorted output) to a multi-column SAOP index scan (which now can) may have significant value in some specific circumstances. Most obviously, it's really useful when it enables us to feed tuples into a GroupAggregate without a separate sort step, and without a hash aggregate (that's why I see value in combining your patch with my own patch). You just need to be careful about allowing the opposite situation to take place. More generally, there is a need to think about strange second order effects. We want to be open to useful second order effects that make query execution much faster in some specific context, while avoiding harmful second order effects. Intuitively, I think that it should be possible to do this with the transformations performed by your patch. 
In other words, "helpful serendipity" is an important advantage, while "harmful anti-serendipity" is what we really want to avoid. Ideally by making the harmful cases impossible "by construction".

I noticed only one thing there: when we have unsorted array values in a SAOP, the query takes longer than when the array is sorted. I'll double-check it just in case and write about the results later. I am also experimenting with multi-column indexes using SAOPs.

--
Regards,
Alena Rybakina
Postgres Professional
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Hi, all! CNF -> DNF conversion = Like many great papers, the MDAM paper takes one core idea, and finds ways to leverage it to the hilt. Here the core idea is to take predicates in conjunctive normal form (an "AND of ORs"), and convert them into disjunctive normal form (an "OR of ANDs"). DNF quals are logically equivalent to CNF quals, but ideally suited to SAOP-array style processing by an ordered B-Tree index scan -- they reduce everything to a series of non-overlapping primitive index scans, that can be processed in keyspace order. We already do this today in the case of SAOPs, in effect. The nbtree "next array keys" state machine already materializes values that can be seen as MDAM style DNF single value predicates. The state machine works by outputting the cartesian product of each array as a multi-column index is scanned, but that could be taken a lot further in the future. We can use essentially the same kind of state machine to do everything described in the paper -- ultimately, it just needs to output a list of disjuncts, like the DNF clauses that the paper shows in "Table 3". In theory, anything can be supported via a sufficiently complete CNF -> DNF conversion framework. There will likely always be the potential for unsafe/unsupported clauses and/or types in an extensible system like Postgres, though. So we will probably need to retain some notion of safety. It seems like more of a job for nbtree preprocessing (or some suitably index-AM-agnostic version of the same idea) than the optimizer, in any case. But that's not entirely true, either (that would be far too easy). The optimizer still needs to optimize. It can't very well do that without having some kind of advanced notice of what is and is not supported by the index AM. And, the index AM cannot just unilaterally decide that index quals actually should be treated as filter/qpquals, after all -- it doesn't get a veto. So there is a mutual dependency that needs to be resolved. 
I suspect that there needs to be a two way conversation between the optimizer and nbtree code to break the dependency -- a callback that does some of the preprocessing work during planning. Tom said something along the same lines in passing, when discussing the MDAM paper last year [2]. Much work remains here.

Honestly, I'm just reading and delving into this thread and the topics related to it, so excuse me if I ask a few obvious questions. I noticed that you are going to add the CNF->DNF transformation at the index preprocessing stage. If I understand correctly, you will rewrite the restrictinfo node, changing boolean "AND" expressions to "OR" expressions, but would it be possible to apply such a procedure earlier? Otherwise, I suppose, you could face the problem of incorrect selectivity estimation and, consequently, an incorrect cardinality calculation. I also can't clearly understand at what stage it becomes clear that such a transformation needs to be applied.

--
Regards,
Alena Rybakina
Postgres Professional
Re: POC, WIP: OR-clause support for indexes
Hi! On 26.07.2023 02:47, Peter Geoghegan wrote: On Thu, Jun 29, 2023 at 2:32 AM Alena Rybakina wrote: Hi! I'm sorry I didn't answer you right away, I was too busy with work. Same for me, this time. I was busy working on my patch, which I finally posted yesterday. I'm glad to hear it, I've seen your thread ("Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan"), but, unfortunately, I didn't have enough time to read it. I'll review it soon! To be honest, I didn't think about the fact that my optimization can help eliminate duplicates before reading the data before. I'm not surprised that you didn't specifically think of that, because it's very subtle. I am still only in the process of familiarizing myself with the thread [1] (reference from your letter), but I have already seen that there are problems related, for example, to when "or" expressions refer to the parent element. I didn't intend to imply that you might have the same problem here. I just meant that OR optimizations can have problems with duplicate elimination, in general. I suspect that your patch doesn't have that problem, because you are performing a transformation of one kind of OR into another kind of OR. Yes, you are right, but I studied this topic and two other sources to accumulate my knowledge. It was an exciting experience for me) I was especially inspired after studying the interview with Goetz Graf [2], his life experience is the most inspiring, and from this article I was able to get a broad understanding of the field of databases: current problems, future development, how it works... Thank you for the recommendation. I discovered for myself that the idea described in the article [1] is similar to the idea of representing grouped data in OLAP cubes, and also, if I saw correctly, an algorithm like depth-first search is used there, but for indexes. 
I think it really helps to speed up the search with similar deep filtering compared to cluster indexes, but do we have cases where we don't use this algorithm because it takes longer than an usual index? I thought about the situation with wide indexes (with a lot of multiple columns) and having a lot of filtering predicates for them. But I'm not sure about this, so it seems to me that this is a problem of improper use of indexes rather. I think, I would face the similar problems if I complicate the current code, for example, so that not only or expressions standing on the same level are written in any, but also on different ones without violating the logic of the priority of executing operators. I can't say that I am particularly experienced in this general area -- I have never tried to formally reason about how two different statements are equivalent. It just hasn't been something that I've needed to have a totally rigorous understanding of up until now. But my recent patch changes that. Now I need to study this area to make sure that I have a truly rigorous understanding. Jeff Davis suggested that I read "Logic and Databases", by C.J. Date. So now I have some homework to do. I'll read this book too. Maybe I can finish work with the knowledge I got from there. Thank you for sharing! Unfortunately, when I tried to make a transformation at the stage of index formation, I encountered too incorrect an assessment of the selectivity of relation, which affected the incorrect calculation of the cost and cardinality. It's not surprising that a weird shift in the plan chosen by the optimizer is seen with some random test case, as a result of this added transformation. Even changes that are 100% strictly better (e.g. changes in a selectivity estimation function that is somehow guaranteed to be more accurate in all cases) might do that. 
Here is a recent example of that with another patch, involving a bitmap OR: https://postgr.es/m/cah2-wzncdk9n2tz6j_-iln563_epuc3nzp6vsvtl6jhzs6n...@mail.gmail.com At first, this surprised me very much. It took time to find a suitable place to implement the transformation. I have looked through this thread many times, I will study it in more detail . This example was *not* a regression, if you go by conventional measures. It was merely a less robust plan than the bitmap OR plan, because it didn't pass down both columns as index quals. BTW, there are various restrictions on the sort order of SAOPs that you might want to try to familiarize yourself with. I describe them (perhaps not very clearly) here: https://postgr.es/m/CAH2-Wz=ksvn_sjcnd1+bt-wtifra5ok48adynq3pkkhxgmq...@mail.gmail.com Thank you! Yes, I'll study it too) Currently, the optimizer doesn't recognize multi-column indexes with SAOPs on every column as having a valid sort order, except on the first column. It seems possible that that has consequences for your patch. (I'm really only guessing, though; don't trust anything that I say about the optimizer too much.) Honestly, I couldn't unde
Re: POC, WIP: OR-clause support for indexes
Hi, all! I sent a patch to commitfest and noticed that the authors and the reviewer were incorrectly marked. Sorry about that. I fixed it and sent the current version of the patch. -- Regards, Alena Rybakina Postgres Professional From 087125cc413429bda05f22ebbd51115c23819285 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Tue, 18 Jul 2023 17:19:53 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than set or_transform_limit. Thirdly, it is worth considering that we consider "or" expressions only at the current level. Authors: Alena Rybakina , Andrey Lepikhov Reviewed-by: Ranier Vilela --- src/backend/parser/parse_expr.c | 230 +- src/backend/utils/misc/guc_tables.c | 10 + src/include/parser/parse_expr.h | 1 + src/test/regress/expected/create_index.out| 115 + src/test/regress/expected/guc.out | 3 +- src/test/regress/expected/join.out| 50 src/test/regress/expected/partition_prune.out | 179 ++ src/test/regress/expected/tidscan.out | 17 ++ src/test/regress/sql/create_index.sql | 32 +++ src/test/regress/sql/join.sql | 10 + src/test/regress/sql/partition_prune.sql | 22 ++ src/test/regress/sql/tidscan.sql | 6 + src/tools/pgindent/typedefs.list | 1 + 13 files changed, 674 insertions(+), 2 deletions(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 5a05caa8744..b2294af0f43 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -43,6 +43,7 @@ /* GUC parameters */ bool Transform_null_equals = false; +int or_transform_limit = 500; static Node *transformExprRecurse(ParseState *pstate, Node *expr); @@ -95,6 +96,233 @@ static Expr 
*make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Expr *expr; +} OrClauseGroupEntry; + +static Node * +transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) +{ + List *or_list = NIL; + List *groups_list = NIL; + ListCell *lc; + + /* If this is not an 'OR' expression, skip the transformation */ + if (expr_orig->boolop != OR_EXPR || + list_length(expr_orig->args) < or_transform_limit) + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); + + foreach(lc, expr_orig->args) + { + Node *arg = lfirst(lc); + Node *orqual; + Node *const_expr; + Node *nconst_expr; + ListCell *lc_groups; + OrClauseGroupEntry *gentry; + + /* At first, transform the arg and evaluate constant expressions. */ + orqual = transformExprRecurse(pstate, (Node *) arg); + orqual = coerce_to_boolean(pstate, orqual, "OR"); + orqual = eval_const_expressions(NULL, orqual); + + if (!IsA(orqual, OpExpr)) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * Detect the constant side of the clause. Recall non-constant + * expression can be made not only with Vars, but also with Params, + * which is not bonded with any relation. Thus, we detect the const + * side - if another side is constant too, the orqual couldn't be + * an OpExpr. + * Get pointers to constant and expression sides of the qual. 
+ */ + if (IsA(get_leftop(orqual), Const)) + { + nconst_expr = get_rightop(orqual); + const_expr = get_leftop(orqual); + } + else if (IsA(get_rightop(orqual), Const)) + { + const_expr = get_rightop(orqual); + nconst_expr = get_leftop(orqual); + } + else + { + or_list = lappend(or_list, orqual); + continue; + } + + if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr))) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * At this point we definitely have a transformable clause. + * Classify it and add into specific group of clauses, or create new + * group. + * TODO: to manage complexity in the case of many different clauses + * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something + * like a hash table. But also we believe, that the case of many + * different variable sides is very rare. + */ + foreach(lc_groups, groups_list) + { + OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups); + + Assert(v->node != NULL); + + if (equal(v->node, nconst_expr)) + { +v->consts = lap
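The grouping pass in transformBoolExprOr above can be modeled compactly. The sketch below is illustrative Python, not the patch's C code: it assumes each OR argument has already been split into its non-constant side and an optional constant side, and it uses a dict where the patch uses a linear list scan with equal() (hence the TODO about a hash table).

```python
# Simplified model of the OR -> ANY grouping pass.  Each OR argument is
# represented as a (non_const_side, const_side) pair; const_side is None
# when the clause is not transformable (not an OpExpr, no Const side, or
# not mergejoinable) and must be kept as-is.

def group_or_args(or_args):
    or_list = []   # clauses we leave untouched, in original order
    groups = {}    # non-const side -> list of collected constants

    for expr, const in or_args:
        if const is None:
            or_list.append(expr)
            continue
        groups.setdefault(expr, []).append(const)

    # A group with a single constant stays a plain equality;
    # two or more constants collapse into expr = ANY(consts).
    result = list(or_list)
    for expr, consts in groups.items():
        if len(consts) == 1:
            result.append(f"{expr} = {consts[0]}")
        else:
            result.append(f"{expr} = ANY({consts})")
    return result
```

For example, `group_or_args([("x", 1), ("x", 2), ("y > 5", None)])` keeps the inequality and folds the two equalities into one ANY clause.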
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
Hi, I'm still working on it, but unfortunately I haven't had enough time with it to produce something worth showing yet. Now I am trying to sort out the problems that I pointed out in the previous letter. -- Regards, Alena Rybakina Postgres Professional
Re: POC, WIP: OR-clause support for indexes
Hi! On 11.07.2023 11:47, Andrey Lepikhov wrote: This patch looks much better than earlier. But it definitely needs some covering with tests. As a first simple approximation, here you can see the result of regression tests, where the transformation limit is set to 0. See in the attachment some test changes induced by these diffs. Yes, I think so too. I also added some tests. I have attached an additional diff-5.diff where you can see the changes. Also, I see some impact of the transformation on other queries: create_view.out: (NOT x > z) > (x <= z) inherit.out: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[]))) to - (((a)::text = ANY ('{NULL,cd}'::text[])) OR ((a)::text = 'ab'::text)) The transformations mentioned above are correct, of course. But it can be a sign of possible unstable behavior. I think it can be made more stable if we always add the existing transformed expressions first, and then the original ones, or vice versa. To do this, we will need two more lists, I think, and then we can combine them, with the elements of the second written to the end of the first. But I suppose that this may not be the only unstable behavior; perhaps we also need to sort the result elements on the left side. What do you think? 
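The two-list idea discussed above can be illustrated as follows. This is hypothetical Python just to show the ordering property; the patch itself would presumably use two C Lists and append one to the other.

```python
# Illustration of the ordering idea: collect untransformed clauses and
# transformed ANY-clauses in two separate lists, then append the second
# to the end of the first.  The output position of a transformed clause
# then no longer depends on where in the original OR list the first
# member of its group appeared.

def combine_clauses(or_members):
    untransformed = []   # clauses left as-is
    transformed = []     # clauses folded into "= ANY(...)"
    for clause, folded in or_members:
        (transformed if folded else untransformed).append(clause)
    return untransformed + transformed

# Both orderings of the inherit.out example produce the same output:
a = combine_clauses([("a = 'ab'", False), ("a = ANY('{NULL,cd}')", True)])
b = combine_clauses([("a = ANY('{NULL,cd}')", True), ("a = 'ab'", False)])
```

With the current in-place scheme, `a` and `b` would differ, which is exactly the kind of regression-test churn seen in inherit.out.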
-- Regards, Alena Rybakina Postgres Professional diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index cc229d4dcaf..60e053d2179 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -1922,81 +1922,6 @@ SELECT count(*) FROM tenk1 10 (1 row) -EXPLAIN (COSTS OFF) -SELECT count(*) FROM tenk1 - WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41; - QUERY PLAN - - Aggregate - -> Bitmap Heap Scan on tenk1 - Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) OR (thousand = 41)) - -> BitmapOr - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = 41) -(8 rows) - -SELECT count(*) FROM tenk1 - WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41; - count -10 -(1 row) - -EXPLAIN (COSTS OFF) -SELECT count(*) FROM tenk1 - WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41; - QUERY PLAN -- - Aggregate - -> Bitmap Heap Scan on tenk1 - Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR (thousand = ANY ('{42,99}'::integer[] OR (thousand = 41)) - -> BitmapOr - -> BitmapAnd - -> Bitmap Index Scan on tenk1_hundred - Index Cond: (hundred = 42) - -> BitmapOr - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (tenthous < 2) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = ANY ('{42,99}'::integer[])) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = 41) -(14 rows) - -SELECT count(*) FROM tenk1 - WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41; - count -20 -(1 row) - -EXPLAIN (COSTS OFF) -SELECT count(*) FROM tenk1 - WHERE hundred = 42 AND (thousand = 42 OR thousand = 41 OR thousand = 99 AND tenthous = 2); - QUERY PLAN --- - Aggregate - -> Bitmap Heap 
Scan on tenk1 - Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR (thousand = ANY ('{42,41}'::integer[] - -> BitmapAnd - -> Bitmap Index Scan on tenk1_hundred - Index Cond: (hundred = 42) - -> BitmapOr - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: ((thousand = 99) AND (tenthous = 2)) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = AN
Re: POC, WIP: OR-clause support for indexes
On 11.07.2023 16:29, Ranier Vilela wrote: Em ter., 11 de jul. de 2023 às 09:29, Alena Rybakina escreveu: Hi! On 10.07.2023 15:15, Ranier Vilela wrote: Em seg., 10 de jul. de 2023 às 09:03, Ranier Vilela escreveu: Hi Alena, Em seg., 10 de jul. de 2023 às 05:38, Alena Rybakina escreveu: I agreed with the changes. Thank you for your work. I updated patch and added you to the authors. I specified Ranier Vilela as a reviewer. Is a good habit when post a new version of the patch, name it v1, v2, v3,etc. Makes it easy to follow development and references on the thread. Sorry, I fixed it. Regarding the last patch. 1. I think that variable const_is_left is not necessary. You can stick with: + if (IsA(get_leftop(orqual), Const)) + nconst_expr =get_rightop(orqual); + const_expr = get_leftop(orqual) ; + else if (IsA(get_rightop(orqual), Const)) + nconst_expr =get_leftop(orqual); + const_expr = get_rightop(orqual) ; + else + { + or_list = lappend(or_list, orqual); + continue; + } Agreed. You missed in removing the declaration - bool const_is_left = true; Yes, thank you. I fixed it. . 2. Test scalar_type != RECORDOID is more cheaper, mainly if OidIsValid were a function, we knows that is a macro. + if (scalar_type != RECORDOID && OidIsValid(scalar_type)) Is it safe? Maybe we should first make sure that it can be checked on RECORDOID at all? Yes it's safe, because && connector. But you can leave as is in v5. Added it. -- Regards, Alena Rybakina Postgres Professional From d4ca399ec5a1866025e6cea5ed5cf0e231af38e3 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Tue, 11 Jul 2023 17:10:25 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. 
Secondly, we do not make transformations if there are less than 500 or expressions. Thirdly, it is worth considering that we consider "or" expressions only at the current level. --- src/backend/parser/parse_expr.c | 231 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 231 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..82d51035684 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,235 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Expr *expr; +} OrClauseGroupEntry; + +static int const_transform_or_limit = 500; + +static Node * +transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) +{ + List *or_list = NIL; + List *groups_list = NIL; + ListCell *lc; + + /* If this is not an 'OR' expression, skip the transformation */ + if (expr_orig->boolop != OR_EXPR || + list_length(expr_orig->args) < const_transform_or_limit) + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); + + foreach(lc, expr_orig->args) + { + Node *arg = lfirst(lc); + Node *orqual; + Node *const_expr; + Node *nconst_expr; + ListCell *lc_groups; + OrClauseGroupEntry *gentry; + + /* At first, transform the arg and evaluate constant expressions. */ + orqual = transformExprRecurse(pstate, (Node *) arg); + orqual = coerce_to_boolean(pstate, orqual, "OR"); + orqual = eval_const_expressions(NULL, orqual); + + if (!IsA(orqual, OpExpr)) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * Detect the constant side of the clause. Recall non-constant + * expression can be made not only with Vars, but also with Params, + * which is not bonded with any relation. 
Thus, we detect the const + * side - if another side is constant too, the orqual couldn't be + * an OpExpr. + * Get pointers to constant and expression sides of the qual. + */ + if (IsA(get_leftop(orqual), Const)) + { + nconst_expr = get_rightop(orqual); + const_expr = get_leftop(orqual); + } + else if (IsA(get_rightop(orqual), Const)) + { + const_expr = get_rightop(orqual); + nconst_expr = get_leftop(orqual); + } + else + { + or_list = lappend(or_list, orqual); + continue; + } + + if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr))) + { + or_list = lap
Re: POC, WIP: OR-clause support for indexes
Hi! On 10.07.2023 15:15, Ranier Vilela wrote: Em seg., 10 de jul. de 2023 às 09:03, Ranier Vilela escreveu: Hi Alena, Em seg., 10 de jul. de 2023 às 05:38, Alena Rybakina escreveu: I agreed with the changes. Thank you for your work. I updated patch and added you to the authors. I specified Ranier Vilela as a reviewer. Is a good habit when post a new version of the patch, name it v1, v2, v3,etc. Makes it easy to follow development and references on the thread. Sorry, I fixed it. Regarding the last patch. 1. I think that variable const_is_left is not necessary. You can stick with: + if (IsA(get_leftop(orqual), Const)) + nconst_expr =get_rightop(orqual); + const_expr = get_leftop(orqual) ; + else if (IsA(get_rightop(orqual), Const)) + nconst_expr =get_leftop(orqual); + const_expr = get_rightop(orqual) ; + else + { + or_list = lappend(or_list, orqual); + continue; + } Agreed. 2. Test scalar_type != RECORDOID is more cheaper, mainly if OidIsValid were a function, we knows that is a macro. + if (scalar_type != RECORDOID && OidIsValid(scalar_type)) Is it safe? Maybe we should first make sure that it can be checked on RECORDOID at all? 3. Sorry about wrong tip about array_type, but if really necessary, better use it. + newa->element_typeid = scalar_type; + newa->array_typeid = array_type; Agreed. 4. Is a good habit, call free last, to avoid somebody accidentally using it. + or_list = lappend(or_list, gentry->expr); + list_free(gentry->consts); + continue; No, this is not necessary, because we add the original expression in these places to the resulting list and later we will not use the list of constants for this group at all, otherwise it would be an error. -- Regards, Alena Rybakina Postgres Professional From 55e56f1557ca6879eae6ea62845e180ebfd04662 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Tue, 11 Jul 2023 15:19:22 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... 
with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than 500 or expressions. Thirdly, it is worth considering that we consider "or" expressions only at the current level. --- src/backend/parser/parse_expr.c | 232 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 232 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..36b50b6441d 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,236 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Expr *expr; +} OrClauseGroupEntry; + +static int const_transform_or_limit = 500; + +static Node * +transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) +{ + List *or_list = NIL; + List *groups_list = NIL; + ListCell *lc; + + /* If this is not an 'OR' expression, skip the transformation */ + if (expr_orig->boolop != OR_EXPR || + list_length(expr_orig->args) < const_transform_or_limit) + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); + + foreach(lc, expr_orig->args) + { + Node *arg = lfirst(lc); + Node *orqual; + Node *const_expr; + Node *nconst_expr; + ListCell *lc_groups; + OrClauseGroupEntry *gentry; + boolconst_is_left = true; + + /* At first, transform the arg and evaluate constant expressions. 
*/ + orqual = transformExprRecurse(pstate, (Node *) arg); + orqual = coerce_to_boolean(pstate, orqual, "OR"); + orqual = eval_const_expressions(NULL, orqual); + + if (!IsA(orqual, OpExpr)) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * Detect the constant side of the clause. Recall non-constant + * expression can be made not only with Vars, but also with Params, + * which is not bonded with any relation. Thus, we detect the const + * side - if another side is constant too, the orqual couldn't be + * an OpExpr. + * Get pointers to constant and expression sides of the qual. + */ + if (IsA(get_leftop(orqual), Const)) + { + nconst_expr = get_rightop(orqual); + const_expr = get_leftop(orqual); + } + else if (IsA(get_rightop(orqual), Const)) + { + const_expr = get_rightop(orqual); + nconst_expr = get
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
g Time: 0.406 ms Execution Time: 130.109 ms (14 rows)

If you disable merge join with the original code, the plans coincide, and the error values are approximately the same; in one case the cardinality is overestimated, in the other it is underestimated.

QUERY PLAN
-----------
 Hash Left Join  (cost=1347.00..3915.00 rows=75082 width=16) (actual time=43.625..68.901 rows=5 loops=1)
   Hash Cond: (large.id = small.id)
   Filter: ((large.a = 1) OR (small.b = 1))
   Rows Removed by Filter: 5
   ->  Seq Scan on large  (cost=0.00..1443.00 rows=10 width=8) (actual time=0.008..9.895 rows=10 loops=1)
   ->  Hash  (cost=722.00..722.00 rows=5 width=8) (actual time=22.546..22.548 rows=5 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 2466kB
         ->  Seq Scan on small  (cost=0.00..722.00 rows=5 width=8) (actual time=0.006..7.218 rows=5 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 70.860 ms
(10 rows)

-- Regards, Alena Rybakina Postgres Professional From 47daf6945403425f3a09a2df450a2cd2e96ae23c Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 10 Jul 2023 17:29:41 +0300 Subject: [PATCH] Fix the case of calculating the underestimated cardinality for LEFT/RIGHT/FULL OUTER JOIN, where zero elements were not taken into account as a result of the connection operation. This patch fixes this by calculating the fraction of zero values on the one side of the join without of matching row on the other side. Co-authored-by: Alena Rybakina --- src/backend/utils/adt/array_selfuncs.c | 2 +- src/backend/utils/adt/selfuncs.c | 314 ++--- src/include/nodes/pathnodes.h | 3 + src/include/utils/selfuncs.h | 2 +- 4 files changed, 178 insertions(+), 143 deletions(-) diff --git a/src/backend/utils/adt/array_selfuncs.c b/src/backend/utils/adt/array_selfuncs.c index 9207a5ed193..56ce460cf05 100644 --- a/src/backend/utils/adt/array_selfuncs.c +++ b/src/backend/utils/adt/array_selfuncs.c @@ -93,7 +93,7 @@ scalararraysel_containment(PlannerInfo *root, /* * rightop must be a variable, else punt. 
*/ - examine_variable(root, rightop, varRelid, &vardata); + examine_variable(root, rightop, varRelid, NULL, &vardata); if (!vardata.rel) { ReleaseVariableStats(vardata); diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index c4fcd0076ea..072579dbfd5 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -153,7 +153,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation, bool isdefault1, bool isdefault2, AttStatsSlot *sslot1, AttStatsSlot *sslot2, Form_pg_statistic stats1, Form_pg_statistic stats2, - bool have_mcvs1, bool have_mcvs2); + bool have_mcvs1, bool have_mcvs2, double *unmatched_frac); static double eqjoinsel_semi(Oid opfuncoid, Oid collation, VariableStatData *vardata1, VariableStatData *vardata2, double nd1, double nd2, @@ -1513,7 +1513,7 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid) VariableStatData vardata; double selec; - examine_variable(root, arg, varRelid, &vardata); + examine_variable(root, arg, varRelid, NULL, &vardata); if (HeapTupleIsValid(vardata.statsTuple)) { /* @@ -1541,110 +1541,20 @@ booltestsel(PlannerInfo *root, BoolTestType booltesttype, Node *arg, { VariableStatData vardata; double selec; + double freq_null; + AttStatsSlot sslot; - examine_variable(root, arg, varRelid, &vardata); + examine_variable(root, arg, varRelid, sjinfo, &vardata); - if (HeapTupleIsValid(vardata.statsTuple)) + if (sjinfo) + { + freq_null = sjinfo->unmatched_frac; + } + else if (HeapTupleIsValid(vardata.statsTuple)) { Form_pg_statistic stats; - double freq_null; - AttStatsSlot sslot; - stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); freq_null = stats->stanullfrac; - - if (get_attstatsslot(&sslot, vardata.statsTuple, - STATISTIC_KIND_MCV, InvalidOid, - ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS) - && sslot.nnumbers > 0) - { - double freq_true; - double freq_false; - - /- - * Get first MCV frequency and derive frequency for true. 
- */ - if (DatumGetBool(sslot.values[0])) -freq_true = sslot.numbers[0]; - else -freq_true = 1.0 - sslot.numbers[0] - freq_null; - - /* - * Next derive frequency for false. Then use these as appropriate - * to derive frequency for each case. - */ - freq_false = 1.0 - freq_true - freq_null; - - switch (booltesttype) - { -case IS_UNKNOWN: - /* select only NULL values */ - selec = freq_null; - break; -case IS_NOT_UNKNOWN: - /* select non-NULL values */ - selec = 1.0 - freq_null; - break; -case IS_TRUE: - /* select only TRUE values */ - selec = freq_true; - break; -case IS_NOT_TRUE: - /* select non-TRUE value
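The MCV-based derivation in the removed booltestsel() branch above can be paraphrased in Python. This is an illustrative model of the logic, not the C code; the IS_NOT_TRUE and later cases are cut off in the quoted hunk, so their bodies here follow the natural complements of the visible ones.

```python
# Model of the logic quoted above: derive the true/false/null frequencies
# of a boolean column from its null fraction and the frequency of its
# first (most common) MCV entry, then select the estimate per test type.

def bool_freqs(null_frac, first_mcv_value, first_mcv_freq):
    if first_mcv_value:                           # most common value is TRUE
        freq_true = first_mcv_freq
    else:                                         # most common value is FALSE
        freq_true = 1.0 - first_mcv_freq - null_frac
    freq_false = 1.0 - freq_true - null_frac
    return freq_true, freq_false, null_frac

def booltest_sel(test, null_frac, mcv_value, mcv_freq):
    freq_true, freq_false, freq_null = bool_freqs(null_frac, mcv_value, mcv_freq)
    return {
        "IS_UNKNOWN": freq_null,            # only NULL values
        "IS_NOT_UNKNOWN": 1.0 - freq_null,  # non-NULL values
        "IS_TRUE": freq_true,
        "IS_NOT_TRUE": 1.0 - freq_true,     # freq_false + freq_null
        "IS_FALSE": freq_false,
        "IS_NOT_FALSE": 1.0 - freq_false,   # freq_true + freq_null
    }[test]
```

The patch's point is that when sjinfo carries an unmatched_frac, freq_null can come from there instead of stanullfrac, and the rest of this derivation is bypassed.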
Re: POC, WIP: OR-clause support for indexes
I agreed with the changes. Thank you for your work. I updated patch and added you to the authors. I specified Ranier Vilela as a reviewer. On 10.07.2023 06:12, Andrey Lepikhov wrote: On 7/7/2023 15:20, Alena Rybakina wrote: because we will provide similar manipulation in this: foreach(l, gentry->consts) { Node *rexpr = (Node *) lfirst(l); rexpr = coerce_to_common_type(pstate, rexpr, scalar_type, "IN"); aexprs = lappend(aexprs, rexpr); } I'm not sure that it should be replaced. In attachment - a bit more corrections to the patch. The most important change - or_list contains already transformed expression subtree. So, I think we don't need to free it at all. -- Regards, Alena Rybakina Postgres Professional From 97239cdeac541d3aaeafde3d15a5b6fdc36f86de Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Thu, 29 Jun 2023 17:46:58 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than 500 or expressions. Thirdly, it is worth considering that we consider "or" expressions only at the current level. 
Authors: Alena Rybakina , Andrey Lepikhov Reviewed-by: Ranier Vilela --- src/backend/parser/parse_expr.c | 231 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 231 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..0ddcb880ef8 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,235 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oidscalar_type; + Oidopno; + Expr *expr; +} OrClauseGroupEntry; + +static int const_transform_or_limit = 500; + +static Node * +transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig) +{ + List *or_list = NIL; + List *groups_list = NIL; + ListCell *lc; + + /* If this is not an 'OR' expression, skip the transformation */ + if (expr_orig->boolop != OR_EXPR || + list_length(expr_orig->args) < const_transform_or_limit) + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); + + foreach(lc, expr_orig->args) + { + Node *arg = lfirst(lc); + Node *orqual; + Node *const_expr; + Node *nconst_expr; + ListCell *lc_groups; + OrClauseGroupEntry *gentry; + boolconst_is_left = true; + + /* At first, transform the arg and evaluate constant expressions. */ + orqual = transformExprRecurse(pstate, (Node *) arg); + orqual = coerce_to_boolean(pstate, orqual, "OR"); + orqual = eval_const_expressions(NULL, orqual); + + if (!IsA(orqual, OpExpr)) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * Detect the constant side of the clause. Recall non-constant + * expression can be made not only with Vars, but also with Params, + * which is not bonded with any relation. Thus, we detect the const + * side - if another side is constant too, the orqual couldn't be + * an OpExpr. 
+ */ + if (IsA(get_leftop(orqual), Const)) + const_is_left = true; + else if (IsA(get_rightop(orqual), Const)) + const_is_left = false; + else + { + or_list = lappend(or_list, orqual); + continue; + } + + /* Get pointers to constant and expression sides of the qual */ + nconst_expr = (const_is_left) ? get_rightop(orqual) : + get_leftop(orqual); + const_expr = (const_is_left) ? get_leftop(orqual) : + get_rightop(orqual); + + if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr))) + { + or_list = lappend(or_list, orqual); + continue; + } + + /* + * At this point we definitely have a transformable clause. + * Classify it and add into specific group of clauses, or create new + * group. + * TODO: to manage complexity in the case of many different clauses + * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something + * like a hash table. But also we believe, that the case of many + * different variable sides is very rare. + */ + foreach(lc_groups, groups_list) + { + OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups); + + Assert(v->node != NULL); + + if (equal(v->node, nconst_expr)) + { +v->consts = lappend(v->consts, const_expr); +nconst_expr = NULL; +break; + } + } + + if
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
Well, one option would be to modify all selectivity functions to do something like the patch does for nulltestsel(). That seems a bit cumbersome because why should those places care about maybe running on the outer side of a join, or what? For code in extensions this would be particularly problematic, I think. Agree. I would say that we can try it if nothing else works out. So what I was thinking about doing this in a way that'd make this automatic, without having to modify the selectivity functions. Option (3) is very simple - examine_variable would simply adjust the statistics by tweaking the null_frac field, when looking at variables on the outer side of the join. But it has issues when estimating multiple conditions. Imagine t1 has 1M rows, and we want to estimate SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id) WHERE ((t2.a=1) AND (t2.b=1)) but only 50% of the t1 rows has a match in t2. Assume each of the t2 conditions matches 100% rows in the table. With the correction, this means 50% selectivity for each condition. And if we combine them the usual way, it's 0.5 * 0.5 = 0.25. But we know all the rows in the "matching" part match the condition, so the correct selectivity should be 0.5. In a way, this is just another case of estimation issues due to the assumption of independence. FWIW, I used "AND" in the example for simplicity, but that'd probably be pushed to the baserel level. There'd need to be OR to keep it at the join level, but the overall issue is the same, I think. Also, this entirely ignores extended statistics - I have no idea how we might tweak those in (3). I understood the idea - it is very similar to what is implemented in the current patch. But I don't understand how to do it in the examine_variable function, to be honest. 
But (4) was suggesting we could improve this essentially by treating the join as two distinct sets of rows:
- the inner join result
- rows without a match on the outer side
For the inner part, we would do estimates as now (using the regular per-column statistics). If we knew the conditions match 100% rows, we'd still get 100% when the conditions are combined. For the second part of the join we know the outer side is just NULLs in all columns, and that'd make the estimation much simpler for most clauses. We'd just need to have "fake" statistics with null_frac=1.0 and that's it. And then we'd just combine these two selectivities. If we know the inner side is 50% and all rows match the conditions, and no rows in the other 50% match, the selectivity is 50%. inner_part * inner_sel + outer_part * outer_sel = 0.5 * 1.0 + 0.5 * 0.0 = 0.5 Now, we still have issues with the independence assumption in each of these parts separately. But that's OK, I think. I think (4) could be implemented by doing the current estimation for the inner part, and by tweaking examine_variable in the "outer" part in a way similar to (3), except that it just sets null_frac=1.0 everywhere. For (4) we don't need to tweak those at all, because for the inner part we can just apply them as is, and for the outer part it's irrelevant because everything is NULL. I like this idea the most) I'll try to start with this and implement the patch. I hope this makes more sense. If not, let me know and I'll try to explain it better. Thank you for your explanation) I will write back soon with the results or if I have any questions. -- Regards, Alena Rybakina Postgres Professional
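The arithmetic behind options (3) and (4) above can be checked in a few lines of plain Python, using the numbers from the example (t1 LEFT JOIN t2, 50% of t1 rows with a match, and each t2 condition matching 100% of t2 rows):

```python
# Numeric model of the example: why option (3) double-counts the
# unmatched-rows correction while option (4) does not.

match_frac = 0.5          # fraction of t1 rows with a match in t2
sel_a = sel_b = 1.0       # selectivity of t2.a=1 and t2.b=1 among matched rows

# Option (3): fold the unmatched fraction into each condition's
# selectivity, then multiply under the independence assumption --
# the 0.5 correction is applied twice.
opt3 = (match_frac * sel_a) * (match_frac * sel_b)   # underestimate

# Option (4): estimate the matched and unmatched parts separately,
# then combine them weighted by their share of the join output.
inner_sel = sel_a * sel_b     # selectivity within the matched part
outer_sel = 0.0               # NULL-extended rows fail t2.a=1 AND t2.b=1
opt4 = match_frac * inner_sel + (1.0 - match_frac) * outer_sel
```

Here opt3 comes out to 0.25 while opt4 gives the correct 0.5, matching the discussion.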
Re: POC, WIP: OR-clause support for indexes
Hi! Thank you for your detailed review, your changes have greatly helped to improve this patch. On 06.07.2023 13:20, Andrey Lepikhov wrote: On 6/7/2023 03:06, Alena Rybakina wrote: I corrected this constant in the patch. The patch don't apply cleanly: it contains some trailing spaces. I fixed it. Also, quick glance into the code shows some weak points; 1. transformBoolExprOr should have input type BoolExpr. Agreed. 2. You can avoid the switch operator at the beginning of the function, because you only need one option. Agreed. 3. Stale comments: RestrictIinfos definitely not exists at this point. Yes, unfortunately, I missed this from the previous version when I tried to perform such a transformation at the index creation stage. 4. I don't know, you really need to copy the expr or not, but it is better to do as late, as possible. Yes, I agree with you, copying "expr" is not necessary in this patch 5. You assume, that leftop is non-constant and rightop - constant. Why? Agreed, It was too presumptuous on my part and I agree with your changes. 6.I doubt about equivalence operator. Someone can invent a custom '=' operator with another semantics, than usual. May be better to check mergejoinability? Yes, I agree with you, and I haven't thought about it before. But I haven't found any functions to arrange this in PostgreSQL, but using mergejoinability turns out to be more beautiful here. 7. I don't know how to confidently identify constant expressions at this level. So, I guess, You can only merge here expressions like "F(X)=Const", not an 'F(X)=ConstExpression'. I see, you can find solution for this case, thank you for this, and I think it's reliable enough. On 07.07.2023 05:43, Andrey Lepikhov wrote: On 6/7/2023 03:06, Alena Rybakina wrote: The test was performed on the same benchmark database generated by 2 billion values. I corrected this constant in the patch. 
In an attempt to resolve some issues mentioned in my previous letter, I used op_mergejoinable to detect the mergejoinability of a clause. The constant side of the expression is detected by a call of eval_const_expressions() and a check of each side for the Const node type. See 'diff to diff' in the attachment. I noticed you removed the condition checking for the equality operation: strcmp(strVal(linitial((arg)->name)), "=") == 0 At first it seemed incorrect to me, but a simple example convinced me otherwise: postgres=# explain analyze select x from a where x=1 or x>5 or x<3 or x=2; QUERY PLAN Seq Scan on a (cost=0.00..2291.00 rows=97899 width=4) (actual time=0.038..104.168 rows=99000 loops=1) Filter: ((x > '5'::numeric) OR (x < '3'::numeric) OR (x = ANY ('{1,2}'::numeric[]))) Rows Removed by Filter: 1000 Planning Time: 9.938 ms Execution Time: 113.457 ms (5 rows) It surprised me that such a check can be written in a similar way: eval_const_expressions(NULL, orqual). Yes, I see we can remove this code: bare_orarg = transformExprRecurse(pstate, (Node *)arg); bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR"); because we will perform a similar manipulation in this: foreach(l, gentry->consts) { Node *rexpr = (Node *) lfirst(l); rexpr = coerce_to_common_type(pstate, rexpr, scalar_type, "IN"); aexprs = lappend(aexprs, rexpr); } -- Regards, Alena Rybakina Postgres Professional From 9134099ef9808ca27494bc131558d04b24d9bdeb Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Thu, 29 Jun 2023 17:46:58 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than 500 or expressions.
Thirdly, it is worth considering that we consider "or" expressions only at the current level. --- src/backend/parser/parse_expr.c | 269 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 269 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..961ca3e482c 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,273 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *co
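The grouping that the patch performs in transformBoolExprOr can be modeled outside of the parser. This toy Python sketch (data shapes invented; the real code walks BoolExpr/A_Expr nodes) groups "X = Const" arms by their left-hand side and folds each group into one ANY:

```python
from collections import defaultdict

def group_or_clauses(or_args):
    """Toy model of the patch's grouping step: clauses are ('op', lhs, rhs)
    tuples, and a constant rhs is any non-tuple value.  'X = Const' arms
    are grouped by their left-hand side; everything else is kept as-is."""
    groups = defaultdict(list)
    leftover = []
    for op, lhs, rhs in or_args:
        if op == '=' and not isinstance(rhs, tuple):
            groups[lhs].append(rhs)
        else:
            leftover.append((op, lhs, rhs))
    rewritten = []
    for lhs, consts in groups.items():
        if len(consts) > 1:
            # (X=N1) OR (X=N2) ...  ->  X = ANY(N1, N2, ...)
            rewritten.append(('= ANY', lhs, consts))
        else:
            # a group of one stays a plain equality
            rewritten.append(('=', lhs, consts[0]))
    return rewritten + leftover

print(group_or_clauses([('=', 'x', 1), ('=', 'x', 2), ('>', 'x', 5)]))
# [('= ANY', 'x', [1, 2]), ('>', 'x', 5)]
```

Non-equality arms and arms whose right side is not a bare constant stay untouched, mirroring the "F(X)=Const, not F(X)=ConstExpression" restriction discussed above.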
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
Hi, all! On 26.06.2023 12:22, Andrey Lepikhov wrote: On 24/6/2023 17:23, Tomas Vondra wrote: I really hope what I just wrote makes at least a little bit of sense. Throw in one more example: SELECT i AS id INTO l FROM generate_series(1,10) i; CREATE TABLE r (id int8, v text); INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f'); ANALYZE l,r; EXPLAIN ANALYZE SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL; Here you can see the same kind of underestimation: Hash Left Join (... rows=500 width=14) (... rows=9 ...) So the eqjoinsel_unmatch_left() function should be modified for the case where nd1 Unfortunately, this patch could not fix the cardinality estimation in this query; I'll try to look and figure out what is missing here. I tried to fix the cardinality estimate in the query above by changing: diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 8e18aa1dd2b..40901836146 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -2604,11 +2604,16 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation, * if we're calculating fraction of NULLs or fraction of unmatched rows.
*/ // unmatchfreq = (1.0 - nullfrac1) * (1.0 - nullfrac2); - if (nd1 > nd2) + if (nd1 != nd2) { - selec /= nd1; - *unmatched_frac = (nd1 - nd2) * 1.0 / nd1; + selec /= Max(nd1, nd2); + *unmatched_frac = abs(nd1 - nd2) * 1.0 / Max(nd1, nd2); } + /*if (nd1 > nd2) + { + selec /= nd1; + *unmatched_frac = nd1 - nd2 * 1.0 / nd1; + }*/ else { selec /= nd2; and it worked: SELECT i AS id INTO l FROM generate_series(1,10) i; CREATE TABLE r (id int8, v text); INSERT INTO r (id, v) VALUES (1, 't'), (-1, 'f'); ANALYZE l,r; EXPLAIN ANALYZE SELECT * FROM l LEFT OUTER JOIN r ON (r.id = l.id) WHERE r.v IS NULL; ERROR: relation "l" already exists ERROR: relation "r" already exists INSERT 0 2 ANALYZE QUERY PLAN --- Hash Left Join (cost=1.09..1944.13 rows=8 width=14) (actual time=0.152..84.184 rows=9 loops=1) Hash Cond: (l.id = r.id) Filter: (r.v IS NULL) Rows Removed by Filter: 2 -> Seq Scan on l (cost=0.00..1443.00 rows=10 width=4) (actual time=0.040..27.635 rows=10 loops=1) -> Hash (cost=1.04..1.04 rows=4 width=10) (actual time=0.020..0.022 rows=4 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on r (cost=0.00..1.04 rows=4 width=10) (actual time=0.009..0.011 rows=4 loops=1) Planning Time: 0.954 ms Execution Time: 92.309 ms (10 rows) It looks too simple, and I suspect that I might have missed something somewhere, but so far I haven't found any examples of queries where it doesn't work. I didn't see it breaking anything in the examples from my previous letter [1]. 1. https://www.postgresql.org/message-id/7af1464e-2e24-cfb1-b6d4-1544757f8cfa%40yandex.ru Unfortunately, I can't quite understand your idea from point 4; could you explain it, please? The good thing is this helps even for IS NULL checks on non-join-key columns (where we don't switch to an antijoin), but there are a couple of things that I dislike ...
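Outside the MCV path, the tweak above makes the ndistinct-based fallback symmetric in the two sides. A tiny sketch of the formula (plain Python, not the selfuncs.c code):

```python
def unmatched_fraction(nd1, nd2):
    """Symmetric fallback from the tweak above: whichever side has more
    distinct values, the excess fraction of it is estimated as unmatched."""
    return abs(nd1 - nd2) / max(nd1, nd2)

print(unmatched_fraction(10, 2))  # 0.8
print(unmatched_fraction(2, 10))  # 0.8, same value with the sides swapped
```

With nd1=10 and nd2=2, 80% of the larger side is estimated as unmatched regardless of which argument order the sides arrive in, which is what the nd1 != nd2 branch achieves.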
1) It's not restricted to outer joins or anything like that (this is mostly just my laziness / interest in one particular query, but also something the outer-join-aware patch might help with). 2) We probably don't want to pass this kind of information through sjinfo. That was the simplest thing for an experimental patch, but I suspect it's not the only piece of information we may need to pass to the lower levels of estimation code. 3) I kinda doubt we actually want to move this responsibility (to consider fraction of unmatched rows) to the low-level estimation routines (e.g. nulltestsel and various others). AFAICS this just "introduces NULLs" into the relation, so maybe we could "adjust" the attribute statistics (in examine_variable?) by inflating null_frac and modifying the other frequencies in MCV/histogram. 4) But I'm not sure we actually want to do that in these low-level selectivity functions. The outer join essentially produces output with two subsets - one with matches on the outer side, one without them. But the side without matches has NULLs in all columns. In a way, we know exactly how these columns are correlated - if we do the usual estimation (even with the null_frac adjusted), we just throw this information away. And when there are a lot of rows without a match, that seems bad.
Re: POC, WIP: OR-clause support for indexes
On 29.06.2023 14:23, Ranier Vilela wrote: On Thu, Jun 29, 2023 at 06:56, Alena Rybakina wrote: I apologize for breaking the original thread. In my defense, I can say that I'm new to all this and I'm just learning. I will try to make as few mistakes as possible. By no means; your work is excellent and deserves all compliments. Thank you, I will try to work in the same spirit, especially since there is still quite a lot of work left). Thank you for your feedback. -- Regards, Alena Rybakina Postgres Professional
Re: POC, WIP: OR-clause support for indexes
Hi! At the moment, I'm not sure that the constant is the right number for applying transformations, so I'm in search of it, to be honest. I will post my observations on this issue later. If you don't mind, I'll leave the constant equal to 15 for now. It's hard to predict. Perhaps accounting for time on each benchmark could help decide. I will try to test on JOB [1] (because the queries are difficult for the optimizer due to the significant number of joins and correlations contained in the dataset) and tpcds [2] (a benchmark which, I noticed, contains a sufficient number of queries with "or" expressions). Sorry, I don't understand well enough what is meant by the points "Eliminate unnecessary variables" and "Eliminate unnecessary expressions". Can you explain in more detail? One example is array_type. As you can see in v2 and v3 it no longer exists. I get it. Honestly, I was guided by the example of converting "IN" to "ANY" (transformAExprIn), at least the part of the code where we specifically convert the expression to ScalarArrayOpExpr. Both there and here, we first look for a common type for the collected constants, and if there is one, then we try to find the type for the array structure. However, I think in my current patch it is also worth returning to the original version in this place: if a common type is not found, the ScalarArrayOpExpr generation function will misbehave and the query may not be executed at all, failing with the error that the node type cannot be determined (ERROR: unrecognized node type). At the same time, we are trying to do this transformation for each group. A group here means "or" expressions combined on a common left side, and we consider only expressions that contain a constant and only equality.
What else should be taken into account is that we are trying to do this processing before forming a BoolExpr expression (if you notice, after any outcome we call the makeBoolExpr function, which just forms the "Or" expression, as in the original version, regardless of what type of expressions it combines). As I said earlier, these are just suggestions. But thinking about it now, I think they can be classified as bad early optimizations. Thank you again for your interest in this problem and help. Yes, I think so too) 1. https://github.com/gregrahn/join-order-benchmark 2. https://github.com/Alena0704/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds -- Regards, Alena Rybakina Postgres Professional From f9f5958707bc1d7931323df05d51a60fc9d6cd38 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Thu, 29 Jun 2023 17:46:58 +0300 Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression. Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario. Secondly, we do not make transformations if there are less than 15 or expressions. Thirdly, it is worth considering that we consider "or" expressions only at the current level.
--- src/backend/parser/parse_expr.c | 290 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 290 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..3d395fd6459 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,294 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oid scalar_type; + Oid opno; + Expr *expr; +} OrClauseGroupEntry; + +static int const_transform_or_limit = 15; + +static Node * +transformBoolExprOr(ParseState *pstate, Expr *expr_orig) +{ + List *or_list = NIL; + List *groups_list = NIL; + ListCell *lc_eargs; + Node *result; + BoolExpr *expr = (BoolExpr *) copyObject(expr_orig); + const char *opname; + bool change_apply = false; + bool or_statement = false; + + Assert(IsA(expr, BoolExpr)); + + /* If this is not expression "Or", then will do it the old way. */ + switch (expr->boolop) + { + case AND_EXPR: + opname = "AND"; + break; + case OR_EXPR: + opname = "OR"; + or_statement = true; + break; + case NOT_EXPR: + opname = "NOT"; + break; + default: + elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop); + opname = NULL; /* keep compiler quiet */ + break; + } + + if (!or_statement || list_length(expr-&
Re: POC, WIP: OR-clause support for indexes
I apologize for breaking the original thread. In my defense, I can say that I'm new to all this and I'm just learning. I will try to make as few mistakes as possible. I tried to fix it by forwarding this message to you; besides, it might be interesting to you too. I'm sorry if I didn't state my goals clearly at first, but it seemed to me that initially the problem I encountered was very similar to what is described in this thread, only I suggested a slightly different way to solve it. I have described the problem more or less clearly here [1], and the worst case, as it seems to me, too, but if this is not the case, let me know. 1. https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg146230.html On 29.06.2023 12:32, Alena Rybakina wrote: Hi! I'm sorry I didn't answer you right away, I was too busy with work. On 27.06.2023 22:50, Peter Geoghegan wrote: On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina wrote: I learned something new from your letter, thank you very much for that! Cool. The MDAM paper is also worth a read: https://vldb.org/conf/1995/P710.PDF Some of the techniques it describes are already in Postgres. With varying degrees of maturity. The paper actually mentions OR optimization at one point, under "Duplicate Elimination". The general idea is that ScalarArrayOpExpr execution can "eliminate duplicates before the data is read". The important underlying principle is that it can be really useful to give the B-Tree code the context it requires to be clever about stuff like that. We can do this by (say) using one ScalarArrayOpExpr, rather than using two or more index scans that the B-Tree code will treat as independent things. So a lot of the value in your patch comes from the way that it can enable other optimizations (the immediate benefits are also nice).
In the past, OR optimizations have been prototyped that were later withdrawn/rejected because the duplicate elimination aspect was...too scary [1]. It's very easy to see that ScalarArrayOpExpr index scans don't really have the same problem. "Giving the B-Tree code the required context" helps here too. Thank you for the explanation and the material provided) unfortunately, I am still only studying the article and at the moment I cannot write more. To be honest, I hadn't thought before about the fact that my optimization can help eliminate duplicates before reading the data. I am still only in the process of familiarizing myself with the thread [1] (reference from your letter), but I have already seen that there are problems related, for example, to when "or" expressions refer to the parent element. I think I would face similar problems if I complicated the current code, for example, so that not only OR expressions at the same level are rewritten into ANY, but also ones at different levels, without violating operator precedence. For example, this query works now: postgres=# EXPLAIN (analyze, COSTS OFF) SELECT oid,relname FROM pg_class WHERE (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR relname = 'pg_extension' ; QUERY PLAN -- Seq Scan on pg_class (actual time=0.086..0.140 rows=1 loops=1) Filter: ((oid = ANY ('{4,5}'::oid[])) OR (oid = ANY ('{13779,2}'::oid[])) OR (relname = 'pg_extension'::name)) Rows Removed by Filter: 412 Planning Time: 2.135 ms Execution Time: 0.160 ms (5 rows) But I would like it to work like this: QUERY PLAN -- Seq Scan on pg_class (actual time=0.279..0.496 rows=1 loops=1) Filter: ((oid = ANY ('{13779,2,4,5}'::oid[])) OR (relname = 'pg_extension'::name)) Rows Removed by Filter: 412 Planning Time: 0.266 ms Execution Time: 0.536 ms (5 rows) I analyzed the buffer consumption when I ran control regression tests using my patch.
diff shows me that there is no difference in the number of buffer block scans with and without my patch, as far as I have seen. (regression.diffs) To be clear, I wasn't expecting that there'd be any regressions from your patch. Intuitively, it seems like this optimization should make the query plan do almost the same thing at execution time -- just slightly more efficiently on average, and much more efficiently in some individual cases. It would probably be very hard for the optimizer to model/predict how much work it can save by using a ScalarArrayOpExpr instead of an "equivalent" set of bitmap index scans, OR'd together. But it doesn't necessarily matter -- the only truly critical detail is understanding the worst case for the transformation optimization. Yes, I agree with you and I
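The "nested OR" case discussed above is essentially tree flattening before grouping: collect the arms of OR nodes recursively, then hand the flat list to the grouping step. A toy model (invented data shapes, not parser nodes):

```python
def flatten_or(node):
    """Recursively collect the arms of nested OR nodes into one flat list,
    so that (a OR b) OR (c OR d) OR e yields [a, b, c, d, e].  OR nodes
    are modeled as ('OR', [args]); any other value is a leaf clause."""
    if isinstance(node, tuple) and node[0] == 'OR':
        flat = []
        for arg in node[1]:
            flat.extend(flatten_or(arg))
        return flat
    return [node]

tree = ('OR', [('OR', ['oid = 13779', 'oid = 2']),
               ('OR', ['oid = 4', 'oid = 5']),
               "relname = 'pg_extension'"])
print(flatten_or(tree))
# ['oid = 13779', 'oid = 2', 'oid = 4', 'oid = 5', "relname = 'pg_extension'"]
```

After flattening, the four oid equalities would land in one group and produce the single oid = ANY ('{13779,2,4,5}') filter the message wishes for, as long as precedence is preserved by only flattening pure OR subtrees.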
Re: POC, WIP: OR-clause support for indexes
Hi! I'm sorry I didn't answer you right away, I was too busy with work. On 27.06.2023 22:50, Peter Geoghegan wrote: On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina wrote: I learned something new from your letter, thank you very much for that! Cool. The MDAM paper is also worth a read: https://vldb.org/conf/1995/P710.PDF Some of the techniques it describes are already in Postgres. With varying degrees of maturity. The paper actually mentions OR optimization at one point, under "Duplicate Elimination". The general idea is that ScalarArrayOpExpr execution can "eliminate duplicates before the data is read". The important underlying principle is that it can be really useful to give the B-Tree code the context it requires to be clever about stuff like that. We can do this by (say) using one ScalarArrayOpExpr, rather than using two or more index scans that the B-Tree code will treat as independent things. So a lot of the value in your patch comes from the way that it can enable other optimizations (the immediate benefits are also nice). In the past, OR optimizations have been prototyped that were later withdrawn/rejected because the duplicate elimination aspect was...too scary [1]. It's very easy to see that ScalarArrayOpExpr index scans don't really have the same problem. "Giving the B-Tree code the required context" helps here too. Thank you for the explanation and the material provided) unfortunately, I am still only studying the article and at the moment I cannot write more. To be honest, I didn't think about the fact that my optimization can help eliminate duplicates before reading the data before. I am still only in the process of familiarizing myself with the thread [1] (reference from your letter), but I have already seen that there are problems related, for example, to when "or" expressions refer to the parent element. 
I think I would face similar problems if I complicated the current code, for example, so that not only OR expressions at the same level are rewritten into ANY, but also ones at different levels, without violating operator precedence. For example, this query works now: postgres=# EXPLAIN (analyze, COSTS OFF) SELECT oid,relname FROM pg_class WHERE (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR relname = 'pg_extension' ; QUERY PLAN -- Seq Scan on pg_class (actual time=0.086..0.140 rows=1 loops=1) Filter: ((oid = ANY ('{4,5}'::oid[])) OR (oid = ANY ('{13779,2}'::oid[])) OR (relname = 'pg_extension'::name)) Rows Removed by Filter: 412 Planning Time: 2.135 ms Execution Time: 0.160 ms (5 rows) But I would like it to work like this: QUERY PLAN -- Seq Scan on pg_class (actual time=0.279..0.496 rows=1 loops=1) Filter: ((oid = ANY ('{13779,2,4,5}'::oid[])) OR (relname = 'pg_extension'::name)) Rows Removed by Filter: 412 Planning Time: 0.266 ms Execution Time: 0.536 ms (5 rows) I analyzed the buffer consumption when I ran control regression tests using my patch. diff shows me that there is no difference in the number of buffer block scans with and without my patch, as far as I have seen. (regression.diffs) To be clear, I wasn't expecting that there'd be any regressions from your patch. Intuitively, it seems like this optimization should make the query plan do almost the same thing at execution time -- just slightly more efficiently on average, and much more efficiently in some individual cases. It would probably be very hard for the optimizer to model/predict how much work it can save by using a ScalarArrayOpExpr instead of an "equivalent" set of bitmap index scans, OR'd together. But it doesn't necessarily matter -- the only truly critical detail is understanding the worst case for the transformation optimization. Yes, I agree with you and I have yet to analyze this.
It cannot be too bad (maybe it's ~zero added runtime overhead relative to not doing the transformation, even?). I haven't seen a major performance degradation so far, but to be honest, I have not conducted a detailed analysis on other types of queries other than x=1 or x=2 or x=1 or y=2, etc. As soon as something is known, I will provide the data, it is very interesting to me. At the same time, nbtree can be clever about ScalarArrayOpExpr execution at runtime (once that's implemented), without ever needing to make any kind of up-front commitment to navigating through the index in any particular way. It's all dynamic, and can be driven by the actual observed characteristics of the index structure. In other words, we don't really need to gamble (in the planner, or at execution time). We're just ke
Re: POC, WIP: OR-clause support for indexes
k1 b on a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1; QUERY PLAN -- Nested Loop (cost=0.00..22247.02 rows=135 width=32) (actual time=0.094..373.627 rows=135 loops=1) -> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=15 width=16) (actual time=0.051..14.667 rows=15 loops=1) -> Materialize (cost=0.00..3061.05 rows=9 width=16) (actual time=0.000..0.001 rows=9 loops=15) -> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=9 width=16) (actual time=0.026..42.389 rows=9 loops=1) Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1)) Rows Removed by Filter: 149991 Planning Time: 0.414 ms Execution Time: 409.154 ms (8 rows) I compiled my original patch and there were no problems with regression tests. The only problem appeared when I set the const_transform_or_limit variable to 0 (I have 15), as you have in the patch. To be honest, the diff appears there because you get a different plan: specifically, the "or" expressions are replaced by ANY (see regression.diffs). Unfortunately, your patch version did not apply cleanly; I did not understand the reasons, so I applied it manually. At the moment, I'm not sure that the constant is the right number for applying transformations, so I'm in search of it, to be honest. I will post my observations on this issue later. If you don't mind, I'll leave the constant equal to 15 for now. Sorry, I don't understand well enough what is meant by the points "Eliminate unnecessary variables" and "Eliminate unnecessary expressions". Can you explain in more detail? Regarding the patch, there was a warning at the compilation stage.
In file included from ../../../src/include/nodes/bitmapset.h:21, from ../../../src/include/nodes/parsenodes.h:26, from ../../../src/include/catalog/objectaddress.h:17, from ../../../src/include/catalog/pg_aggregate.h:24, from parse_expr.c:18: parse_expr.c: In function ‘transformBoolExprOr’: ../../../src/include/nodes/nodes.h:133:66: warning: ‘expr’ is used uninitialized [-Wuninitialized] 133 | #define nodeTag(nodeptr) (((const Node*)(nodeptr))->type) | ^~ parse_expr.c:116:29: note: ‘expr’ was declared here 116 | BoolExpr *expr; | ^~~~ I couldn't figure out how to fix it and went back to my original version. To be honest, I don't think anything needs to be changed here. Unfortunately, I didn't understand why, even when OR expressions are available, you don't try to convert them to ANY, calling transformBoolExpr instead, as I saw. I went back to my version. I think it's worth checking whether the or_statement variable is true. I think it's worth leaving the use of the or_statement variable in its original form. switch (expr->boolop) { case AND_EXPR: opname = "AND"; break; case OR_EXPR: opname = "OR"; or_statement = true; break; case NOT_EXPR: opname = "NOT"; break; default: elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop); opname = NULL; /* keep compiler quiet */ break; } if (!or_statement || list_length(expr->args) < const_transform_or_limit) return transformBoolExpr(pstate, (BoolExpr *)expr_orig); The current version of the patch also works and all tests pass. -- Regards, Alena Rybakina Postgres Professional From 06ae4f0887358eb38fe1b32287580a6996c8c9f6 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Thu, 29 Jun 2023 07:09:16 +0300 Subject: [PATCH] Replace clause (X=N1) OR (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are still working with a tree expression.
Firstly, we do not try to make a transformation for "non-or" expressions or inequalities and the creation of a relation with "or" expressions occurs according to the same scenario; secondly, we do not make transformations if there are less than 15 or expressions (here you can put another number, but during testing, already starting with 3 expressions, the execution time and planning time with transformed or were faster); thirdly, it is worth considering that we consider "or" expressions only at the current level. The transformation takes place according to the following scheme: first we define the groups on the left side, collect the constants in a list for each group, then considering each group we make the collected constants to one type, find a common type for array and using the make_scalar_array_op function we form ScalarArrayOpExpr, if possible. If it is not p
Re: MergeJoin beats HashJoin in the case of multiple hash clauses
t; Seq Scan on b (cost=0.00..1637.00 rows=10 width=20) (actual time=0.027..17.980 rows=10 loops=1) -> Hash (cost=10.00..10.00 rows=600 width=12) (actual time=2.046..2.047 rows=600 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 34kB -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.022..0.315 rows=600 loops=1) Planning Time: 0.631 ms Execution Time: 95.730 ms (8 rows) SET QUERY PLAN -- Hash Join (cost=3387.00..8621.58 rows=8 width=32) (actual time=102.873..102.877 rows=0 loops=1) Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z)) -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.014..0.131 rows=600 loops=1) -> Hash (cost=1637.00..1637.00 rows=10 width=20) (actual time=101.920..101.921 rows=10 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 6474kB -> Seq Scan on b (cost=0.00..1637.00 rows=10 width=20) (actual time=0.013..16.349 rows=10 loops=1) Planning Time: 0.153 ms Execution Time: 103.518 ms (8 rows) I also give an improvement relative to the left external or right connection: DROP TABLE IF EXISTS a,b CASCADE; CREATE TABLE a AS SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z FROM generate_series(1,600) AS gs; CREATE TABLE b AS SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload FROM generate_series(1,1e5) AS gs; ANALYZE a,b; SET enable_cost_size = 'on'; EXPLAIN ANALYZE SELECT * FROM a right join b on a.x=b.x AND a.y=b.y AND a.z=b.z; SET enable_cost_size = 'off'; EXPLAIN ANALYZE SELECT * FROM a right join b on a.x=b.x AND a.y=b.y AND a.z=b.z; QUERY PLAN Hash Left Join (cost=20.50..3157.58 rows=10 width=32) (actual time=1.846..102.264 rows=10 loops=1) Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND (b.z = (a.z)::numeric)) -> Seq Scan on b (cost=0.00..1637.00 rows=10 width=20) (actual time=0.041..15.328 rows=10 loops=1) -> Hash (cost=10.00..10.00 rows=600 width=12) (actual time=1.780..1.781 rows=600 loops=1) Buckets: 1024 
Batches: 1 Memory Usage: 34kB -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.031..0.252 rows=600 loops=1) Planning Time: 0.492 ms Execution Time: 107.609 ms (8 rows) SET QUERY PLAN -- Hash Right Join (cost=3387.00..8500.08 rows=10 width=32) (actual time=80.919..101.613 rows=10 loops=1) Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z)) -> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.017..0.084 rows=600 loops=1) -> Hash (cost=1637.00..1637.00 rows=10 width=20) (actual time=80.122..80.123 rows=10 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 6474kB -> Seq Scan on b (cost=0.00..1637.00 rows=10 width=20) (actual time=0.015..11.819 rows=10 loops=1) Planning Time: 0.194 ms Execution Time: 104.662 ms (8 rows) -- Regards, Alena Rybakina Postgres Professional diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index ef475d95a18..31771dfba46 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -153,6 +153,7 @@ bool enable_parallel_hash = true; bool enable_partition_pruning = true; bool enable_presorted_aggregate = true; bool enable_async_append = true; +bool enable_cost_size = true; typedef struct { @@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, thismcvfreq = restrictinfo->left_mcvfreq; } + if (enable_cost_size) + { +innerbucketsize *= thisbucketsize; +innermcvfreq *= thismcvfreq; + } + else + { if (innerbucketsize > thisbucketsize) innerbucketsize = thisbucketsize; if (innermcvfreq > thismcvfreq) innermcvfreq = thismcvfreq; + } } + + if (enable_cost_size && innerbucketsize > virtualbuckets) + innerbucketsize = 1.0 / virtualbuckets; } /* diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 71e27f8eb05..ded9ba3b7a9 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -1007,6 +1007,19 @@ struct 
config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, + { + {"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER, + gettext_noop("s
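The enable_cost_size branch in the diff above replaces the "take the worst single clause" bucket-size estimate with a product over all hash clauses (an independence assumption). The clamp against virtualbuckets reads, in my understanding of the intent, as a floor of one virtual bucket's share. A sketch of the two estimates (plain Python, names invented):

```python
def bucket_frac_min(clause_fracs):
    """Pre-patch heuristic: the most selective hash clause alone determines
    the estimated bucket fraction."""
    return min(clause_fracs)

def bucket_frac_product(clause_fracs, virtualbuckets):
    """Patched heuristic (as I read the diff): multiply the per-clause
    fractions, assuming independence, but clamp at one virtual bucket's
    share so the estimate never drops below 1/virtualbuckets."""
    frac = 1.0
    for f in clause_fracs:
        frac *= f
    return max(frac, 1.0 / virtualbuckets)

fracs = [0.1, 0.1, 0.1]
print(bucket_frac_min(fracs))                      # 0.1
print(round(bucket_frac_product(fracs, 1024), 6))  # 0.001
```

With three clauses at 10% each, the product estimate is two orders of magnitude smaller than the minimum estimate, which is exactly why the patched costing stops overestimating bucket collisions for multi-clause hash joins.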
Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs
2.00..2.00 rows=100 width=8) (actual time=0.034..0.036 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 12kB -> Seq Scan on small (cost=0.00..2.00 rows=100 width=8) (actual time=0.008..0.016 rows=100 loops=1) Planning Time: 0.168 ms Execution Time: 209.883 ms (10 rows) From e8653477fbe7321325122a2d5032798cd6a95a8b Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 26 Jun 2023 15:39:11 +0300 Subject: [PATCH] Fixed the case of underestimated cardinality for a LEFT JOIN with an "IS NULL" restriction in the clause. This error is caused by an incorrect selectivity calculation for the "IS NULL" clause, which took into account only table-level statistics, without the NULL values produced by the join operation. This patch fixes this by calculating the fraction of rows on the right side of the join without a matching row on the left. Co-authored-by: Alena Rybakina --- src/backend/utils/adt/selfuncs.c | 35 +--- src/include/nodes/pathnodes.h | 3 +++ 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index c4fcd0076ea..8e18aa1dd2b 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -153,7 +153,7 @@ static double eqjoinsel_inner(Oid opfuncoid, Oid collation, bool isdefault1, bool isdefault2, AttStatsSlot *sslot1, AttStatsSlot *sslot2, Form_pg_statistic stats1, Form_pg_statistic stats2, - bool have_mcvs1, bool have_mcvs2); + bool have_mcvs1, bool have_mcvs2, double *unmatched_frac); static double eqjoinsel_semi(Oid opfuncoid, Oid collation, VariableStatData *vardata1, VariableStatData *vardata2, double nd1, double nd2, @@ -1710,6 +1710,9 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg, stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); freq_null = stats->stanullfrac; + if (sjinfo) + freq_null = freq_null + sjinfo->unmatched_frac - freq_null * sjinfo->unmatched_frac; + switch (nulltesttype) { case
IS_NULL: @@ -2313,13 +2316,24 @@ eqjoinsel(PG_FUNCTION_ARGS) } /* We need to compute the inner-join selectivity in all cases */ + /* + * Calculate the fraction of the right side without a matching row on the left. + * + * FIXME Should be restricted to JOIN_LEFT; we should have similar logic + * for JOIN_FULL. + * + * XXX Probably should calculate unmatched as a fraction of the join result, + * not of the relation on the right (because the matched part can have more + * matches per row and thus grow). Not sure. Depends on how it's used later. + */ selec_inner = eqjoinsel_inner(opfuncoid, collation, &vardata1, &vardata2, nd1, nd2, isdefault1, isdefault2, &sslot1, &sslot2, stats1, stats2, - have_mcvs1, have_mcvs2); + have_mcvs1, have_mcvs2, + &sjinfo->unmatched_frac); switch (sjinfo->jointype) { @@ -2407,7 +2421,7 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation, VariableStatData *vardata1, VariableStatData *vardata2, double nd1, double nd2, bool isdefault1, bool isdefault2, AttStatsSlot *sslot1, AttStatsSlot *sslot2, Form_pg_statistic stats1, Form_pg_statistic stats2, -bool have_mcvs1, bool have_mcvs2) +bool have_mcvs1, bool have_mcvs2, double *unmatched_frac) { double selec; @@ -2503,7 +2517,10 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation, } CLAMP_PROBABILITY(matchfreq1); CLAMP_PROBABILITY(unmatchfreq1); + + *unmatched_frac = unmatchfreq1; matchfreq2 = unmatchfreq2 = 0.0; + for (i = 0; i < sslot2->nvalues; i++) { if (hasmatch2[i]) @@ -2581,10 +2598,22 @@ eqjoinsel_inner(Oid opfuncoid, Oid collation, double nullfrac2 = stats2 ? stats2->stanullfrac : 0.0; selec = (1.0 - nullfrac1) * (1.0 - nullfrac2); + + /* + * XXX Should this look at nullfrac on either side? Probably depends on + * whether we're calculating the fraction of NULLs or the fraction of unmatched rows.
+ */ + // unmatchfreq = (1.0 - nullfrac1) * (1.0 - nullfrac2); if (nd1 > nd2) + { selec /= nd1; + *unmatched_frac = (nd1 - nd2) * 1.0 / nd1; + } else + { selec /= nd2; + *unmatched_frac = 0.0; + } } return selec; diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index c17b53f7adb..6bc63e648e6 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -2853,6 +2853,9 @@ struct SpecialJoinInfo bool semi_can_hash; /* true if semi_operators are all hash */ List *semi_operators; /* OIDs of equality join operators */ List *semi_rhs_exprs; /* righthand-side expressions of these ops */ + + /* For outer join, fraction of rows without a match. */ + Selectivity unmatched_frac; }; /* -- 2.34.1
Re: POC, WIP: OR-clause support for indexes
Sorry, I wrote the last sentence in a confusing way. I meant that I applied the transformation to any number of "or" expressions (const_transform_or_limit=1). In the regression tests, I noticed only diff changes from transformations of "or" expressions to "any". I attach a file with the diff. On 26.06.2023 04:47, Alena Rybakina wrote: Hi, all! Sorry I haven't written for a long time. I finished writing the code patch for the transformation of "Or" expressions to "Any" expressions. I didn't see any problems in the regression tests, even when I set the constant at which an "or" expression is replaced by "any" to 0. I ran my patch on sqlancer and so far the code has never crashed. On 14.01.2023 18:45, Marcos Pegoraro wrote: I agree with your idea and will try to implement it and will soon attach a patch with a solution. Additionally, if those OR constants repeat you'll see ... If all constants are the same value, fine explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1)); Index Only Scan using x_id on x (cost=0.42..4.44 rows=1 width=4) Index Cond: (id = 1) if all values are almost the same, ops explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1) OR (ID = 2)); Bitmap Heap Scan on x (cost=17.73..33.45 rows=4 width=4) Recheck Cond: ((id = 1) OR (id = 1) OR (id = 1) OR (id = 2)) -> BitmapOr (cost=17.73..17.73 rows=4 width=0) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 2) thanks Marcos -- Regards, Alena Rybakina diff -U3 /home/alena/postgrespro9/src/test/regress/expected/create_index.out /home/alena/postgrespro9/src/test/regress/results/create_index.out --- /home/alena/postgrespro9/src/test/regress/expected/create_index.out 2023-06-26 04:08:22.903294342 +0300 +++
/home/alena/postgrespro9/src/test/regress/results/create_index.out 2023-06-26 05:36:31.904205986 +0300 @@ -1838,18 +1838,11 @@ EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); - QUERY PLAN -- - Bitmap Heap Scan on tenk1 - Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42))) - -> BitmapOr - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous = 1)) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous = 3)) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: ((thousand = 42) AND (tenthous = 42)) -(9 rows) + QUERY PLAN +-- + Index Scan using tenk1_thous_tenthous on tenk1 + Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[]))) +(2 rows) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); @@ -1861,20 +1854,17 @@ EXPLAIN (COSTS OFF) SELECT count(*) FROM tenk1 WHERE hundred = 42 AND (thousand = 42 OR thousand = 99); - QUERY PLAN -- + QUERY PLAN + Aggregate -> Bitmap Heap Scan on tenk1 - Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99))) + Recheck Cond: ((hundred = 42) AND (thousand = ANY ('{42,99}'::integer[]))) -> BitmapAnd -> Bitmap Index Scan on tenk1_hundred Index Cond: (hundred = 42) - -> BitmapOr - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = 42) - -> Bitmap Index Scan on tenk1_thous_tenthous - Index Cond: (thousand = 99) -(11 rows) + -> Bitmap Index Scan on tenk1_thous_tenthous + Index Cond: (thousand = ANY ('{42,99}'::integer[])) +(8 rows) SELECT count(*) FROM tenk1 WHERE hundred = 42
Re: POC, WIP: OR-clause support for indexes
Hi, all! Sorry I haven't written for a long time. I finished writing the code patch for the transformation of "Or" expressions to "Any" expressions. I didn't see any problems in the regression tests, even when I set the constant at which an "or" expression is replaced by "any" to 0. I ran my patch on sqlancer and so far the code has never crashed. On 14.01.2023 18:45, Marcos Pegoraro wrote: I agree with your idea and will try to implement it and will soon attach a patch with a solution. Additionally, if those OR constants repeat you'll see ... If all constants are the same value, fine explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1)); Index Only Scan using x_id on x (cost=0.42..4.44 rows=1 width=4) Index Cond: (id = 1) if all values are almost the same, ops explain select * from x where ((ID = 1) OR (ID = 1) OR (ID = 1) OR (ID = 2)); Bitmap Heap Scan on x (cost=17.73..33.45 rows=4 width=4) Recheck Cond: ((id = 1) OR (id = 1) OR (id = 1) OR (id = 2)) -> BitmapOr (cost=17.73..17.73 rows=4 width=0) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 1) -> Bitmap Index Scan on x_id (cost=0.00..4.43 rows=1 width=0) Index Cond: (id = 2) thanks Marcos -- Regards, Alena Rybakina From 56fba3befe4f6b041d097d8884815fe943fb21f9 Mon Sep 17 00:00:00 2001 From: Alena Rybakina Date: Mon, 26 Jun 2023 04:18:15 +0300 Subject: [PATCH] Replace clause (X=N1) OR (X=N2) ... with X = ANY(N1, N2) at the optimiser stage, while we are still working with the expression tree.
Firstly, we do not try to transform "non-or" expressions or inequalities, and a relation with "or" expressions is created according to the same scenario as before; secondly, we do not transform if there are fewer than 15 OR expressions (another number can be put here, but during testing, already starting with 3 expressions, execution time and planning time with the transformed OR were faster); thirdly, note that we consider "or" expressions only at the current level. The transformation follows this scheme: first we define the groups by the left-hand side and collect the constants in a list for each group; then, for each group, we coerce the collected constants to one type, find a common type for the array, and form a ScalarArrayOpExpr using the make_scalar_array_op function, if possible. If it is not possible, the constants remain in the same expr as before the function call. All successful attempts (the resulting ScalarArrayOpExpr nodes) are combined with any untransformed Exprs via an OR operation.
--- src/backend/parser/parse_expr.c | 289 ++- src/tools/pgindent/typedefs.list | 1 + 2 files changed, 289 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 346fd272b6d..c5f58aee9ec 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -95,6 +95,293 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname, static Node *make_nulltest_from_distinct(ParseState *pstate, A_Expr *distincta, Node *arg); +typedef struct OrClauseGroupEntry +{ + Node *node; + List *consts; + Oid scalar_type; + Oid opno; + Expr *expr; +} OrClauseGroupEntry; + +int const_transform_or_limit = 15; + +static Node * +transformBoolExprOr(ParseState *pstate, Expr *expr_orig) +{ + List *or_list = NIL; + ListCell *lc_eargs, + *lc_args; + List *groups_list = NIL; + bool change_apply = false; + const char *opname; + Node *result; + bool or_statement = false; + BoolExpr *expr = (BoolExpr *) copyObject(expr_orig); + + Assert(IsA(expr, BoolExpr)); + + /* If this is not an OR expression, then do it the old way. */ + switch (expr->boolop) + { + case AND_EXPR: + opname = "AND"; + break; + case OR_EXPR: + opname = "OR"; + or_statement = true; + break; + case NOT_EXPR: + opname = "NOT"; + break; + default: + elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop); + opname = NULL; /* keep compiler quiet */ + break; + } + + if (!or_statement || list_length(expr->args) < const_transform_or_limit) + return transformBoolExpr(pstate, (BoolExpr *) expr_orig); + + /* + * NOTE: + * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains + * a list of sub-restrictinfo args, and rinfo->clause - which is the + * same expression, made from bare clauses. To not break selectivity + * caches and other optimizations, use both: + * - use rinfos
Re: eqjoinsel_semi still sucks ...
get to impact the join size estimate at all, since set_joinrel_size_estimates is not going to factor the inner rel size into what it multiplies the join selectivity against. In short, I was mistakenly extrapolating from the observation that it helped to hack the ndistinct estimate for a semijoin's inner rel, to the conclusion that we should do that for all join input rels. "Yeah, those are estimating that all the outer rows have join partners, because there are more distinct values in the sub-select than there are in the outer relation." In addition, the answer to the question of why clamping is necessary is also in the comment to commit 97930cf5: the earlier patch had accounted for baserel restriction clauses that reduce the number of rows coming out of a table (and hence the number of possibly-distinct values of a join variable), but not for join restriction clauses that might have been applied at a lower level of join. "That patch accounted for baserel restriction clauses that reduced the number of rows coming out of a table (and hence the number of possibly-distinct values of a join variable), but not for join restriction clauses that might have been applied at a lower level of join. To account for the latter, look up the sizes of the min_lefthand and min_righthand inputs of the current join, and clamp with those in the same way as for the base relations." 1. https://www.postgresql.org/message-id/flat/5156.1314829311%40sss.pgh.pa.us#86609b6ac6af405dec67316bfbe9868f 2. https://www.postgresql.org/message-id/raw/5156.1314829311%40sss.pgh.pa.us Regards, Alena Rybakina
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
I'm sorry I was unable to respond right away. On 09.05.2023 17:23, torikoshia wrote: You may already understand it, but these variable names are given in imitation of the FREEZE and BINARY cases: --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -42,6 +42,7 @@ typedef struct CopyFormatOptions * -1 if not specified */ bool binary; /* binary format? */ bool freeze; /* freeze rows on loading? */ + bool ignore_datatype_errors; /* ignore rows with datatype errors */ --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate, bool format_specified = false; bool freeze_specified = false; bool header_specified = false; + bool ignore_datatype_errors_specified = false; I think it would be sane to align the names with the FREEZE and BINARY options. I agree that the name is too long, and we once used the name 'ignore_errors'. However, the current implementation does not ignore all errors but just data type errors, so I renamed it. There may be a better name, but I haven't come up with one. Yes, you are right, I saw it. As far as I can tell from a quick look at the PostgreSQL source code, there are few variable names with "_counter"; it seems to be used for function names. Something like "ignored_errors_count" might be better. I noticed that many variables are named with the "_counter" postfix, and most of them are used as a counter. For example, the PgStat_StatTabEntry or JitInstrumentation structures consist of many such variables. Despite this, I agree with your suggested name, because while I found many similar variables that are used as counters, it seems to me that most of them are local variables in functions.
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi! Thank you, Damir, for your patch. It is very interesting to review! It seemed to me that the variable names are not consistent everywhere. I noticed that you used the ignore_datatype_errors_specified variable in copy.c, but the option has the short name ignore_datatype_errors. Also, you used the short variable name in the CopyFormatOptions structure. The name ignore_datatype_errors_specified seems very long to me; maybe use the short version (ignore_datatype_errors) in copy.c too? Besides, I noticed that you used the ignored_errors variable in the CopyFromStateData structure, and its name is strikingly similar to ignore_datatype_errors, but they have different meanings. Maybe it would be better to rename it to ignored_errors_counter? I tested the last version, v5-0001-Add-new-COPY-option-IGNORE_DATATYPE_ERRORS.patch, with the bytea data type and transaction cases. Eventually, I didn't find any problems there. I describe my steps in more detail in case I missed something. First of all, I ran COPY with the IGNORE_DATATYPE_ERRORS parameter inside a transaction block. File t2.csv contains: id,b 769,\ 1,\6e 2,\x5 5,\x Test: CREATE TABLE t (id INT, b BYTEA); postgres=# BEGIN; copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER); SAVEPOINT my_savepoint; BEGIN WARNING: invalid input syntax for type bytea WARNING: invalid input syntax for type bytea WARNING: invalid hexadecimal data: odd number of digits WARNING: 3 rows were skipped due to data type incompatibility COPY 1 SAVEPOINT postgres=*# copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER); WARNING: invalid input syntax for type bytea WARNING: invalid input syntax for type bytea WARNING: invalid hexadecimal data: odd number of digits WARNING: 3 rows were skipped due to data type incompatibility COPY 1 postgres=*# ROLLBACK TO my_savepoint; ROLLBACK postgres=*# select * from t; id
| b ----+---- 5 | \x (1 row) postgres=*# copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER); WARNING: invalid input syntax for type bytea WARNING: invalid input syntax for type bytea WARNING: invalid hexadecimal data: odd number of digits WARNING: 3 rows were skipped due to data type incompatibility COPY 1 postgres=*# select * from t; id | b ----+---- 5 | \x 5 | \x (2 rows) postgres=*# commit; COMMIT I tried a similar test with the transaction block moved into a function: CREATE FUNCTION public.log2() RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $function$ BEGIN; copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER); SAVEPOINT my_savepoint; END; $function$; postgres=# delete from t; postgres=# select 1 as t from log2(); WARNING: invalid input syntax for type bytea WARNING: invalid input syntax for type bytea WARNING: invalid hexadecimal data: odd number of digits WARNING: 3 rows were skipped due to data type incompatibility t --- 1 (1 row) Secondly, I checked COPY with the bytea data type. t1.csv contains: id,b 769,\x2d 1,\x6e 2,\x5c 5,\x And I ran it: postgres=# delete from t; DELETE 4 postgres=# copy t FROM '/home/alena/postgres/t2.csv' WITH (format 'csv', IGNORE_DATATYPE_ERRORS, delimiter ',', HEADER); WARNING: invalid input syntax for type bytea WARNING: invalid input syntax for type bytea WARNING: invalid hexadecimal data: odd number of digits WARNING: 3 rows were skipped due to data type incompatibility COPY 1 postgres=# select * from t; id | b ----+---- 5 | \x (1 row) -- Alena Rybakina Postgres Professional
Re: POC, WIP: OR-clause support for indexes
ch I see for now: 1. Calculating cost. Right now it's just a simple transformation of the costs computed for a BitmapOr path. I'd like to hope that's possible, so the index's estimation function could be left untouched and could keep believing that all clauses are implicitly ANDed. 2. I'd like to add such support to btree, but it seems that should be a separate patch. The btree search algorithm doesn't use any kind of stack of pages, and the algorithm for walking over the btree isn't clear to me for now. 3. I could have missed some places which still assume an implicitly-ANDed list of clauses, although the regression tests pass fine. I support such a cunning approach. But this specific case, which you demonstrated above, could be optimized independently at an earlier stage. If we convert (F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2) to F(A) IN (ConstStableExpr_1, ConstStableExpr_2), we can see a significant execution speedup. For example, using demo.sql to estimate the maximum positive effect, we see about a 40% execution and 100% planning speedup. To avoid unnecessary overhead induced by the optimization, such a transformation may be made at the planning stage (when we have cardinality estimates and have pruned partitions) but before the creation of relation scan paths. That way, we can avoid planning overhead and a non-optimal BitmapOr in the case of many ORs, possibly aggravated by many indexes on the relation. For example, such an operation can be executed in create_index_paths() before processing rel->indexlist. -- Alena Rybakina Postgres Professional
Re: RFC: Logging plan of the running query
Sorry, I wrote confusingly at that time. No, I suggested adding a comment explaining HandleLogQueryPlanInterrupt() only in explain.h, not removing it from explain.c. It seemed to me necessary to separate the function declarations for the 'explain' feature of an executed query from our interrupt function for logging the plan of a running query. But now I doubt it. I'm not sure I understand your comment correctly; do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c? Thank you for reminding me about this function; I looked at the ProcessLogMemoryContextInterrupt() declaration. I noticed that the comments in memutils.h are missing too; the description of this function is written only in mcxt.c. However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable. So I think you are right, and the comment explaining HandleLogQueryPlanInterrupt() in explain.h is redundant. I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments. Regards, -- Alena Rybakina Postgres Professional
Re: RFC: Logging plan of the running query
Ok, I get it. Since GetLockMethodLocalHash() was only used for assertions, it was only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for assertions, so I removed the "ifdef USE_ASSERT_CHECKING" around this function. I believe it does not lead to skipped errors. Agree. Since these processes are needed for nested queries, not only for AbortSubTransaction[1], I added comments on it. I also noticed it. However, I also discovered that the function declarations above are applied for the EXPLAIN command and are used to print details of the executed query. We have a similar task: to print the plan of an interrupted process upon a request for a specific pid. In short, I think this task is different, and to separate these parts I added this comment. I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments. Yes, I was worried about it. I understand now; thanks for explaining. AFAIU both of them are processed since SendProcSignal sets ps_signalFlags for each signal. ``` SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId) { volatile ProcSignalSlot *slot; ...(snip)... 278 if (slot->pss_pid == pid) 279 { 280 /* Atomically set the proper flag */ 281 slot->pss_signalFlags[reason] = true; 282 /* Send signal */ 283 return kill(pid, SIGUSR1); ``` Comments on ProcSignalReason also say 'We can cope with concurrent signals for different reasons'. ```C /* * Reasons for signaling a Postgres child process (a backend or an auxiliary * process, like checkpointer). We can cope with concurrent signals for different * reasons. However, if the same reason is signaled multiple times in quick * succession, the process is likely to observe only one notification of it. * This is okay for the present uses. ...
typedef enum { PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */ PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */ PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */ PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */ PROCSIG_BARRIER, /* global barrier interrupt */ PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */ PROCSIG_LOG_QUERY_PLAN, /* ask backend to log plan of the current query */ ... } ProcSignalReason; ``` [1] https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com