Re: [Doc] Glossary Term Definitions Edits
On 2023-10-14 06:16 +0200, Andrew Atkinson wrote:

> - When describing options for a command, changed to “option of” instead
> of “option to”

I think "option to" is not wrong (maybe less common). I've seen this in
other texts and took it as "the X option [that applies] to Y".

> - “system- or user-supplied”, removed the dash after system. Or I’d
> suggest system-supplied or user-supplied, to hyphenate both.

That's a suspended hyphen and is common usage.

> - Changed “volume of records has been written” to “volume of records
> were written”

"Has been" means that something happened just now. This is perfectly
fine when talking about checkpoints IMO.

> - Many examples of “an SQL”. I changed those to “a SQL...”. For example
> I changed “An SQL command which” to “A SQL command that”. I'm not an
> English major so maybe I'm missing something here.

Depends on how you pronounce SQL (ess-cue-el or sequel). "An SQL" is
more common in the docs whereas "a SQL" is more common in code comments.

-- 
Erik
Re: [Doc] Glossary Term Definitions Edits
On Sat, 14 Oct 2023, 5:20 pm Andrew Atkinson wrote:

> - Many examples of “an SQL”. I changed those to “a SQL...”. For example
> I changed “An SQL command which” to “A SQL command that”. I'm not an
> English major so maybe I'm missing something here.

It would depend on how you pronounce SQL. For those who say es-que-el,
"An" is the correct article. If you say sequel then it's "a". We've
standardised our docs to use "an SQL", so any changes we make would be
the opposite way.

David
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear hackers,

Here is a new patch. Previously I wrote:

> Based on the above idea, I made a new version of the patch in which
> some functionality was exported from pg_resetwal. In this approach,
> pg_upgrade itself removes WALs and then creates logical slots;
> pg_resetwal is then called with a new option --no-switch, which avoids
> switching to a new WAL segment file. The option is only used for
> upgrading purposes, so it is not documented or shown in usage(). This
> option is not required if pg_resetwal -o does not discard WAL records.
> Please see the fork thread [1].

But for now, these changes have been reverted, because changing the
pg_resetwal -o behavior may be a bit risky. It has been in place for
more than ten years, so we should be more careful about modifying it.
Also, I cannot come up with problems if slots are created after
pg_resetwal. Background processes would not generate decodable changes
(listed in [1]), and bgworkers started by extensions can be ignored [2].
If the approach is accepted based on the discussion on the forked
thread [3], we will apply it again.

Also, some comments and a function name were improved.

[1]: https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAA4eK1L4JB%2BKH_4EQryDEhyaLBPW6V20LqjdzOxCWyL7rbxqsA%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/flat/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description: v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
[Doc] Glossary Term Definitions Edits
Hello. I started reading through the Glossary[^1] terms to learn from
the definitions, and to double check them against what I'd written
elsewhere. I found myself making edits. :) I've put the edits together
into a patch. My goal was to focus on wording simplifications that are
smoother to read, without big changes. I realized I should check with
others though, so this is a mid-point check-in. For now I went through
terms from "A" through "I".

Here's a recap of the changes:

- Changed places like “to make” to use the verb directly, i.e. “make”

- When describing options for a command, changed to “option of” instead
  of “option to”

- “system- or user-supplied”, removed the dash after system. Or I’d
  suggest system-supplied or user-supplied, to hyphenate both.

- Changed “will access” to “access”

- Changed “helps to prevent” to “helps prevent”

- Changed “volume of records has been written” to “volume of records
  were written”

- “It is required that this user exist” changed to “This user is
  required to exist...” (I’d also suggest “This user must exist before”)
  as a simplification, but that’s a bigger difference from what’s there
  now.

- Changed “operating-system” to remove the hyphen, which is how it’s
  written elsewhere in the Glossary.

- Many examples of “an SQL”. I changed those to “a SQL...”. For example
  I changed “An SQL command which” to “A SQL command that”. I'm not an
  English major so maybe I'm missing something here.

- I often thought “that” was easier to read than “which”, and there are
  several examples in the patch. For example “Space in data pages
  that…” replaced “Space in data pages which…”

- Simplifications like: “There also exist two secondary forks” to
  “There are two secondary forks”

I was able to build the documentation locally and preview the HTML
version. If these types of changes are helpful, I can continue a
consistent style through all the terms and provide a new (larger) v2
patch.

Thanks for taking a look.
Andrew Atkinson

[^1]: https://www.postgresql.org/docs/current/glossary.html

glossary_terms_grammar_edits_v1.patch
Description: Binary data
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On Wed, Oct 11, 2023 at 4:35 PM Dilip Kumar wrote:
>
> The small size of the SLRU buffer pools can sometimes become a
> performance problem because it’s not difficult to have a workload
> where the number of buffers actively in use is larger than the
> fixed-size buffer pool. However, just increasing the size of the
> buffer pool doesn’t necessarily help, because the linear search that
> we use for buffer replacement doesn’t scale, and also because
> contention on the single centralized lock limits scalability.
>
> There are a couple of patches proposed in the past to address the
> problem of increasing the buffer pool size. One of them [1] was
> proposed by Thomas Munro, where we make the size of the buffer pool
> configurable. And, in order to deal with the linear search in the
> large buffer pool, we divide the SLRU buffer pool into associative
> banks so that searching in the buffer pool doesn’t get affected by the
> large size of the buffer pool. This does well for the workloads which
> are mainly impacted by frequent buffer replacement, but it still
> doesn’t stand well with the workloads where the centralized control
> lock is the bottleneck.
>
> So I have taken this patch as my base patch (v1-0001) and further
> added 2 more improvements to this: 1) In v1-0002, instead of a
> centralized control lock for the SLRU, I have introduced a bank-wise
> control lock; 2) in v1-0003, I have removed the global LRU counter and
> introduced a bank-wise counter. The second change (v1-0003) is in
> order to avoid the CPU/OS cache invalidation due to frequent updates
> of the single variable; later in my performance test I will show how
> much gain we have gotten because of these 2 changes.
> Note: This is going to be a long email but I have summarised the main
> idea above this point and now I am going to discuss more internal
> information in order to show that the design idea is valid, and also
> going to show 2 performance tests where one is specific to the
> contention on the centralized lock and the other is mainly contention
> due to frequent buffer replacement in the SLRU buffer pool. We are
> getting ~2x TPS compared to the head by these patches, and in later
> sections I am going to discuss this in more detail, i.e. exact
> performance numbers and analysis of why we are seeing the gain.
> ...
>
> Performance Test:
> Exp1: Show problems due to CPU/OS cache invalidation due to frequent
> updates of the centralized lock and a common LRU counter. So here we
> are running a parallel transaction to the pgbench script which
> frequently creates subtransaction overflow, and that forces the
> visibility-check mechanism to access the subtrans SLRU.
>
> Test machine: 8 CPU/ 64 core/ 128 with HT/ 512 MB RAM / SSD
> scale factor: 300
> shared_buffers=20GB
> checkpoint_timeout=40min
> max_wal_size=20GB
> max_connections=200
>
> Workload: Run these 2 scripts in parallel:
> ./pgbench -c $ -j $ -T 600 -P5 -M prepared postgres
> ./pgbench -c 1 -j 1 -T 600 -f savepoint.sql postgres
>
> savepoint.sql (creates subtransaction overflow)
> BEGIN;
> SAVEPOINT S1;
> INSERT INTO test VALUES(1)
> ← repeat 70 times →
> SELECT pg_sleep(1);
> COMMIT;
>
> Code under test:
> Head: PostgreSQL head code
> SlruBank: The first patch applied to convert the SLRU buffer pool into
> banks (0001)
> SlruBank+BankwiseLockAndLru: Applied 0001+0002+0003
>
> Results:
> Clients    Head     SlruBank    SlruBank+BankwiseLockAndLru
> 1          457      491         475
> 8          3753     3819        3782
> 32         14594    14328       17028
> 64         15600    16243       25944
> 128        15957    16272       31731
>
> So we can see that at 128 clients, we get ~2x TPS (with SlruBank +
> BankwiseLock and bank-wise LRU counter) as compared to HEAD.

This and other results shared by you look promising.
Will there be any improvement in workloads related to clog buffer usage?

BTW, I remember that there was also a discussion of moving SLRU into a
regular buffer pool [1]. You have not provided any explanation as to
whether that approach will have any merits after we do this or whether
that approach is not worth pursuing at all.

[1] - https://commitfest.postgresql.org/43/3514/

-- 
With Regards,
Amit Kapila.
Re: JSON Path and GIN Questions
On 2023-10-09 01:13 +0200, David E. Wheeler wrote:
> On Sep 12, 2023, at 21:00, Erik Wienhold wrote:
>
>>> I posted this question on Stack Overflow
>>> (https://stackoverflow.com/q/77046554/79202),
>>> and from the suggestion I got there, it seems that @@ expects a
>>> boolean to be returned by the path query, while @? wraps it in an
>>> implicit exists(). Is that right?
>>
>> That's also my understanding. We had a discussion about the docs on
>> @@, @?, and jsonb_path_query on -general a while back [1]. Maybe
>> it's useful also.
>
> Hi, finally getting back to this, still fiddling to figure out the
> differences. From the thread you reference [1], is the point that @@
> and jsonb_path_match() can only be properly used with a JSON Path
> expression that’s a predicate check?

I think so. That's also supported by the existing docs which only
mention "JSON path predicate" for @@ and jsonb_path_match().

> If so, as far as I can tell, only exists() around the entire path
> query, or the deviation from the SQL standard that allows an
> expression to be a predicate?

Looks like that. But note that exists() is also a filter expression.
So wrapping the entire jsonpath in exists() is also a deviation from
the SQL standard which only allows predicates in filter expressions,
i.e. '<path> ? (<predicate>)'.

> This suggests to me that the "Only the first item of the result is
> taken into account” bit from the docs may not be quite right.

Yes, this was also the issue in the referenced thread [1]. I think my
suggestion in [2] explains it (as far as I understand it).

> Consider this example:
>
> david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
>  jsonb_path_query
> ------------------
>  false
>  false
> (2 rows)
>
> david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
> ERROR:  single boolean result is expected
>
> jsonb_path_match(), it turns out, only wants a single result.
> But furthermore perhaps the use of a filter predicate rather than a
> predicate expression for the entire path query is an error?

Yes, I think @@ and jsonb_path_match() should not be used with filter
expressions because the jsonpath returns whatever the path expression
yields (which may be an actual boolean value in the jsonb). The filter
expression only filters (as the name suggests) what the path expression
yields.

> Curiously, @@ seems okay with it:
>
> david=# select '{"a":[false,true,false]}' @@ '$.a ?(@[*] == false)';
>  ?column?
> ----------
>  t
>
> Not a predicate query, and somehow returns true even though the first
> item of the result is false? Is that how it should be?

Your example does a text search equivalent to:

select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)')

You forgot the cast to jsonb. jsonb @@ jsonpath actually returns null:

test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
 ?column?
----------
 
(1 row)

This matches the note right after the docs for @@:

"The jsonpath operators @? and @@ suppress the following errors: missing
object field or array element, unexpected JSON item type, datetime and
numeric errors. The jsonpath-related functions described below can also
be told to suppress these types of errors. This behavior might be
helpful when searching JSON document collections of varying structure."

That would be the silent argument of jsonb_path_match():

test=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ? (@[*] == false)', silent => true);
 jsonb_path_match
------------------
 
(1 row)

[1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/880194083.579916.1680598906819%40office.mailbox.org

-- 
Erik
Re: Included xid in restoring reorder buffer changes from disk log message
On Tue, 10 Oct 2023 at 06:59, Kyotaro Horiguchi wrote:
>
> At Fri, 6 Oct 2023 14:58:13 +0530, vignesh C wrote in
> > On Fri, 30 Apr 2021 at 11:53, Dilip Kumar wrote:
> > > It makes sense to include xid in the debug message, earlier I
> > > thought that will it be a good idea to also print the
> > > toplevel_xid. But I think it is for debug purposes and only we
> > > have the xid we can fetch the other parameters so maybe adding
> > > xid is good enough.
> >
> > +1
> >
> > While having a look at the reorderbuffer code, I noticed that these
> > changes were still not committed. Here is a rebased version of the
> > patch.
>
> While this patch makes the following change on the de-serializing
> side;
>
> - elog(DEBUG2, "restored %u/%u changes from disk",
> + elog(DEBUG2, "restored %u/%u changes of XID %u from disk",
>
> the counterpart ReorderBufferSerializeTXN() has the following
> message.
>
> > elog(DEBUG2, "spill %u changes in XID %u to disk",
> >      (uint32) txn->nentries_mem, txn->xid);
>
> It might be preferable for those two messages to have a corresponding
> appearance.

We cannot include nentries in ReorderBufferSerializeTXN as the number of
entries will not be known at that time. Modified to keep it consistent
by changing it to "... changes in XID ...". Attached v3 patch has the
changes for the same.

Regards,
Vignesh

From bbd3ea2376b76a14d5e59e22e3f36c4637193cab Mon Sep 17 00:00:00 2001
From: Vignesh C
Date: Fri, 6 Oct 2023 14:21:32 +0530
Subject: [PATCH v3] Include xid in restoring reorder buffer changes from disk
 log message.

Include xid in restoring reorder buffer changes from disk log message.
---
 src/backend/replication/logical/reorderbuffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 12edc5772a..6ebda5cc5f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1442,9 +1442,10 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 			dlist_head_element(ReorderBufferChange, node,
 							   &entry->txn->changes);
 
-		elog(DEBUG2, "restored %u/%u changes from disk",
+		elog(DEBUG2, "restored %u/%u changes in XID %u from disk",
 			 (uint32) entry->txn->nentries_mem,
-			 (uint32) entry->txn->nentries);
+			 (uint32) entry->txn->nentries,
+			 entry->txn->xid);
 
 		Assert(entry->txn->nentries_mem);
 		/* txn stays the same */
-- 
2.34.1
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi,

On 2023-10-13 11:30:35 -0700, Andres Freund wrote:
> On 2023-10-13 10:39:10 -0700, Andres Freund wrote:
> > On 2023-10-12 09:24:19 -0700, Andres Freund wrote:
> > > I kind of had hoped somebody would comment on the approach. Given
> > > that nobody has, I'll push the minimal fix of resetting the state
> > > in ReleaseBulkInsertStatePin(), even though I think
> > > architecturally that's not great.
> >
> > I spent some time working on a test that shows the problem more
> > cheaply than the case upthread. I think it'd be desirable to have a
> > test that's likely to catch an issue like this fairly quickly. We've
> > had other problems in this realm before - there's only a single test
> > that fails if I remove the ReleaseBulkInsertStatePin() call, and I
> > don't think that's guaranteed at all.
> >
> > I'm a bit on the fence on how large to make the relation. For me the
> > bug triggers when filling both relations up to the 3rd block, but
> > how many rows that takes is somewhat dependent on space utilization
> > on the page and stuff.
> >
> > Right now the test uses data/desc.data and ends up with 328kB and
> > 312kB in two partitions. Alternatively I could make the test create
> > a new file to load with copy that has fewer rows than data/desc.data
> > - I didn't see another data file that works conveniently and has
> > fewer rows. The copy is reasonably fast, even under something as
> > expensive as rr (~60ms). So I'm inclined to just go with that?
>
> Patch with fix and test attached (0001).

Pushed that.

Greetings,

Andres
Re: PostgreSQL domains and NOT NULL constraint
On 10/13/23 06:37, Tom Lane wrote:
> Vik Fearing writes:
>> Regardless of what the spec may or may not say about v1.d, it still
>> remains that nulls should not be allowed in a *base table* if the
>> domain says nulls are not allowed. Not mentioned in this thread but
>> the constraints are also applied when CASTing to the domain.
>
> Hmph. The really basic problem here, I think, is that the spec wants
> to claim that a domain is a data type, but then it backs off and
> limits where the domain's constraints need to hold.

I don't think that is an accurate depiction of domains.

First of all, I am not seeing where it says that a domain is a data
type. It allows domains to be used in some places where a data type is
used, but that is not equivalent to a domain /being/ a data type.
Section 4.14 says, "A domain is a set of permissible values." and then
goes on to say that that is a combination of a predefined type and zero
or more search conditions. It can also have a default value, but it
does not seem relevant to talk about that in this discussion.

Section 4.25.4, "Domain constraints" has this to say (emphasis mine):

- A domain constraint is satisfied by SQL-data *if and only if*, for
  every *table* T that has a column named C based on that domain, the
  applicable <search condition> recorded in the appropriate domain
  constraint usage evaluates to True or Unknown.

- A domain constraint is satisfied by the result of a <cast
  specification> if and only if the specified template <search
  condition>, with each occurrence of the VALUE replaced by that
  result, evaluates to True or Unknown.

This tells me that the constraints should only be checked at those two
points.

> Secondly, why are you so concerned about outer join nulls here and
> not for any other column marked NOT NULL? That's fundamentally
> inconsistent. It's like claiming that 'foobarbaz' is a valid value of
> type numeric as long as it's only in flight within a query and you
> haven't tried to store it into a table.

It's like claiming that null is a valid value of type numeric as long
as it's only in flight within a query and you haven't tried to store it
into a table with that column marked NOT NULL.

> Practical problems with this include:
>
> * If a function declares its argument as being of a domain type, can
>   it expect that the passed value obeys the constraints?
>
> * If a function declares its result as being of a domain type, is it
>   required to return a result that obeys the constraints? (This has
>   particular force for RETURNS NULL ON NULL INPUT functions, for
>   which we just automatically return NULL given a NULL input without
>   any consideration of whether the result type nominally prohibits
>   that.)
>
> * If a plpgsql function has a variable that is declared to be of
>   domain type, do we enforce the domain's constraints when assigning?

Routines are not allowed to have domains in their parameters or result
types. I am all for PostgreSQL expanding the spec wherever we can, but
in the above cases we have to define things ourselves.

> * If a composite type has a column of a domain type, do we enforce
>   the domain's constraints when assigning or casting to that?

I don't see that a composite type is able to have a member of a domain.
As for what PostgreSQL should do in this case, my opinion is "yes".

> AFAICS, the spec's position leaves all of these as judgment calls, or
> else you might claim that none of the above cases are even allowed to
> be declared per spec. I don't find either of those satisfactory, so I
> reiterate my position that the committee hasn't thought this through.

My claim is indeed that these cases are not allowed per-spec and
therefore the spec doesn't *need* to think about them. We do.

As you know, I am more than happy to (try to) amend the spec where
needed, but Erki's complaint of a null value being allowed in a base
table is clearly a bug in our implementation regardless of what we do
with views.

> I agree it's not a good behavior, but I still say it's traceable to
> schizophrenia in the spec. If the result of a sub-select is nominally
> of a domain type, we should not have to recheck the domain
> constraints in order to assign it to a domain-typed target.

Well, yes, we should. Allowing a null to be stored in a column where
the user has specified NOT NULL, no matter how the user did that, is
unacceptable and I am frankly surprised that you are defending it.

> If it's not nominally of a domain type, please cite chapter and verse
> that says it isn't.

I don't see anything for or against this, I just see that the domain
constraints are only checked on storage or casting. And therefore, I
think with these definitions:

    CREATE DOMAIN dom AS INTEGER CHECK (VALUE >= 0);
    CREATE TABLE t (d dom);
    INSERT INTO t (d) VALUES (1);

this should be valid according to the spec:

    SELECT -d FROM t;

and this should error:

    SELECT CAST(-d AS dom) FROM t;

-- 
Vik Fearing
Re: Fix output of zero privileges in psql
On 2023-10-09 22:34 +0200, David G. Johnston wrote:
> On Mon, Oct 9, 2023 at 12:13 PM Tom Lane wrote:
> > Yeah. There is a lot of attraction in having \pset null affect these
> > displays just like all other ones. The fact that they act
> > differently now is a wart, not something we should replace with a
> > different special case behavior.
> >
> > Also, I'm fairly concerned about not changing the default behavior
> > here. The fact that this behavior has stood for a couple dozen years
> > without many complaints indicates that there's not all that big a
> > problem to be solved. I doubt that a new default behavior will be
> > well received, even if it's arguably better.
>
> My position is that the default behavior should be changed such that
> the distinction these reports need to make between empty privileges
> and default privileges is impossible for the user to remove. I suppose
> the best direct solution given that goal is for the query to simply do
> away with any reliance on NULL being an indicator. Output an i18n'd
> "no permissions present" line in the rare empty permissions situation
> and leave the empty string for built-in default.

My initial patch does exactly that.

> If the consensus is to not actually fix these views to make them
> environment invariant in their accuracy then so be it. Having them
> obey \pset null seems like a half-measure but it at least is an
> improvement over the status quo such that users are capable of
> altering their system to make the reports distinguish the two cases if
> they realize the need.

I agree.

-- 
Erik
Re: Fix output of zero privileges in psql
On 2023-10-09 10:29 +0200, Laurenz Albe wrote:
> On Sun, 2023-10-08 at 19:58 -0700, David G. Johnston wrote:
> > We probably should add the two terms to the glossary:
> >
> > empty privileges: all privileges explicitly revoked from the owner
> > and PUBLIC (where applicable), and none otherwise granted.
> >
> > built-in default privileges: owner having all privileges and no
> > privileges granted or removed via ALTER DEFAULT PRIVILEGES
>
> "Empty privileges" are too unimportant to warrant an index entry.
>
> I can see the value of an index entry
>
>     privilege
>         default
>
> Done in the attached v5 of the patch, even though this is not really
> the business of this patch.

+1

-- 
Erik
Re: Fix output of zero privileges in psql
On 2023-10-09 09:54 +0200, Laurenz Albe wrote:
> I tinkered a bit with your documentation. For example, the suggestion
> to "\pset null" seemed to be in an inappropriate place. Tell me what
> you think.

+1 You're right to put that sentence right after the explanation of
empty privileges.

-- 
Erik
Re: On login trigger: take three
On Fri, Oct 13, 2023 at 11:26 AM Alexander Korotkov wrote:
> On Fri, Oct 13, 2023 at 4:18 AM Robert Haas wrote:
> > On Thu, Oct 12, 2023 at 6:54 PM Alexander Korotkov wrote:
> > > On Thu, Oct 12, 2023 at 8:35 PM Robert Haas wrote:
> > > > Doesn't that mean that if you create the first login trigger in
> > > > a database and leave the transaction open, nobody can connect to
> > > > that database until the transaction ends?
> > >
> > > It doesn't mean that, because when trying to reset the flag v44
> > > does a conditional lock. So, if another transaction is holding the
> > > lock we will just skip resetting the flag. So, the flag will be
> > > cleared on the first connection after that transaction ends.
> >
> > But in the scenario I am describing the flag is being set, not
> > reset.
>
> Sorry, it seems I just missed some logical steps. Imagine that
> transaction A created the first login trigger and hangs open. Then
> the new connection B sees no visible triggers yet, but the
> dathasloginevt flag is set. Therefore, connection B tries the
> conditional lock but just gives up because the lock is held by
> transaction A.
>
> Also, note that the lock is just some lock with a custom tag. It
> doesn't effectively block the database. You may think about it as of
> a custom advisory lock.

I've revised the comments about the lock a bit. I've also run some
tests regarding the connection time (5 runs).

v45, event_triggers=on:  avg=3.081ms, min=2.992ms, max=3.146ms
v45, event_triggers=off: avg=3.132ms, min=3.048ms, max=3.186ms
master:                  avg=3.080ms, min=3.023ms, max=3.167ms

So, no measurable overhead (not surprising since no extra catalog
lookup). I think this patch is in a committable shape. Any objections?
-- 
Regards,
Alexander Korotkov

From 1d59d7fe6f510a4606600b9bf41b5dbaa6d2b283 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov
Date: Mon, 9 Oct 2023 15:00:26 +0300
Subject: [PATCH] Add support event triggers on authenticated login

This commit introduces trigger on login event, allowing to fire some
actions right on the user connection. This can be useful for logging or
connection check purposes as well as for some personalization of
environment. Usage details are described in the documentation included,
but shortly usage is the same as for other triggers: create function
returning event_trigger and then create event trigger on login event.

In order to prevent the connection time overhead when there are no
triggers the commit introduces pg_database.dathasloginevt flag, which
indicates database has active login triggers. This flag is set by
CREATE/ALTER EVENT TRIGGER command, and unset at connection time when no
active triggers found.

Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
---
 doc/src/sgml/bki.sgml                          |   2 +-
 doc/src/sgml/catalogs.sgml                     |  13 ++
 doc/src/sgml/ecpg.sgml                         |   2 +
 doc/src/sgml/event-trigger.sgml                |  94 +
 src/backend/commands/dbcommands.c              |  17 +-
 src/backend/commands/event_trigger.c           | 179 +-
 src/backend/storage/lmgr/lmgr.c                |  38
 src/backend/tcop/postgres.c                    |   4 +
 src/backend/utils/cache/evtcache.c             |   2 +
 src/backend/utils/init/globals.c               |   2 +
 src/backend/utils/init/postinit.c              |   1 +
 src/bin/pg_dump/pg_dump.c                      |   5 +
 src/bin/psql/tab-complete.c                    |   4 +-
 src/include/catalog/pg_database.dat            |   2 +-
 src/include/catalog/pg_database.h              |   3 +
 src/include/commands/event_trigger.h           |   1 +
 src/include/miscadmin.h                        |   2 +
 src/include/storage/lmgr.h                     |   2 +
 src/include/tcop/cmdtaglist.h                  |   1 +
 src/include/utils/evtcache.h                   |   3 +-
 .../authentication/t/005_login_trigger.pl      | 157 +++
 src/test/recovery/t/001_stream_rep.pl          |  23 +++
 src/test/regress/expected/event_trigger.out    |  45 +
 src/test/regress/sql/event_trigger.sql         |  26 +++
 24 files changed, 608 insertions(+), 20 deletions(-)
 create mode 100644 src/test/authentication/t/005_login_trigger.pl

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index f71644e3989..315ba819514 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -184,7 +184,7 @@
 descr => 'database\'s default template',
 datname => 'template1', encoding => 'ENCODING',
 datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
-datallowconn => 't', datconnlimit => '-1', datfrozenxid => '0',
+
Re: LLVM 16 (opaque pointers)
On Sat, Oct 14, 2023 at 3:56 AM Andres Freund wrote:
> On 2023-10-13 16:44:13 +0200, Dmitry Dolgov wrote:
> > Here is what I had in mind (only this part in the second patch was
> > changed).
>
> Makes sense to me. I think we'll likely eventually want to use a
> custom pipeline anyway, and I think we should consider using an
> optimization level in between "not at all" and "as hard as
> possible"...

Thanks Dmitry and Andres. I'm planning to commit these today if there
are no further comments.
Re: LLVM 16 (opaque pointers)
On Wed, Oct 11, 2023 at 10:31 PM Ronan Dunklau wrote:
> Le mercredi 11 octobre 2023, 10:59:50 CEST Thomas Munro a écrit :
> > The back-patch to 12 was a little trickier than anticipated, but
> > after taking a break and trying again I now have PG 12...17 patches
> > that I've tested against LLVM 10...18 (that's 54 combinations), in
> > every case only with the clang corresponding to LLVM.
>
> Thank you Thomas for those patches, and the extensive testing, I will
> run my own and let you know.

Thanks! No news is good news, I hope? I'm hoping to commit this today.

> > I've attached only the patches for master, but the 12-16 versions
> > are available at https://github.com/macdice/postgres/tree/llvm16-$N
> > in case anyone has comments on those.
>
> For PG13 and PG12, it looks like the ExecEvalBoolSubroutineTemplate is
> not used anywhere, as ExecEvalBoolSubroutine was introduced in PG14 if
> I'm not mistaken.

Right, looks like I can remove that in those branches.
Re: should frontend tools use syncfs() ?
On Mon, Oct 09, 2023 at 02:34:27PM -0500, Nathan Bossart wrote:
> On Mon, Oct 09, 2023 at 11:14:39AM -0500, Nathan Bossart wrote:
>> Thanks. I've made a couple of small changes, but otherwise I think
>> this one is just about ready.
>
> I forgot to rename one thing. Here's a v13 with that fixed.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Questions about the new subscription parameter: password_required
On Fri, 2023-10-13 at 11:18 +0200, Benoit Lobréau wrote:
> I tried adding a section in "Logical Replication > Subscription" with
> the text you suggested and links in the CREATE / ALTER SUBSCRIPTION
> commands.
>
> Is it better ?

Minor comments:

* Use possessive "its" instead of the contraction, i.e. "before
  transferring its ownership".

* I like that the docs cover the case where a password is specified,
  but the remote server doesn't require one. But the warning is the
  wrong place to explain that; it should be in the main behavioral
  description in 31.2.2.

* The warning feels like it has too many negatives and confused me at
  first. I struggled myself a bit to come up with something less
  confusing, but perhaps less is more: "Ensure that password_required
  is properly set before transferring ownership of a subscription to a
  non-superuser, otherwise the subscription may start to fail."

* Missing space in the warning after "password_required = true"

* Mention that a non-superuser-owned subscription with
  password_required = false is partially locked down, e.g. the owner
  can't change the connection string any more.

* 31.2.2 could probably be in the CREATE SUBSCRIPTION docs instead, and
  linked from the ALTER docs. That's fairly normal for other commands
  and I'm not sure there needs to be a separate section in logical
  replication. I don't have a strong opinion here.

I like the changes; this is a big improvement. I'll leave it to Robert
to commit it, so that he can ensure it matches how he expected the
feature to be used and sufficiently covers the behavioral aspects.

Regards,
Jeff Davis
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi, On 2023-10-13 10:39:10 -0700, Andres Freund wrote: > On 2023-10-12 09:24:19 -0700, Andres Freund wrote: > > I kind of had hoped somebody would comment on the approach. Given that > > nobody > > has, I'll push the minimal fix of resetting the state in > > ReleaseBulkInsertStatePin(), even though I think architecturally that's not > > great. > > I spent some time working on a test that shows the problem more cheaply than > the case upthread. I think it'd be desirable to have a test that's likely to > catch an issue like this fairly quickly. We've had other problems in this > realm before - there's only a single test that fails if I remove the > ReleaseBulkInsertStatePin() call, and I don't think that's guaranteed at all. > > I'm a bit on the fence on how large to make the relation. For me the bug > triggers when filling both relations up to the 3rd block, but how many rows > that takes is somewhat dependent on space utilization on the page and stuff. > > Right now the test uses data/desc.data and ends up with 328kB and 312kB in two > partitions. Alternatively I could make the test create a new file to load with > copy that has fewer rows than data/desc.data - I didn't see another data file > that works conveniently and has fewer rows. The copy is reasonably fast, even > under something as expensive as rr (~60ms). So I'm inclined to just go with > that? Patch with fix and test attached (0001). Given how easy a missing ReleaseBulkInsertStatePin() can cause corruption (not due to this bug, but the issue fixed in b1ecb9b3fcf), I think we should consider adding an assertion along the lines of 0002 to HEAD. Perhaps adding a new bufmgr.c function to avoid having to get the fields in the buffer tag we don't care about. Or perhaps we should just promote the check to an elog, we already call BufferGetBlockNumber(), using BufferGetTag() instead doesn't cost much more. 
Greetings,

Andres Freund

>From e202cf5c3cae0551ae8fabd2629b4f9c6a0734d5 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 13 Oct 2023 10:54:16 -0700
Subject: [PATCH v1 1/2] Fix bulk table extension when copying into multiple
 partitions

When COPYing into a partitioned table that does not permit the use of
table_multi_insert(), we could error out with

  ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes

because BulkInsertState->next_free was not reset between partitions. This
problem occurred only when not able to use table_multi_insert(), as a
dedicated BulkInsertState for each partition is used in that case.

The bug was introduced in 00d1e02be24, but it was hard to hit at that point,
as bulk relation extension is commonly not used when not using
table_multi_insert(). It became more likely after 82a4edabd27, which expanded
the use of bulk extension.

To fix the bug, reset the bulk relation extension state in BulkInsertState in
ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fcf) to tackle a very
similar issue. Obviously the name is not quite correct, but there might be
external callers, and bulk insert state needs to be reset in precisely the
situations in which ReleaseBulkInsertStatePin() already needed to be called.

Medium term, the better fix likely is to disallow reusing BulkInsertState
across relations.

Add a test that, without the fix, reproduces #18130 in most configurations.
The test also catches the problem fixed in b1ecb9b3fcf when run with small
shared_buffers.
Reported-by: Ivan Kolombet
Analyzed-by: Tom Lane
Analyzed-by: Andres Freund
Bug: #18130
Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
Backpatch: 16-
---
 src/backend/access/heap/heapam.c   | 11 +
 src/test/regress/expected/copy.out | 37 ++
 src/test/regress/sql/copy.sql      | 37 ++
 3 files changed, 85 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88a123d38a6..51eceaca6cf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1792,6 +1792,17 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
 	if (bistate->current_buf != InvalidBuffer)
 		ReleaseBuffer(bistate->current_buf);
 	bistate->current_buf = InvalidBuffer;
+
+	/*
+	 * Despite the name, we also release information about bulk
+	 * inserts. Otherwise we can end up erroring out due to looking for free
+	 * space in ->next_free of one partition, even though ->next_free was set
+	 * when extending another partition. It could obviously also be bad for
+	 * efficiency to look at existing blocks at offsets from another
+	 * partition, even if we don't error out.
+	 */
+	bistate->next_free = InvalidBlockNumber;
+	bistate->last_free = InvalidBlockNumber;
 }
 
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index a5912c13a8c..b48365ec981 100644
--- a/src/test/regress/expected/copy.out
+++
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi, On 2023-10-12 09:24:19 -0700, Andres Freund wrote: > On 2023-10-12 11:44:09 -0400, Tom Lane wrote: > > Andres Freund writes: > > >> On 2023-09-25 15:42:26 -0400, Tom Lane wrote: > > >>> I just did a git bisect run to discover when the failure documented > > >>> in bug #18130 [1] started. And the answer is commit 82a4edabd. > > > > > Uh, huh. The problem is that COPY uses a single BulkInsertState for > > > multiple > > > partitions. Which to me seems to run counter to the following comment: > > > *The caller can also provide a BulkInsertState object to > > > optimize many > > > *insertions into the same relation. This keeps a pin on the > > > current > > > *insertion target page (to save pin/unpin cycles) and also > > > passes a > > > *BULKWRITE buffer selection strategy object to the buffer > > > manager. > > > *Passing NULL for bistate selects the default behavior. > > > > > The reason this doesn't cause straight up corruption due to reusing a pin > > > from > > > another relation is that b1ecb9b3fcfb added ReleaseBulkInsertStatePin() > > > and a > > > call to it. But I didn't make ReleaseBulkInsertStatePin() reset the bulk > > > insertion state, which is what leads to the errors from the bug report. > > > > > Resetting the relevant BulkInsertState fields fixes the problem. But I'm > > > not > > > sure that's the right fix. ISTM that independent of whether we fix this > > > via > > > ReleaseBulkInsertStatePin() resetting the fields or via not reusing > > > BulkInsertState, we should add assertions defending against future issues > > > like > > > this (e.g. by adding a relation field to BulkInsertState in cassert > > > builds, > > > and asserting that the relation is the same as in prior calls unless > > > ReleaseBulkInsertStatePin() has been called). > > > > Ping? We really ought to have a fix for this committed in time for > > 16.1. > > I kind of had hoped somebody would comment on the approach. 
Given that nobody > has, I'll push the minimal fix of resetting the state in > ReleaseBulkInsertStatePin(), even though I think architecturally that's not > great. I spent some time working on a test that shows the problem more cheaply than the case upthread. I think it'd be desirable to have a test that's likely to catch an issue like this fairly quickly. We've had other problems in this realm before - there's only a single test that fails if I remove the ReleaseBulkInsertStatePin() call, and I don't think that's guaranteed at all. I'm a bit on the fence on how large to make the relation. For me the bug triggers when filling both relations up to the 3rd block, but how many rows that takes is somewhat dependent on space utilization on the page and stuff. Right now the test uses data/desc.data and ends up with 328kB and 312kB in two partitions. Alternatively I could make the test create a new file to load with copy that has fewer rows than data/desc.data - I didn't see another data file that works conveniently and has fewer rows. The copy is reasonably fast, even under something as expensive as rr (~60ms). So I'm inclined to just go with that? Greetings, Andres Freund
Re: pg_stat_statements and "IN" conditions
On Tue, Jul 04, 2023 at 09:02:56PM +0200, Dmitry Dolgov wrote: >> On Mon, Jul 03, 2023 at 09:46:11PM -0700, Nathan Bossart wrote: >> Also, it seems counterintuitive that queries with fewer than 10 >> constants are not merged. > > Why? What would be your intuition using this feature? For the "powers" setting, I would've expected queries with 0-9 constants to be merged. Then 10-99, 100-999, 1000-, etc. I suppose there might be an argument for separating 0 from 1-9, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
On Fri, 13 Oct 2023 at 13:17, Tomas Vondra wrote:
>
> I do plan to backpatch this, yes. I don't think there are many people
> affected by this (few people are using infinite dates/timestamps, but
> maybe the overflow could be more common).
>

OK, though I doubt that such values are common in practice.

There's also an overflow problem in
brin_minmax_multi_distance_interval(), though, and I think that's worse
because overflows there throw "interval out of range" errors, which can
prevent index creation or inserts.

There's a patch (0009 in [1]) as part of the infinite interval work, but
that just uses interval_mi(), and so doesn't fix the
interval-out-of-range errors except for infinite intervals, which it
treats as special cases; I don't think that special-casing is really
necessary.

I think this should be rewritten to compute delta from ia and ib without
going via an intermediate Interval value. I.e., instead of computing
"result", just do something like

    dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
    days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
    days += (int64) ib->day - (int64) ia->day;
    days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

    delta = (double) days + dayfraction / (double) USECS_PER_DAY;

So the first part is exact 64-bit integer arithmetic, with no chance of
overflow, and it'll handle "infinite" intervals just fine, when that
feature gets added.

Regards,
Dean

[1] https://www.postgresql.org/message-id/CAExHW5u1JE7dxK=wlzqhcszntoxqzjdiermhrepw6r8w6kc...@mail.gmail.com
Re: LLVM 16 (opaque pointers)
On 2023-10-13 16:44:13 +0200, Dmitry Dolgov wrote:
> Here is what I had in mind (only this part in the second patch was changed).

Makes sense to me. I think we'll likely eventually want to use a custom
pipeline anyway, and I think we should consider using an optimization
level in between "not at all" and "as hard as possible"...

Greetings,

Andres Freund
Re: LLVM 16 (opaque pointers)
Hi,

On 2023-10-13 11:06:21 +0200, Dmitry Dolgov wrote:
> > On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote:
> > I also don't think we should add the mem2reg pass outside of -O0 -
> > running it after a real optimization pipeline doesn't seem useful and
> > might even make the code worse? mem2reg is included in default (and
> > obviously also in O3).
>
> My understanding was that while mem2reg is included everywhere above
> -O0, this set of passes won't hurt. But yeah, if you say it could
> degrade the final result, it's better to not do this. I'll update this
> part.

It's indeed included everywhere above that, but adding it explicitly to
the schedule means it's executed twice:

echo 'int foo(int a) { return a / 343; }' | clang-16 -emit-llvm -x c -c -o - -S -|sed -e 's/optnone//'|opt-17 -debug-pass-manager -passes='default,mem2reg' -o /dev/null 2>&1|grep Promote
Running pass: PromotePass on foo (2 instructions)
Running pass: PromotePass on foo (2 instructions)

The second one is at a point in the pipeline where it doesn't help. It
also requires another analysis pass to be executed unnecessarily.

Greetings,

Andres Freund
Re: LLVM 16 (opaque pointers)
> On Fri, Oct 13, 2023 at 11:06:21AM +0200, Dmitry Dolgov wrote:
> > On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote:
> > I don't think the "function(no-op-function),no-op-module" bit does
> > something particularly useful?
>
> Right, looks like leftovers after verifying which passes were actually
> applied. My bad, could be removed.
>
> > I also don't think we should add the mem2reg pass outside of -O0 -
> > running it after a real optimization pipeline doesn't seem useful and
> > might even make the code worse? mem2reg is included in default (and
> > obviously also in O3).
>
> My understanding was that while mem2reg is included everywhere above
> -O0, this set of passes won't hurt. But yeah, if you say it could
> degrade the final result, it's better to not do this. I'll update this
> part.

Here is what I had in mind (only this part in the second patch was changed).

>From 040121be6d150e8f18f154a64b409baef2b15ffb Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 9 May 2022 08:27:47 +1200
Subject: [PATCH v4 1/2] jit: Support opaque pointers in LLVM 16.

Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].

* For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
* For LLVM == 15, we'll continue to do the same, explicitly opting out
  of opaque pointer mode.
* For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
  the extra type argument.

The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except
that at each site the correct type had to be provided.
In some places we needed to do some extra work to get function types,
including some new wrappers for C++ APIs that are not yet exposed in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.

Back-patch to 12, because it's a little trickier in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com>
Reviewed-by: Ronan Dunklau
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit.c        |  59 ++--
 src/backend/jit/llvm/llvmjit_deform.c | 119
 src/backend/jit/llvm/llvmjit_expr.c   | 401 --
 src/backend/jit/llvm/llvmjit_types.c  |  39 ++-
 src/backend/jit/llvm/llvmjit_wrap.cpp |  12 +
 src/backend/jit/llvm/meson.build      |   2 +-
 src/include/jit/llvmjit.h             |   7 +
 src/include/jit/llvmjit_emit.h        | 106 +--
 8 files changed, 481 insertions(+), 264 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 4dfaf797432..1c8e1ab3a76 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -64,8 +64,10 @@ LLVMTypeRef TypeStorageBool;
 LLVMTypeRef TypePGFunction;
 LLVMTypeRef StructNullableDatum;
 LLVMTypeRef StructHeapTupleData;
+LLVMTypeRef StructMinimalTupleData;
 LLVMTypeRef StructTupleDescData;
 LLVMTypeRef StructTupleTableSlot;
+LLVMTypeRef StructHeapTupleHeaderData;
 LLVMTypeRef StructHeapTupleTableSlot;
 LLVMTypeRef StructMinimalTupleTableSlot;
 LLVMTypeRef StructMemoryContextData;
@@ -76,8 +78,11 @@ LLVMTypeRef StructExprState;
 LLVMTypeRef StructAggState;
 LLVMTypeRef StructAggStatePerGroupData;
 LLVMTypeRef StructAggStatePerTransData;
+LLVMTypeRef StructPlanState;
 
 LLVMValueRef AttributeTemplate;
+LLVMValueRef ExecEvalSubroutineTemplate;
+LLVMValueRef ExecEvalBoolSubroutineTemplate;
 static LLVMModuleRef llvm_types_module = NULL;
 
@@ -451,11 +456,7 @@ llvm_pg_var_type(const char *varname)
 	if (!v_srcvar)
 		elog(ERROR, "variable %s not in llvmjit_types.c", varname);
 
-	/* look at the contained type */
-	typ = LLVMTypeOf(v_srcvar);
-	Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMPointerTypeKind);
-	typ = LLVMGetElementType(typ);
-	Assert(typ != NULL);
+	typ = LLVMGlobalGetValueType(v_srcvar);
 
 	return typ;
 }
@@ -467,12 +468,14 @@ llvm_pg_var_type(const char *varname)
 LLVMTypeRef
 llvm_pg_var_func_type(const char *varname)
 {
-	LLVMTypeRef typ = llvm_pg_var_type(varname);
+	LLVMValueRef v_srcvar;
+	LLVMTypeRef typ;
+
+	v_srcvar = LLVMGetNamedFunction(llvm_types_module, varname);
+	if (!v_srcvar)
+		elog(ERROR, "function %s not in llvmjit_types.c", varname);
 
-	/* look at the contained type */
-	Assert(LLVMGetTypeKind(typ) == LLVMPointerTypeKind);
-	typ = LLVMGetElementType(typ);
-	Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMFunctionTypeKind);
+	typ = LLVMGetFunctionType(v_srcvar);
 
 	return typ;
 }
@@ -502,7 +505,7 @@ llvm_pg_func(LLVMModuleRef mod, const char *funcname)
 	v_fn = LLVMAddFunction(mod,
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On 10/12/23 19:15, Michael Paquier wrote:
> On Thu, Oct 12, 2023 at 10:41:39AM -0400, David Steele wrote:
>> After some more thought, I think we could massage the "pg_control in
>> backup_label" method into something that could be back patched, with
>> more advanced features (e.g. error on backup_label and pg_control both
>> present on initial cluster start) saved for HEAD.
>
> I doubt that anything changed in this area would be in the
> backpatchable zone, particularly as it would involve protocol changes
> within the replication commands, so I'd recommend to focus on HEAD.

I can't see why there would be any protocol changes, but perhaps I am
missing something?

One thing that does have to change, however, is the ordering of
backup_label in the base tar file. Right now it is at the beginning, but
we need it to be at the end, like pg_control is now.

I'm working up a POC patch now and hope to have something today or
tomorrow. I think it makes sense to at least have a look at an
alternative solution before going forward.

Regards,
-David
Re: pg_stat_statements and "IN" conditions
> On Fri, Oct 13, 2023 at 05:07:00PM +0900, Michael Paquier wrote:
> Now, it doesn't mean that this approach with the "powers" will never
> happen, but based on the set of opinions I am gathering on this thread
> I would suggest to rework the patch as follows:
> - First implement an on/off switch that reduces the lists in IN and/or
> ANY to one parameter. Simply.
> - Second, refactor the powers routine.
> - Third, extend the on/off switch, or just implement a threshold with
> a second switch.

Well, if it will help move this patch forward, why not. To clarify, I'm
going to split the current implementation into three patches, one for
each point you've mentioned.

> When it comes to my opinion, I am not seeing any objections to the
> feature as a whole, and I'm OK with the first step. I'm also OK to
> keep the door open for more improvements in controlling how these
> IN/ANY lists show up, but there could be more than just the number of
> items as parameter (say the query size, different behaviors depending
> on the number of clauses in queries, subquery context or CTEs/WITH,
> etc. just to name a few things coming in mind).

Interesting point, but now it's my turn to have trouble imagining a case
where the list representation could be controlled by something other
than the number of elements in it. Do you have any specific example in
mind?
Re: PostgreSQL domains and NOT NULL constraint
Hello

Equating a domain with a type is really confusing. Why, for instance, is
the following possible without defining any additional operators?

CREATE DOMAIN d_name VARCHAR(50) NOT NULL;
CREATE DOMAIN d_description VARCHAR(1000) NOT NULL;
CREATE TABLE x(name d_name, description d_description);
SELECT * FROM x WHERE name=description;

Isn't it so that domains are not types, and that for this reason there
are separate CREATE DOMAIN and CREATE TYPE statements?!

In my opinion the Notes section of the CREATE DOMAIN documentation
should offer better examples. The two examples that I provided in my
demonstration seemed very far-fetched and artificial. Frankly, I have
difficulty imagining why someone would want to write statements like
that in a production environment, and how the proper enforcement of NOT
NULL constraints of domains could break things.

Let's say I have a column that I have declared mandatory by using a
domain, but somehow I have added NULLs to the column; if that is no
longer possible, then things break down. If I want to permit NULLs, then
ALTER DOMAIN d DROP NOT NULL; will fix it with one stroke. If I do not
want to permit NULLs but I have registered NULLs, then this is a data
quality issue that has to be addressed.

Currently there is a feature (NOT NULL of a domain) that the
documentation explicitly suggests not to use. Isn't it better in this
case to remove the feature completely?! If that would break something,
then it would mean that systems actually rely on this constraint.

Best regards
Erki Eessaar

From: Tom Lane
Sent: Friday, October 13, 2023 08:37
To: Vik Fearing
Cc: Erki Eessaar; pgsql-hackers@lists.postgresql.org
Subject: Re: PostgreSQL domains and NOT NULL constraint

Vik Fearing writes:
> Regardless of what the spec may or may not say about v1.d, it still
> remains that nulls should not be allowed in a *base table* if the
> domain says nulls are not allowed.
> Not mentioned in this thread, but the
> constraints are also applied when CASTing to the domain.

Hmph. The really basic problem here, I think, is that the spec wants to
claim that a domain is a data type, but then it backs off and limits
where the domain's constraints need to hold. That's fundamentally
inconsistent. It's like claiming that 'foobarbaz' is a valid value of
type numeric as long as it's only in flight within a query and you
haven't tried to store it into a table.

Practical problems with this include:

* If a function declares its argument as being of a domain type, can it
expect that the passed value obeys the constraints?

* If a function declares its result as being of a domain type, is it
required to return a result that obeys the constraints? (This has
particular force for RETURNS NULL ON NULL INPUT functions, for which we
just automatically return NULL given a NULL input without any
consideration of whether the result type nominally prohibits that.)

* If a plpgsql function has a variable that is declared to be of a
domain type, do we enforce the domain's constraints when assigning?

* If a composite type has a column of a domain type, do we enforce the
domain's constraints when assigning or casting to that?

AFAICS, the spec's position leaves all of these as judgment calls, or
else you might claim that none of the above cases are even allowed to be
declared per spec. I don't find either of those satisfactory, so I
reiterate my position that the committee hasn't thought this through.

> As you know, I am more than happy to (try to) amend the spec where
> needed, but Erki's complaint of a null value being allowed in a base
> table is clearly a bug in our implementation regardless of what we do
> with views.

I agree it's not a good behavior, but I still say it's traceable to
schizophrenia in the spec.
If the result of a sub-select is nominally of a domain type, we should not have to recheck the domain constraints in order to assign it to a domain-typed target. If it's not nominally of a domain type, please cite chapter and verse that says it isn't. regards, tom lane
Re: [RFC] Add jit deform_counter
> On 12 Oct 2023, at 15:40, Daniel Gustafsson wrote: >> On 12 Oct 2023, at 15:37, Nazir Bilal Yavuz wrote: >> I realized that pg_stat_statements is bumped to 1.11 with this patch >> but oldextversions test is not updated. So, I attached a patch for >> updating oldextversions. > > Thanks for the patch, that was an oversight in the original commit for this. > From a quick look it seems correct, I'll have another look later today and > will > then apply it. Applied, thanks! -- Daniel Gustafsson
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
On 10/13/23 14:04, Dean Rasheed wrote: > On Fri, 13 Oct 2023 at 11:44, Tomas Vondra > wrote: >> >> On 10/13/23 11:21, Dean Rasheed wrote: >>> >>> Is this only inefficient? Or can it also lead to wrong query results? >> >> I don't think it can produce incorrect results. It only affects which >> values we "merge" into an interval when building the summaries. >> > > Ah, I get it now. These "distance" support functions are only used to > see how far apart 2 ranges are, for the purposes of the algorithm that > merges the 2 closest ranges. So if it gets it wrong, it only leads to > a poor choice of ranges to merge, making the query inefficient, but > still correct. > Right. > Presumably, that also makes this kind of change safe to back-patch > (not sure if you were planning to do that?), since it will only affect > range merging choices when inserting new values into existing indexes. > I do plan to backpatch this, yes. I don't think there are many people affected by this (few people are using infinite dates/timestamps, but maybe the overflow could be more common). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
On Fri, 13 Oct 2023 at 11:44, Tomas Vondra wrote: > > On 10/13/23 11:21, Dean Rasheed wrote: > > > > Is this only inefficient? Or can it also lead to wrong query results? > > I don't think it can produce incorrect results. It only affects which > values we "merge" into an interval when building the summaries. > Ah, I get it now. These "distance" support functions are only used to see how far apart 2 ranges are, for the purposes of the algorithm that merges the 2 closest ranges. So if it gets it wrong, it only leads to a poor choice of ranges to merge, making the query inefficient, but still correct. Presumably, that also makes this kind of change safe to back-patch (not sure if you were planning to do that?), since it will only affect range merging choices when inserting new values into existing indexes. Regards, Dean
RE: pg_upgrade's interaction with pg_resetwal seems confusing
Dear hackers,

> > > > I mean instead of resetwal directly modifying the control file we
> > > > will modify that value in the server using the binary_upgrade
> > > > function and then have that value flush to the disk by shutdown
> > > > checkpoint.
> > >
> > > True, the alternative to consider is to let pg_upgrade update the
> > > control file by itself with the required value of OID. The point I
> > > am slightly worried about doing via server-side function is that
> > > some online and or shutdown checkpoint can update other values in
> > > the control file as well whereas if we do via pg_upgrade, we can
> > > have better control over just updating the OID.
> >
> > Yeah, thats a valid point.
>
> But OTOH, just updating the control file via pg_upgrade may not be
> sufficient because we should keep the shutdown checkpoint record also
> updated with that value. See how pg_resetwal achieves it via
> RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
> approaches it seems better to go with a server-side function as Robert
> suggested.

Based on this discussion, I implemented a server-side approach. The
attached patch adds an upgrade function which sets
ShmemVariableCache->nextOid; it is called at the end of the upgrade. The
comments and the name of issue_warnings_and_set_wal_level() are also
updated because they had become outdated.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
Description: 0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
On 10/13/23 11:21, Dean Rasheed wrote: > On Thu, 12 Oct 2023 at 23:43, Tomas Vondra > wrote: >> >> Ashutosh Bapat reported me off-list a possible issue in how BRIN >> minmax-multi calculate distance for infinite timestamp/date values. >> >> The current code does this: >> >> if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) >> PG_RETURN_FLOAT8(0); >> > > Yes indeed, that looks wrong. I noticed the same thing while reviewing > the infinite interval patch. > >> so means infinite values are "very close" to any other value, and thus >> likely to be merged into a summary range. That's exactly the opposite of >> what we want to do, possibly resulting in inefficient indexes. >> > > Is this only inefficient? Or can it also lead to wrong query results? > I don't think it can produce incorrect results. It only affects which values we "merge" into an interval when building the summaries. >> Attached is a patch fixing this >> > > I wonder if it's actually necessary to give infinity any special > handling at all for dates and timestamps. For those types, "infinity" > is actually just INT_MIN/MAX, which compares correctly with any finite > value, and will be much larger/smaller than any common value, so it > seems like it isn't necessary to give "infinite" values any special > treatment. That would be consistent with date_cmp() and > timestamp_cmp(). > Right, but > Something else that looks wrong about that BRIN code is that the > integer subtraction might lead to overflow -- it is subtracting two > integer values, then casting the result to float8. It should cast each > input before subtracting, more like brin_minmax_multi_distance_int8(). > ... it also needs to fix this, otherwise it overflows. Consider delta = dt2 - dt1; and assume dt1 is INT64_MIN, or that dt2 is INT64_MAX. > IOW, I think brin_minmax_multi_distance_date/timestamp could be made > basically identical to brin_minmax_multi_distance_int4/8. > Right. Attached is a patch doing it this way. 
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc6..3d5e6d0c8f6 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2084,10 +2084,13 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
-	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dateVal1 <= dateVal2);
 
-	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
+	PG_RETURN_FLOAT8((double) dateVal2 - (double) dateVal1);
 }
 
 /*
@@ -2136,19 +2139,16 @@ brin_minmax_multi_distance_timetz(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 {
-	float8		delta = 0;
-
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
-	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
-
-	delta = dt2 - dt1;
-
-	Assert(delta >= 0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dt1 <= dt2);
 
-	PG_RETURN_FLOAT8(delta);
+	PG_RETURN_FLOAT8((double) dt2 - (double) dt1);
 }
 
 /*
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 2:03 PM Dilip Kumar wrote: > > On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila wrote: > > > > > But is this a problem? basically, we will set the > > > ShmemVariableCache->nextOid counter to the value that we want in the > > > IsBinaryUpgrade-specific function. And then the shutdown checkpoint > > > will flush that value to the control file and that is what we want no? > > > > > > > I think that can work. Basically, we need to do something like what > > SetNextObjectId() does and then let the shutdown checkpoint update the > > actual value in the control file. > > Right. > > > > I mean instead of resetwal directly modifying the control file we > > > will modify that value in the server using the binary_upgrade function > > > and then have that value flush to the disk by shutdown checkpoint. > > > > > > > True, the alternative to consider is to let pg_upgrade update the > > control file by itself with the required value of OID. The point I am > > slightly worried about doing via server-side function is that some > > online and or shutdown checkpoint can update other values in the > > control file as well whereas if we do via pg_upgrade, we can have > > better control over just updating the OID. > > Yeah, thats a valid point. > But OTOH, just updating the control file via pg_upgrade may not be sufficient because we should keep the shutdown checkpoint record also updated with that value. See how pg_resetwal achieves it via RewriteControlFile() and WriteEmptyXLOG(). So, considering both the approaches it seems better to go with a server-side function as Robert suggested. -- With Regards, Amit Kapila.
Re: Oversight in reparameterize_path_by_child leading to executor crash
On 23/8/2023 12:37, Richard Guo wrote:
> To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
> ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath. It
> seems that that is not easily done without postponing reparameterization
> of paths until createplan.c. Attached is a patch which is v5 + fix for
> this new issue.

Having looked into the patch for a while, I couldn't answer some (possibly stupid) questions for myself:
1. If we postpone parameterization until plan creation, why do we still copy the path node in the FLAT_COPY_PATH macros? Do we really need it?
2. I see big switches on path nodes. Might it be time to create a path_walker function? I recall some thread where such a suggestion was declined, but I don't remember why.
3. Clause changing is postponed. Why does it not influence calc_joinrel_size_estimate? We will use statistics on the parent table there. Am I wrong?

-- 
regards,
Andrey Lepikhov
Postgres Professional
Re: pg_logical_emit_message() misses a XLogFlush()
On Mon, Sep 11, 2023 at 5:13 PM Michael Paquier wrote:
>
> I'll need a bit more input from Fujii-san before doing anything about
> his comments, still it looks like a doc issue to me that may need a
> backpatch to clarify how the non-transactional case behaves.
>

I would prefer to associate the new parameter 'flush' with non-transactional messages as per the proposed patch. A few points for you to consider:

1.
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message text,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_text';
+
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message bytea,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT

Is there a reason to make the functions strict now when they were not earlier?

2.
+The flush parameter (default set to
+false) controls if the message is immediately
+flushed to WAL or not. flush has no effect
+with transactional, as the message's WAL
+record is flushed when its transaction is committed.

The last part of the message sounds a bit too specific ("... as the message's WAL record is flushed when its transaction is committed.") because sometimes the WAL could be flushed by the walwriter even before the commit. Can we say something along the lines of: "... as the message's WAL record is flushed along with its transaction."?

3.
+ /*
+  * Make sure that the message hits disk before leaving if not emitting a
+  * transactional message, if flush is requested.
+  */
+ if (!transactional && flush)

The two ifs in the above comment sound a bit odd, but if we want to keep it like that then adding 'and' before the second if may slightly improve it.

-- 
With Regards,
Amit Kapila.
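For what it's worth, the flush rule being reviewed above reduces to a single predicate. The sketch below is only my model of the proposed behaviour (the function name is invented), not the patch code:

```python
def needs_immediate_flush(transactional: bool, flush: bool) -> bool:
    # Only a non-transactional message with flush requested is forced to
    # disk before returning; a transactional message's WAL record is
    # flushed along with its transaction (or earlier, by the walwriter).
    return not transactional and flush
```

This also makes the wording point visible: the two conditions in the quoted comment are really one "non-transactional and flush requested" test.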
Re: Removing unneeded self joins
On 13.10.2023 12:03, Andrei Lepikhov wrote:
> On 13/10/2023 15:56, a.rybakina wrote:
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals are found. It's not yet clear to me
> > whether this check fixes some cases. But at least the optimization got
> > skipped in some useful cases (as you can see in the regression tests).
> >
> > Agree. I wouldn't say I like it either. But also, I suggest skipping some
> > unnecessary assertions proposed in that patch:
> >
> > Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
> > negative numbers, at least?
> >
> > Assert(is_opclause(orinfo->clause)); - above we skip clauses with
> > rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
> > checked as is_opclause.
> >
> > All these changes (see in the attachment) are optional.
> >
> > I don't mind about the asserts; maybe I misunderstood something in the patch.
> >
> > About skipping SJ removal when no SJ quals are found, I assume it is about
> > this:
> >
> > split_selfjoin_quals(root, restrictlist, , , inner->relid, outer->relid);
> > + if (list_length(selfjoinquals) == 0)
> > + {
> > +     /*
> > +      * XXX:
> > +      * we would detect self-join without quals like 'x==x' if we had
> > +      * a foreign key constraint on some of the other quals and this join
> > +      * didn't have any columns from the outer in the target list.
> > +      * But it is still a complex task.
> > +      */
> > +     continue;
> > + }
> >
> > as far as I remember, this is the place where it is checked that the SJ
> > list is empty, and it is logical, in my opinion, that no transformations
> > should be performed if no elements are found for them.
>
> You forget we have the "Degenerate" case, as Alexander mentioned above.
> What if you have something like this:
>
> SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
>
> In this case, uniqueness can be achieved by the baserestrictinfo "A.id=1",
> if we have a unique index on this column.

Yes, sorry, I missed it. Thanks again for the explanation.
Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
On Thu, 12 Oct 2023 at 23:43, Tomas Vondra wrote:
>
> Ashutosh Bapat reported to me off-list a possible issue in how BRIN
> minmax-multi calculates the distance for infinite timestamp/date values.
>
> The current code does this:
>
> if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
>     PG_RETURN_FLOAT8(0);
>

Yes indeed, that looks wrong. I noticed the same thing while reviewing the infinite interval patch.

> so it means infinite values are "very close" to any other value, and thus
> likely to be merged into a summary range. That's exactly the opposite of
> what we want to do, possibly resulting in inefficient indexes.
>

Is this only inefficient? Or can it also lead to wrong query results?

> Attached is a patch fixing this
>

I wonder if it's actually necessary to give infinity any special handling at all for dates and timestamps. For those types, "infinity" is actually just INT_MIN/MAX, which compares correctly with any finite value, and will be much larger/smaller than any common value, so it seems like it isn't necessary to give "infinite" values any special treatment. That would be consistent with date_cmp() and timestamp_cmp().

Something else that looks wrong about that BRIN code is that the integer subtraction might lead to overflow -- it is subtracting two integer values, then casting the result to float8. It should cast each input before subtracting, more like brin_minmax_multi_distance_int8().

IOW, I think brin_minmax_multi_distance_date/timestamp could be made basically identical to brin_minmax_multi_distance_int4/8.

Regards,
Dean
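The overflow hazard described above can be reproduced outside the backend. The sketch below (function names are mine, not PostgreSQL's) models C's wrapping int64 subtraction against the cast-each-operand-first form; it is a toy model in Python, not the backend code:

```python
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

def int64_sub(a, b):
    # Model C int64 subtraction with two's-complement wraparound.
    r = (a - b) & ((1 << 64) - 1)
    return r - (1 << 64) if r >= (1 << 63) else r

def distance_cast_after(dt1, dt2):
    # models (float8) (dt2 - dt1): subtract first, cast after -- can wrap
    return float(int64_sub(dt2, dt1))

def distance_cast_first(dt1, dt2):
    # models (double) dt2 - (double) dt1: cast each operand, then subtract
    return float(dt2) - float(dt1)
```

For ordinary values the two agree, but for extreme boundaries (e.g. "infinite" timestamps stored as INT64_MIN/MAX) the subtract-first form wraps around to a nonsensical negative distance.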
Re: Questions about the new subscription parameter: password_required
On 9/23/23 03:57, Jeff Davis wrote:
> IIUC there is really one use case here, which is for superuser to define
> a subscription including the connection, and then change the owner to a
> non-superuser to actually run it (without being able to touch the
> connection string itself). I'd just document that in its own section, and
> mention a few caveats / mistakes to avoid. For instance, when the
> superuser is defining the connection, don't forget to set
> password_required=false, so that when you reassign to a non-superuser
> then the connection doesn't break.

Hi,

I tried adding a section in "Logical Replication > Subscription" with the text you suggested, and links in the CREATE / ALTER SUBSCRIPTION commands. Is it better?

-- 
Benoit Lobréau
Consultant
http://dalibo.com

From f3f1b0ce8617971b173ea901c9735d8357955aa2 Mon Sep 17 00:00:00 2001
From: benoit
Date: Thu, 12 Oct 2023 16:45:11 +0200
Subject: [PATCH] Doc patch for password_required

Add documentation regarding non-superuser subscriptions with
password_required=true.
---
 doc/src/sgml/logical-replication.sgml     | 32 +++
 doc/src/sgml/ref/alter_subscription.sgml  |  3 ++-
 doc/src/sgml/ref/create_subscription.sgml |  3 ++-
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3b2fa1129e..c3faaf88cd 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -329,6 +329,38 @@
+
+ Password required
+
+ password_required is a subscription parameter which specifies whether
+ connections to the publisher made as a result of this subscription must
+ use password authentication. It is set to true by default, and is
+ ignored when the subscription is owned by a superuser.
+
+ If you want to have a subscription managed by a non-superuser with a
+ connection string without a password, you have to set
+ password_required = false before transferring its ownership.
+ In that case, only superusers can modify the subscription.
+
+test_pub=# CREATE SUBSCRIPTION test_sub CONNECTION 'host=somehost port=5432 user=repli dbname=tests_pub' PUBLICATION test_pub WITH (password_required=false);
+CREATE SUBSCRIPTION
+test_pub=# ALTER SUBSCRIPTION test_sub OWNER TO new_sub_owner;
+ALTER SUBSCRIPTION
+
+ If the connection string doesn't contain a password, or the publication
+ side doesn't require a password during authentication, and you have set
+ password_required = true before transferring ownership,
+ the subscription will start failing.
+

 Examples: Set Up Logical Replication

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index a85e04e4d6..e061c96937 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -50,7 +50,8 @@ ALTER SUBSCRIPTION name RENAME TO <
    CREATE permission on the database. In addition, to alter the owner,
    you must be able to SET ROLE to the new owning role. If the subscription has
-   password_required=false, only superusers can modify it.
+   password_required=false, only superusers can modify it
+   (See ).

diff --git a/src/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index c1bafbfa06..33ad3d12c7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -361,7 +361,8 @@ CREATE SUBSCRIPTION subscription_name
    true. Only superusers can set
-   this value to false.
+   this value to false
+   (See ).
-- 
2.41.0
Re: LLVM 16 (opaque pointers)
> On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote:
> Hi,
>
> On 2023-10-11 21:59:50 +1300, Thomas Munro wrote:
> > +#else
> > +	LLVMPassBuilderOptionsRef options;
> > +	LLVMErrorRef err;
> > +	int			compile_optlevel;
> > +	char	   *passes;
> > +
> > +	if (context->base.flags & PGJIT_OPT3)
> > +		compile_optlevel = 3;
> > +	else
> > +		compile_optlevel = 0;
> > +
> > +	passes = psprintf("default<O%d>,mem2reg,function(no-op-function),no-op-module",
> > +					  compile_optlevel);
>
> I don't think the "function(no-op-function),no-op-module" bit does something
> particularly useful?

Right, looks like leftovers from verifying which passes were actually applied. My bad, they can be removed.

> I also don't think we should add the mem2reg pass outside of -O0 - running it
> after a real optimization pipeline doesn't seem useful and might even make the
> code worse?

mem2reg is included in default (and obviously also in O3). My understanding was that while mem2reg is included everywhere above -O0, this set of passes wouldn't hurt. But yeah, if you say it could degrade the final result, it's better not to do this. I'll update this part.
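To make the agreed direction concrete, here is a small sketch of how the pipeline string could be assembled once the no-op debugging passes are dropped and mem2reg is restricted to the unoptimized case. This is only my reading of the exchange above (written in Python for brevity), not the patch itself:

```python
def build_passes(opt3: bool) -> str:
    # Mirrors: passes = psprintf("default<O%d>...", compile_optlevel)
    compile_optlevel = 3 if opt3 else 0
    passes = "default<O%d>" % compile_optlevel
    # The optimized default pipeline already runs mem2reg as part of its
    # own sequence, so only add it explicitly for the -O0 case.
    if not opt3:
        passes += ",mem2reg"
    return passes
```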
Re: Removing unneeded self joins
On 13/10/2023 15:56, a.rybakina wrote:
> > > Also I've incorporated improvements from Alena Rybakina except one for
> > > skipping SJ removal when no SJ quals are found. It's not yet clear to me
> > > whether this check fixes some cases. But at least the optimization got
> > > skipped in some useful cases (as you can see in the regression tests).
> >
> > Agree. I wouldn't say I like it either. But also, I suggest skipping some
> > unnecessary assertions proposed in that patch:
> >
> > Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
> > negative numbers, at least?
> >
> > Assert(is_opclause(orinfo->clause)); - above we skip clauses with
> > rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
> > checked as is_opclause.
> >
> > All these changes (see in the attachment) are optional.
>
> I don't mind about the asserts; maybe I misunderstood something in the patch.
>
> About skipping SJ removal when no SJ quals are found, I assume it is about
> this:
>
> split_selfjoin_quals(root, restrictlist, , , inner->relid, outer->relid);
> + if (list_length(selfjoinquals) == 0)
> + {
> +     /*
> +      * XXX:
> +      * we would detect self-join without quals like 'x==x' if we had
> +      * a foreign key constraint on some of the other quals and this join
> +      * didn't have any columns from the outer in the target list.
> +      * But it is still a complex task.
> +      */
> +     continue;
> + }
>
> as far as I remember, this is the place where it is checked that the SJ
> list is empty, and it is logical, in my opinion, that no transformations
> should be performed if no elements are found for them.

You forget we have the "Degenerate" case, as Alexander mentioned above. What if you have something like this:

SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;

In this case, uniqueness can be achieved by the baserestrictinfo "A.id=1", if we have a unique index on this column.

-- 
regards,
Andrey Lepikhov
Postgres Professional
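The "degenerate" case above can be put into a toy decision function. All names and data shapes here are invented for illustration; the point is only that an empty self-join qual list is not, by itself, sufficient grounds to skip removal:

```python
def can_remove_self_join(join_equalities, outer_consts, inner_consts,
                         unique_cols):
    # Removable if a join qual equates a provably unique column of both
    # sides (the x = x case) ...
    if any(col in unique_cols for col in join_equalities):
        return True
    # ... or -- the degenerate case -- both sides are pinned to the same
    # constant on a unique column, as in:
    #   SELECT ... FROM A a1, A a2 WHERE a1.id = 1 AND a2.id = 1;
    for col in unique_cols:
        if col in outer_consts and outer_consts[col] == inner_consts.get(col):
            return True
    return False
```

A check that bails out as soon as `join_equalities` is empty would miss the second branch entirely.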
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Hi, > Those all make sense to me. > > > [...] > > Of course. Your general approach seems wise. > > Thanks for working on this. I will be relieved once this is finally > taken care of. +1, and many thanks for your attention to the patchset and all the details! -- Best regards, Aleksander Alekseev
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
> On 13 Oct 2023, at 04:25, Nathan Bossart wrote:
>
> On Tue, Sep 26, 2023 at 08:13:45AM +0200, Daniel Gustafsson wrote:
> > > On 26 Sep 2023, at 00:20, Nathan Bossart wrote:
> > >
> > > On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote:
> > > > -basic_archive_configured(ArchiveModuleState *state)
> > > > +basic_archive_configured(ArchiveModuleState *state, char **logdetail)
> > >
> > > Could we do something more like GUC_check_errdetail() instead to maintain
> > > backward compatibility with v16?
> >
> > We'd still need something exported to call into which isn't in 16, so it
> > wouldn't be more than optically backwards compatible since a module written
> > for 17 won't compile for 16, or am I missing something?
>
> I only mean that a module written for v16 could continue to be used in v17
> without any changes. You are right that a module that uses this new
> functionality wouldn't compile for v16.

Sure, but that also means that few if any existing modules will be updated to provide this =).

> But IMHO the interface is nicer,

That's a more compelling reason IMO. I'm not sure whether I prefer the GUC_check_errdetail-like approach, but I would for sure not be opposed to reviewing a version of the patch doing it that way. Tung Nguyen: are you interested in updating the patch along the lines suggested by Nathan?

> since module authors wouldn't need to worry about allocating the space
> for the string or formatting the message.

Well, they still need to format it; and calling _errdetail(msg), pstrdup(msg) or psprintf(msg) isn't a world of difference.

-- 
Daniel Gustafsson
Re: Removing unneeded self joins
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals are found. It's not yet clear to me
> > whether this check fixes some cases. But at least the optimization got
> > skipped in some useful cases (as you can see in the regression tests).
>
> Agree. I wouldn't say I like it either. But also, I suggest skipping some
> unnecessary assertions proposed in that patch:
>
> Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
> negative numbers, at least?
>
> Assert(is_opclause(orinfo->clause)); - above we skip clauses with
> rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
> checked as is_opclause.
>
> All these changes (see in the attachment) are optional.

I don't mind about the asserts; maybe I misunderstood something in the patch.

About skipping SJ removal when no SJ quals are found, I assume it is about this:

split_selfjoin_quals(root, restrictlist, , , inner->relid, outer->relid);
+ if (list_length(selfjoinquals) == 0)
+ {
+     /*
+      * XXX:
+      * we would detect self-join without quals like 'x==x' if we had
+      * a foreign key constraint on some of the other quals and this join
+      * didn't have any columns from the outer in the target list.
+      * But it is still a complex task.
+      */
+     continue;
+ }

as far as I remember, this is the place where it is checked that the SJ list is empty, and it is logical, in my opinion, that no transformations should be performed if no elements are found for them.

As for the cases where SJ did not work, maybe this is just right if there are no elements for processing these cases. I'll try to check or come up with tests for them. If I'm wrong, write.

On 11.10.2023 06:51, Andrei Lepikhov wrote:
> On 11/10/2023 02:29, Alena Rybakina wrote:
> > I have reviewed your patch and I noticed a few things. Thanks for your
> > efforts. I have looked at the latest version of the code, and I assume
> > that the error lies in the replace_varno_walker function, especially in
> > the place where we check the node by type Var and do not form any
> > NullTest node.
>
> It's not a bug, it's an optimization we discussed with Alexander above.
>
> > Secondly, I added some code in some places to catch erroneous cases, and
> > added a condition for when we should not try to apply the
> > self-join-removal transformation due to the absence of a non-empty
> > self-join list after searching for it, and in general if there are no
> > joins in the query. Besides, I added a query for testing and wrote about
> > it above. I have attached my diff file.
>
> Ok, I will look at this.
>
> > In addition, I found a comment for myself that was not clear to me. I
> > would be glad if you could explain it to me. You mentioned a superior
> > outer join in the comment; unfortunately, I didn't find anything about
> > it in the PostgreSQL code, and this explanation remained unclear to me.
> > Could you explain in more detail what you meant?
>
> I meant here that only clauses pushed by reconsider_outer_join_clauses()
> can be found in the joininfo list, and they are not relevant, as you can
> understand. Having written that, I realized that it was a false
> statement. ;) - joininfo can also contain results of previous SJE
> iterations, look:
>
> CREATE TABLE test (oid int PRIMARY KEY);
> CREATE UNIQUE INDEX ON test((oid*oid));
> explain SELECT count(*) FROM test c1, test c2, test c3 WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
> explain SELECT count(*) FROM test c1, test c2, test c3 WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
> explain SELECT count(*) FROM test c1, test c2, test c3 WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Ok, I understood. Thank you for the explanation.

-- 
Regards,
Alena Rybakina
Re: A new strategy for pull-up correlated ANY_SUBLINK
On 13.10.2023 10:04, Andy Fan wrote:
> > It seems to me that the expressions "=" and "IN" are equivalent here due
> > to the fact that the aggregated subquery returns only one value, and the
> > result with the "IN" operation can be considered as the intersection of
> > elements on the left and right. In this query, we have some kind of set
> > on the left, among which only one element on the right will be found or
> > not.
>
> Yes, they are equivalent in the final result, but there are some
> differences at the execution level. The '=' case will be transformed to a
> SubPlan whose subPlanType is EXPR_SUBLINK, so if more than 1 row is
> returned by the subplan, an error will be raised:
>
> select * from tenk1 where ten = (select ten from tenk1 i where i.two = tenk1.two );
> ERROR: more than one row returned by a subquery used as an expression
>
> However the IN case would not:
>
> select * from tenk1 where ten in (select ten from tenk1 i where i.two = tenk1.two ) is OK.
>
> I think the test case you added is not related to this feature; the
> difference is there even without the patch. So I kept the code you
> changed, but not the test case.

Yes, I understand and agree with you that we should delete the last queries, except for one. The query below has a different result compared to master, and it is correct.

Without your patch:

explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Seq Scan on tenk2 b
               Filter: (SubPlan 2)
               SubPlan 2
                 ->  Result
                       InitPlan 1 (returns $1)
                         ->  Limit
                               ->  Index Scan using tenk2_hundred on tenk2 c
                                     Index Cond: (hundred IS NOT NULL)
                                     Filter: (odd = b.odd)
(12 rows)

After your patch:

postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                           QUERY PLAN
------------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on tenk2 b
               ->  Subquery Scan on "ANY_subquery"
                     Filter: (b.hundred = "ANY_subquery".min)
                     ->  Aggregate
                           ->  Seq Scan on tenk2 c
                                 Filter: (odd = b.odd)
(10 rows)

> > I took the liberty of adding this to your patch and added myself as
> > reviewer, if you don't mind.
>
> Sure, the patch after your modification looks better than the original.
> I'm not sure how the test case around "because of got one row" is
> relevant to the current changes. After we reach some agreement on the
> above discussion, I think v4 is good for a committer to review!
>
> > Thank you!) I am ready to discuss it.
>
> Actually I meant to discuss the "Unfortunately, I found a request..";
> looks like we have reached an agreement there :)

Yes, we have )

-- 
Regards,
Alena Rybakina

diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..e41b728df83 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,27 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
    ->  Seq Scan on tenk2 b
 (11 rows)
 
+-- we can pull up the aggregate sublink into RHS of a left join.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on tenk2 b
+               ->  Memoize
+                     Cache Key: b.hundred, b.odd
+                     Cache Mode: binary
+                     ->  Subquery Scan on "ANY_subquery"
+                           Filter: (b.hundred = "ANY_subquery".min)
+                           ->  Result
+                                 InitPlan 1 (returns $1)
+                                   ->  Limit
+                                     ->  Index Scan using tenk2_hundred on tenk2 c
+                                           Index Cond: (hundred IS NOT NULL)
+                                           Filter: (odd = b.odd)
+(16 rows)
+
diff --git a/src/test/regress/sql/subselect.sql
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila wrote:
> > But is this a problem? basically, we will set the
> > ShmemVariableCache->nextOid counter to the value that we want in the
> > IsBinaryUpgrade-specific function. And then the shutdown checkpoint
> > will flush that value to the control file and that is what we want, no?
>
> I think that can work. Basically, we need to do something like what
> SetNextObjectId() does and then let the shutdown checkpoint update the
> actual value in the control file.

Right.

> > I mean instead of resetwal directly modifying the control file we
> > will modify that value in the server using the binary_upgrade function
> > and then have that value flushed to disk by the shutdown checkpoint.
>
> True, the alternative to consider is to let pg_upgrade update the
> control file by itself with the required value of OID. The point I am
> slightly worried about doing via a server-side function is that some
> online and/or shutdown checkpoint can update other values in the
> control file as well, whereas if we do it via pg_upgrade, we can have
> better control over just updating the OID.

Yeah, that's a valid point.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
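The division of labour being discussed, where a binary-upgrade-only function updates the in-memory counter and the shutdown checkpoint (not an external tool) persists it, can be modeled in a few lines. All names here are invented; this is a toy illustration, not pg_upgrade code:

```python
class MockCluster:
    def __init__(self):
        # analogues of ShmemVariableCache->nextOid and its on-disk copy
        self.next_oid = 10000
        self.control_file = {"nextOid": 10000}

    def binary_upgrade_set_next_oid(self, oid):
        # server-side function: touches shared memory only
        self.next_oid = oid

    def shutdown_checkpoint(self):
        # the checkpoint is what writes the value out to the control file
        self.control_file["nextOid"] = self.next_oid
```

Amit's concern maps onto the fact that `shutdown_checkpoint()` may rewrite other control-file fields too, whereas an external tool could overwrite just the one key.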
Re: On login trigger: take three
On Fri, Oct 13, 2023 at 4:18 AM Robert Haas wrote:
> On Thu, Oct 12, 2023 at 6:54 PM Alexander Korotkov wrote:
> > On Thu, Oct 12, 2023 at 8:35 PM Robert Haas wrote:
> > > Doesn't that mean that if you create the first login trigger in a
> > > database and leave the transaction open, nobody can connect to that
> > > database until the transaction ends?
> >
> > It doesn't mean that, because when trying to reset the flag v44 takes a
> > conditional lock. So, if another transaction is holding the lock we
> > will just skip resetting the flag. So, the flag will be cleared on
> > the first connection after that transaction ends.
>
> But in the scenario I am describing the flag is being set, not reset.

Sorry, it seems I just missed some logical steps. Imagine that transaction A created the first login trigger and hangs open. Then the new connection B sees no visible triggers yet, but the dathasloginevt flag is set. Therefore, connection B tries the conditional lock but just gives up because the lock is held by transaction A. Also, note that the lock is just some lock with a custom tag. It doesn't effectively block the database. You may think of it as a custom advisory lock.

-- 
Regards,
Alexander Korotkov
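The conditional-lock behaviour described above maps directly onto a non-blocking lock acquisition. This sketch (names are mine) shows why connection B skips the reset rather than waiting:

```python
import threading

db_flag_lock = threading.Lock()  # stands in for the custom-tagged lock

def try_reset_flag() -> bool:
    # Conditional acquisition: if some transaction (e.g. one creating the
    # first login trigger) holds the lock, skip instead of blocking.
    if db_flag_lock.acquire(blocking=False):
        try:
            return True   # got the lock; the flag reset would happen here
        finally:
            db_flag_lock.release()
    return False          # lock held elsewhere; skip, do not wait
```

The flag then simply stays set until the first connection after the holding transaction ends.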
Re: pg_stat_statements and "IN" conditions
On Tue, Jul 04, 2023 at 09:02:56PM +0200, Dmitry Dolgov wrote:
> On Mon, Jul 03, 2023 at 09:46:11PM -0700, Nathan Bossart wrote:
> > IMHO this adds way too much complexity to something that most users would
> > expect to be an on/off switch.
>
> This documentation is exclusively to be precise about how it works.
> Users don't have to worry about all this, and can pretty much turn it
> on/off, as you've described. I agree though, I could probably write this
> text a bit differently.

FWIW, I am going to side with Nathan on this one, but not completely either. I was looking at the patch and it brings too much complexity for a monitoring feature in this code path. In my experience, I've seen people complain about IN/ANY never being trimmed down to a single parameter in pg_stat_statements, but I have yet to hear from somebody outside this thread that they'd like to reduce an IN clause depending on the number of items, or something else.

> > If I understand Álvaro's suggestion [0] correctly, he's saying that in
> > addition to allowing "on" and "off", it might be worth allowing
> > something like "powers" to yield roughly the behavior described above.
> > I don't think he's suggesting that this "powers" behavior should be
> > the only available option.
>
> Independently of what Álvaro was suggesting, I find the "powers"
> approach more suitable, because it answers my own concerns about the
> previous implementation. Having "on"/"off" values means we would have to
> scratch our heads coming up with a one-size-fits-all default value, or
> introduce another option for the actual cut-off threshold. I would like
> to avoid both of those options; that's why I went with "powers" only.

Now, it doesn't mean that this approach with the "powers" will never happen, but based on the set of opinions I am gathering on this thread I would suggest reworking the patch as follows:
- First, implement an on/off switch that reduces the lists in IN and/or ANY to one parameter. Simply.
- Second, refactor the powers routine.
- Third, extend the on/off switch, or just implement a threshold with a second switch.

When it comes to my opinion, I am not seeing any objections to the feature as a whole, and I'm OK with the first step. I'm also OK with keeping the door open for more improvements in controlling how these IN/ANY lists show up, but there could be more than just the number of items as a parameter (say the query size, different behaviors depending on the number of clauses in queries, subquery context or CTEs/WITH, etc., just to name a few things coming to mind).
--
Michael

signature.asc
Description: PGP signature
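The user-visible effect of the "first step" above can be illustrated with a text-level toy. The real patch operates on the query tree during normalization, not on strings, so treat this purely as an illustration of the intended behaviour:

```python
import re

def squash_in_lists(normalized_query: str) -> str:
    # Collapse any constant list "($1, $2, ...)" down to a single
    # placeholder, regardless of its length -- the plain on/off
    # behaviour, with no "powers" bucketing. Regex-based, illustrative only.
    return re.sub(r'\(\s*\$\d+(?:\s*,\s*\$\d+)+\s*\)',
                  '($1 /* ... */)', normalized_query)
```

A "powers" variant would instead round the list length up to the next power of two before deciding how to render it, so entries with 5 and 7 items would still collapse into one.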
Re: Tab completion for AT TIME ZONE
On Fri, Oct 13, 2023 at 08:01:08AM +0200, Vik Fearing wrote: > On 10/13/23 06:31, Michael Paquier wrote: >> but after also removing >> the completion for "ZONE" after typing "AT TIME" because AT would be >> completed by "TIME ZONE". > > Why? The user can tab at any point. IMO this leads to unnecessary bloat in tab-complete.c because we finish with the full completion as long as "TIME" is not fully typed. -- Michael signature.asc Description: PGP signature
Re: A new strategy for pull-up correlated ANY_SUBLINK
Hi Tom, Would you like to have a look at this? The change is not big and the optimization has also been asked for many times. The attached is the v5 version and I also try my best to write a good commit message. Here is the commit fest entry: https://commitfest.postgresql.org/45/4268/ v5-0001-Pull-up-ANY-SUBLINK-with-the-necessary-lateral-su.patch Description: Binary data
Re: New WAL record to detect the checkpoint redo location
On Tue, Oct 10, 2023 at 02:43:34PM -0400, Robert Haas wrote:
> Here's a new patch set. I think I've incorporated the performance
> fixes that you've suggested so far into this version. I also adjusted
> a couple of other things:

Now looking at 0002, where you should be careful about the code indentation or koel will complain.

> - After further study of a point previously raised by Amit, I adjusted
> CreateCheckPoint slightly to call WALInsertLockAcquireExclusive
> significantly later than before. I think that there's no real reason
> to do it so early and that the current coding is probably just a
> historical leftover, but it would be good to have some review here.

This makes the new code call LocalSetXLogInsertAllowed(), and set checkPoint.PrevTimeLineID, after taking the insertion locks, which should be OK.

> - I added a cross-check that when starting redo from a checkpoint
> whose redo pointer points to an earlier LSN than the checkpoint
> itself, the record we read from that LSN must be an XLOG_CHECKPOINT_REDO
> record.

I've also mentioned a test in pg_walinspect after one of the checkpoints generated there, but what you do here is enough for the online case.

+	/*
+	 * XLogInsertRecord will have updated RedoRecPtr, but we need to copy
+	 * that into the record that will be inserted when the checkpoint is
+	 * complete.
+	 */
+	checkPoint.redo = RedoRecPtr;

For online checkpoints, a very important point is that XLogCtl->Insert.RedoRecPtr is also updated in XLogInsertRecord(). Perhaps that's worth an addition?

I was a bit confused at first that we do the following for shutdown checkpoints:

RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;

Then repeat this pattern for non-shutdown checkpoints a few lines down without touching the copy of the redo LSN in XLogCtl->Insert, because of course we don't hold the WAL insert locks in an exclusive fashion here:

checkPoint.redo = RedoRecPtr;

My point is that this is not only about RedoRecPtr, but also about XLogCtl->Insert.RedoRecPtr here. The comment in ReserveXLogSwitch() says that.
--
Michael

signature.asc
Description: PGP signature
Re: A new strategy for pull-up correlated ANY_SUBLINK
> > It seems to me that the expressions "=" and "IN" are equivalent here due
> > to the fact that the aggregated subquery returns only one value, and the
> > result with the "IN" operation can be considered as the intersection of
> > elements on the left and right. In this query, we have some kind of set
> > on the left, among which only one element on the right will be found or
> > not.

Yes, they are equivalent in the final result, but there are some differences at the execution level. The '=' case will be transformed to a SubPlan whose subPlanType is EXPR_SUBLINK, so if more than 1 row is returned by the subplan, an error will be raised:

select * from tenk1 where ten = (select ten from tenk1 i where i.two = tenk1.two );
ERROR: more than one row returned by a subquery used as an expression

However the IN case would not:

select * from tenk1 where ten in (select ten from tenk1 i where i.two = tenk1.two ) is OK.

I think the test case you added is not related to this feature; the difference is there even without the patch. So I kept the code you changed, but not the test case.

> > I took the liberty of adding this to your patch and added myself as
> > reviewer, if you don't mind.
>
> Sure, the patch after your modification looks better than the original.
> I'm not sure how the test case around "because of got one row" is
> relevant to the current changes. After we reach some agreement
> on the above discussion, I think v4 is good for a committer to review!
>
> Thank you!) I am ready to discuss it.

Actually I meant to discuss the "Unfortunately, I found a request.."; looks like we have reached an agreement there :)

-- 
Best Regards
Andy Fan
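The execution-level difference explained above can be modeled directly. These two helpers (names are mine) mimic the EXPR_SUBLINK versus ANY_SUBLINK evaluation semantics, not the planner code:

```python
def scalar_eq(value, subquery_rows):
    # EXPR_SUBLINK semantics: the subquery must yield at most one row
    if len(subquery_rows) > 1:
        raise ValueError("more than one row returned by a subquery "
                         "used as an expression")
    return bool(subquery_rows) and value == subquery_rows[0]

def in_any(value, subquery_rows):
    # ANY_SUBLINK semantics: a membership test; any row count is fine
    return value in subquery_rows
```

For a subquery guaranteed to return a single row (like the aggregated one in this thread), the two agree; the divergence only appears when the subquery can return multiple rows.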
Re: [PATCH] Add support function for containment operators
On Tue, Aug 1, 2023 at 10:07 AM Laurenz Albe wrote:
> I had an idea about this:
> So far, you only consider constant ranges. But if we have a STABLE range
> expression, you could use an index scan for "expr <@ range", for example
> Index Cond (expr >= lower(range) AND expr < upper(range)).

For the above part, I'm not sure how to implement it, or whether it is
doable.

Refactor: dropped SupportRequestIndexCondition and related code since, as
mentioned upthread, it's dead code. Refactored the regression tests (data
types with infinity cover more cases than int4range, so I deleted some
tests).

Now there are 3 helper functions (build_bound_expr, find_simplified_clause,
match_support_request) and 2 entry functions
(elem_contained_by_range_support, range_contains_elem_support).

The collation problem seems solved. Putting the following test in
src/test/regress/sql/rangetypes.sql will not work, maybe because of the
order of the regression tests: under SQL_ASCII encoding, I cannot create
the collation "cs-CZ-x-icu".
drop type if EXISTS textrange1, textrange2;
drop table if EXISTS collate_test1, collate_test2;
CREATE TYPE textrange1 AS RANGE (SUBTYPE = text, collation = "C");
create type textrange2 as range (subtype = text, collation = "cs-CZ-x-icu");
CREATE TABLE collate_test1 (b text COLLATE "en-x-icu" NOT NULL);
INSERT INTO collate_test1(b) VALUES ('a'), ('c'), ('d'), ('ch');
CREATE TABLE collate_test2 (b text NOT NULL);
INSERT INTO collate_test2(b) VALUES ('a'), ('c'), ('d'), ('ch');
--should include 'ch'
SELECT * FROM collate_test1 WHERE b <@ textrange1('a', 'd');
--should not include 'ch'
SELECT * FROM collate_test1 WHERE b <@ textrange2('a', 'd');
--should include 'ch'
SELECT * FROM collate_test2 WHERE b <@ textrange1('a', 'd');
--should not include 'ch'
SELECT * FROM collate_test2 WHERE b <@ textrange2('a', 'd');

From bcae7f8a0640b48b04f243660539e8670de43d41 Mon Sep 17 00:00:00 2001
From: pgaddict
Date: Fri, 13 Oct 2023 12:43:41 +0800
Subject: [PATCH v2 1/1] Optimize qual cases like Expr <@ RangeConst and
 RangeConst @> Expr

Previously these two quals were processed as is. This patch adds prosupport
functions to range_contains_elem and elem_contained_by_range, so quals like
"Expr <@ rangeConst" and "rangeConst @> Expr" are rewritten to "expr opr
range_lower_bound AND expr opr range_upper_bound". "Expr <@ rangeConst" is
logically equivalent to "rangeConst @> Expr" for the same rangeConst.

Added some tests to validate the generated plan.
---
 src/backend/utils/adt/rangetypes.c          | 199 +++-
 src/backend/utils/adt/rangetypes_selfuncs.c |   6 +-
 src/include/catalog/pg_proc.dat             |  12 +-
 src/test/regress/expected/rangetypes.out    | 194 +++
 src/test/regress/sql/rangetypes.sql         | 101 ++
 5 files changed, 505 insertions(+), 7 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index d65e5625..647bd5e4 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -31,13 +31,19 @@
 #include "postgres.h"

 #include "access/tupmacs.h"
+#include "access/stratnum.h"
 #include "common/hashfn.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "nodes/supportnodes.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/pg_list.h"
 #include "nodes/miscnodes.h"
 #include "port/pg_bitutils.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/date.h"
 #include "utils/lsyscache.h"
 #include "utils/rangetypes.h"
@@ -69,7 +75,11 @@ static Size datum_compute_size(Size data_length, Datum val, bool typbyval,
 			   char typalign, int16 typlen, char typstorage);
 static Pointer datum_write(Pointer ptr, Datum datum, bool typbyval,
 			   char typalign, int16 typlen, char typstorage);
-
+static Expr *build_bound_expr(Oid opfamily, TypeCacheEntry *typeCache,
+							  bool isLowerBound, bool isInclusive,
+							  Datum val, Expr *otherExpr, Oid rng_collation);
+static Node *find_simplified_clause(Const *rangeConst, Expr *otherExpr);
+static Node *match_support_request(Node *rawreq);

 /*
  *--
@@ -558,7 +568,6 @@ elem_contained_by_range(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val));
 }

-
 /* range, range -> bool functions */

 /* equality (internal version) */
@@ -2173,6 +2182,29 @@ make_empty_range(TypeCacheEntry *typcache)
 	return make_range(typcache, , , true, NULL);
 }

+/*
+ * Planner support function for elem_contained_by_range operator
+ */
+Datum
+elem_contained_by_range_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = match_support_request(rawreq);
+
+	PG_RETURN_POINTER(ret);
+}
+
+/*
+ * Planner support function for range_contains_elem operator
+ */
+Datum
+range_contains_elem_support(PG_FUNCTION_ARGS)
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, 2023-10-12 at 19:56 +0300, Nikita Malakhov wrote:
> Say we have data processed by some user function and we want to keep a
> reference to this function in our data. In this case we have two ways:
> first, store the string output of regprocedure, which is not very
> convenient; second, store its OID, which requires a slight modification
> of pg_upgrade (pg_dump and the func/procedure creation function).

So far, we have lived quite well with the rule "don't store any system OIDs
in the database if you want to pg_upgrade" (views on system objects, reg*
data types, ...).

What is inconvenient about storing the output of regprocedure?

Yours,
Laurenz Albe
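To illustrate the two options being compared (a sketch, assuming a
hypothetical user function f(int) that is not from the thread):

```sql
-- Hypothetical function for illustration.
CREATE FUNCTION f(int) RETURNS int LANGUAGE sql AS 'SELECT $1';

-- Option 1: store the textual regprocedure output. It is re-resolved by
-- name and signature, so it survives pg_upgrade (as long as the function
-- still exists in the new cluster):
SELECT 'f(integer)'::regprocedure::text;   -- 'f(integer)', safe to store

-- Option 2: store the raw OID. It is only meaningful within the current
-- cluster and is not preserved for pg_proc across pg_upgrade:
SELECT 'f(integer)'::regprocedure::oid;    -- cluster-specific OID
```

The stored text can later be cast back with `::regprocedure` to obtain the
function's current OID in whatever cluster the data lives in.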
Re: Tab completion for AT TIME ZONE
On 10/13/23 06:31, Michael Paquier wrote:
> On Fri, Oct 13, 2023 at 03:07:25AM +0200, Vik Fearing wrote:
>> The SQL committee already has another operator starting with AT which is
>> AT LOCAL. The other patch was the reason why I looked at this one.

Thank you for updating and committing this patch!

> but after also removing the completion for "ZONE" after typing "AT TIME"
> because AT would be completed by "TIME ZONE".

Why? The user can tab at any point.
-- 
Vik Fearing