Re: Introduce XID age and inactive timeout based replication slot invalidation
On Thu, Apr 4, 2024 at 10:48 AM Amit Kapila wrote:
>
> The v33-0001 looks good to me. I have made minor changes in the
> comments/commit message and removed one part of the test which was a
> bit confusing and didn't seem to add much value. Let me know what you
> think of the attached?

Thanks for the changes. v34-0001 LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 4, 2024 at 10:53 AM Ajin Cherian wrote:
>
> On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий wrote:
>>
>> In usual work, the subscription has two_phase = on. I have to change this
>> option at catchup stage only, but this parameter can not be altered. There
>> was a patch proposal in past to implement altering of two_phase option, but
>> it was rejected. I think, the recreation of the subscription with two_phase
>> = off will not work.
>
> The altering of two_phase was restricted because if there was a previously
> prepared transaction on the subscriber when the two_phase was on, and then it
> was turned off, the apply worker on the subscriber would re-apply the
> transaction a second time and this might result in an inconsistent replica.
> Here's a patch that allows toggling two_phase option provided that there are
> no pending uncommitted prepared transactions on the subscriber for that
> subscription.

I think this would probably be better than the current situation but
can we think of a solution to allow toggling the value of two_phase
even when prepared transactions are present? Can you please summarize
the reason for the problems in doing that and the solutions, if any?

--
With Regards,
Amit Kapila.
Re: Fix for timestamp lag issue from emit_log_hook when GUC log_line_prefix has '%m'
On Wed, Apr 03, 2024 at 03:13:13PM +0900, Michael Paquier wrote:
> While looking again at that, there were two more comments that missed
> a refresh about JSON in get_formatted_log_time() and
> get_formatted_start_time(). It's been a few weeks since the last
> update, but I'll be able to wrap that tomorrow, updating these
> comments on the way.

And done with 2a217c371799, before the feature freeze.

--
Michael
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий wrote:
> In usual work, the subscription has two_phase = on. I have to change this
> option at catchup stage only, but this parameter can not be altered. There
> was a patch proposal in past to implement altering of two_phase option, but
> it was rejected. I think, the recreation of the subscription with two_phase
> = off will not work.

The altering of two_phase was restricted because if there was a
previously prepared transaction on the subscriber when the two_phase
was on, and then it was turned off, the apply worker on the subscriber
would re-apply the transaction a second time and this might result in
an inconsistent replica. Here's a patch that allows toggling two_phase
option provided that there are no pending uncommitted prepared
transactions on the subscriber for that subscription.

Thanks to Kuroda-san for working on the patch.

regards,
Ajin Cherian
Fujitsu Australia

v1-0001-Allow-altering-of-two_phase-option-in-subscribers.patch
Description: Binary data
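(Not part of the patch, but for anyone wanting to eyeball the precondition by hand: the pending prepared transactions the patch checks for are visible in the stock pg_prepared_xacts view. The GID pattern below is only an assumption about how the apply worker names its prepared transactions, so treat it as illustrative.)

```sql
-- Illustrative: list prepared transactions on the subscriber database.
-- The LIKE pattern assumes the naming scheme used by logical replication
-- apply workers and may not match all setups.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
WHERE gid LIKE 'pg\_gid\_%';
```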
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Apr 3, 2024 at 8:28 PM Bharath Rupireddy wrote:
>
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > +        'postgres',
> > +        "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> > +                '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> > +    ),
> > +    "t",
> > +    'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as we
> > allow them to be equal in the test). I think we should just add some usleep()
> > where appropriate and deny equality during the tests on inactive_since.
>
> Thanks. It looks like we can ignore the equality in all of the
> inactive_since comparisons. IIUC, all the TAP tests do run with
> primary and standbys on the single BF animals. And, it looks like
> assigning the inactive_since timestamps to perl variables is giving
> the microseconds precision level
> (./tmp_check/log/regress_log_040_standby_failover_slots_sync:inactive_since
> 2024-04-03 14:30:09.691648+00). FWIW, we already have some TAP and SQL
> tests relying on stats_reset timestamps without equality. So, I've
> left the equality for the inactive_since tests.
>
> > Except for the above, v32-0001 LGTM.
>
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.

The v33-0001 looks good to me. I have made minor changes in the
comments/commit message and removed one part of the test which was a
bit confusing and didn't seem to add much value. Let me know what you
think of the attached?

--
With Regards,
Amit Kapila.

v34-0001-Allow-synced-slots-to-have-their-inactive_since.patch
Description: Binary data
Re: Improve eviction algorithm in ReorderBuffer
On Thu, 2024-04-04 at 09:31 +0900, Masahiko Sawada wrote:
> IIUC, with your suggestion, sift_{up|down} needs to update the
> heap_index field as well. Does it mean that the caller needs to pass
> the address of heap_index down to sift_{up|down}?

I'm not sure quite how binaryheap should be changed. Bringing the heap
implementation into reorderbuffer.c would obviously work, but that
would be more code. Another option might be to make the API of
binaryheap look a little more like simplehash, where some #defines
control optional behavior and can tell the implementation where to
find fields in the structure.

Perhaps it's not worth the effort though, if performance is already
good enough?

Regards,
Jeff Davis
Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there
On 2024-04-04 03:29 +0200, David G. Johnston wrote:
> On Thu, Mar 28, 2024 at 8:02 PM Erik Wienhold wrote:
>
> > Thanks, that sounds better. I incorporated that with some minor edits
> > in the attached v3.
>
> You added my missing ( but dropped the comma after "i.e."

Thanks, fixed in v4. Looks like American English prefers that comma
and it's also more common in our docs.

--
Erik

>From 8b29c5852762bacb637fab021a06b12ab5cd7f93 Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Fri, 8 Mar 2024 04:21:56 +0100
Subject: [PATCH v4] Document that typed tables rely on CREATE TYPE

CREATE TABLE OF requires a stand-alone composite type. Clarify that in
the error message. Also reword the docs to better explain the
connection between created table and stand-alone composite type.

Reworded docs provided by David G. Johnston.
---
 doc/src/sgml/ref/create_table.sgml        | 18 +-
 src/backend/commands/tablecmds.c          |  2 +-
 src/test/regress/expected/typed_table.out |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index dfb7822985..586ccb190b 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -249,19 +249,19 @@ WITH ( MODULUS numeric_literal, REM
    OF type_name

-      Creates a typed table, which takes its
-      structure from the specified composite type (name optionally
-      schema-qualified). A typed table is tied to its type; for
-      example the table will be dropped if the type is dropped
-      (with DROP TYPE ... CASCADE).
+      Creates a typed table, which takes its structure
+      from an existing (name optionally schema-qualified) stand-alone composite
+      type (i.e., created using ) though it
+      still produces a new composite type as well. The table will have
+      a dependency on the referenced type such that cascaded alter and drop
+      actions on the type will propagate to the table.

-      When a typed table is created, then the data types of the
-      columns are determined by the underlying composite type and are
-      not specified by the CREATE TABLE command.
+      A typed table always has the same column names and data types as the
+      type it is derived from, and you cannot specify additional columns.
       But the CREATE TABLE command can add defaults
-      and constraints to the table and can specify storage parameters.
+      and constraints to the table, as well as specify storage parameters.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 317b89f67c..d756d2b200 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6994,7 +6994,7 @@ check_of_type(HeapTuple typetuple)
 	if (!typeOk)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("type %s is not a composite type",
+				 errmsg("type %s is not a stand-alone composite type",
 						format_type_be(typ->oid))));
 }

diff --git a/src/test/regress/expected/typed_table.out b/src/test/regress/expected/typed_table.out
index 2e47ecbcf5..745fbde811 100644
--- a/src/test/regress/expected/typed_table.out
+++ b/src/test/regress/expected/typed_table.out
@@ -89,7 +89,7 @@
 drop cascades to function get_all_persons()
 drop cascades to table persons2
 drop cascades to table persons3
 CREATE TABLE persons5 OF stuff;  -- only CREATE TYPE AS types may be used
-ERROR:  type stuff is not a composite type
+ERROR:  type stuff is not a stand-alone composite type
 DROP TABLE stuff;
 -- implicit casting
 CREATE TYPE person_type AS (id int, name text);
-- 
2.44.0
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Thu, Apr 4, 2024 at 9:42 AM Masahiko Sawada wrote:
>
> @@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
>     if (SlotSyncCtx->pid == InvalidPid)
>     {
>         SpinLockRelease(&SlotSyncCtx->mutex);
> +       update_synced_slots_inactive_since();
>         return;
>     }
>     SpinLockRelease(&SlotSyncCtx->mutex);
> @@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
>     }
>
>     SpinLockRelease(&SlotSyncCtx->mutex);
> +
> +   update_synced_slots_inactive_since();
> }
>
> Why do we want to update all synced slots' inactive_since values at
> shutdown in spite of updating the value every time when releasing the
> slot? It seems to contradict the fact that inactive_since is updated
> when releasing or restoring the slot.

It is to get the inactive_since right for the cases where the standby
is promoted without a restart, similar to when a standby is promoted
with restart, in which case the inactive_since is set to current time
in RestoreSlotFromDisk.

Imagine the slot is synced last time at time t1 and then a few hours
passed, the standby is promoted without a restart. If we don't set
inactive_since to current time in this case in ShutDownSlotSync, the
inactive timeout invalidation mechanism can kick in immediately.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
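(As a side note for readers of the thread: the value being discussed is exposed through pg_replication_slots, so the promotion scenario above can be sanity-checked with a query along these lines, assuming a build that already carries the synced and inactive_since columns from this patch series.)

```sql
-- Sketch: inspect inactive_since of synced slots before/after promotion.
SELECT slot_name, synced, inactive_since,
       now() - inactive_since AS idle_for
FROM pg_replication_slots
ORDER BY slot_name;
```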
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Wed, Apr 3, 2024 at 11:58 PM Bharath Rupireddy wrote:
>
> Please find the attached v33 patches.

@@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
    if (SlotSyncCtx->pid == InvalidPid)
    {
        SpinLockRelease(&SlotSyncCtx->mutex);
+       update_synced_slots_inactive_since();
        return;
    }
    SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
    }

    SpinLockRelease(&SlotSyncCtx->mutex);
+
+   update_synced_slots_inactive_since();
}

Why do we want to update all synced slots' inactive_since values at
shutdown in spite of updating the value every time when releasing the
slot? It seems to contradict the fact that inactive_since is updated
when releasing or restoring the slot.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Streaming read-ready sequential scan code
On Thu, 4 Apr 2024 at 14:38, Melanie Plageman wrote:
> Looking at it in the code, I am wondering if we should call
> heap_page_prep() heap_scan_page_prep(). Just wondering if it is clear
> that it is prepping a page to be scanned. You choose whatever you
> think is best.

I ended up calling it heap_prepare_pagescan() as I started to think
prep/prepare should come first. I don't think it's perfect as the
intended meaning is heap_prepare_page_for_scanning_in_pagemode(), but
that's obviously too long.

I've pushed the v9-0001 with that rename done.

David
Re: Popcount optimization using AVX512
On Thu, 4 Apr 2024 at 11:50, Nathan Bossart wrote:
> If we can verify this approach won't cause segfaults and can stomach the
> regression between 8 and 16 bytes, I'd happily pivot to this approach so
> that we can avoid the function call dance that I have in v25.
>
> Thoughts?

If we're worried about regressions with some narrow range of byte
values, wouldn't it make more sense to compare that to cc4826dd5~1 at
the latest rather than to some version that's already probably faster
than PG16?

David
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
On Fri, Mar 8, 2024 at 6:20 AM Maxim Orlov wrote:
> Quite an interesting patch, in my opinion. I've decided to work on it a
> bit, did some refactoring (sorry) and add basic tests. Also, I try to
> take into account as much as possible notes on the patch, mentioned by
> Cédric Villemain.

Thanks! Unfortunately I don't think it's possible to include a
regression test that looks at the output, because it'd be
non-deterministic. Any other backend could pin or dirty the buffer you
try to evict, changing the behaviour.

> > and maybe better to go with FlushOneBuffer() ?
> It's a good idea, but I'm not sure at the moment. I'll try to dig some
> deeper into it. At least, FlushOneBuffer does not work for a local
> buffers. So, we have to decide whatever pg_buffercache_invalidate
> should or should not work for local buffers. For now, I don't see why
> it should not. Maybe I miss something?

I think it's OK to ignore local buffers for now. pg_buffercache
generally doesn't support/show them so I don't feel inclined to
support them for this. I removed a few traces of local support.

It didn't seem appropriate to use the pg_monitor role for this, so I
made it superuser-only. I don't think it makes much sense to use this
on any kind of production system so I don't think we need a new role
for it, and existing roles don't seem too appropriate. pageinspect et
al use the same approach.

I added a VOLATILE qualifier to the function.

I added some documentation.

I changed the name to pg_buffercache_evict().

I got rid of the 'force' flag which was used to say 'I only want to
evict this buffer if it is clean'. I don't really see the point in
that, we might as well keep it simple. You could filter buffers on
"isdirty" if you want.

I added comments to scare anyone off using EvictBuffer() for anything
much, and marking it as something for developer convenience. (I am
aware of an experimental patch that uses this same function as part of
a buffer pool resizing operation, but that has other infrastructure to
make that safe and would adjust those remarks accordingly.)

I wondered whether it should really be testing for BM_TAG_VALID rather
than BM_VALID. Arguably, but it doesn't seem important for now. The
distinction would arise if someone had tried to read in a buffer, got
an I/O error and abandoned ship, leaving a buffer with a valid tag but
not valid contents. Anyone who tries to ReadBuffer() it will then try
to read it again, but in the meantime this function won't be able to
evict it (it'll just return false). Doesn't seem that obvious to me
that this obscure case needs to be handled. That doesn't happen
*during* a non-error case, because then it's pinned and we already
return false in this code for pins.

I contemplated whether InvalidateBuffer() or InvalidateVictimBuffer()
would be better here and realised that Andres's intuition was probably
right when he suggested the latter up-thread. It is designed with the
right sort of arbitrary concurrent activity in mind, where the former
assumes things about locking and dropping, which could get us into
trouble if not now maybe in the future.

I ran the following diabolical buffer blaster loop while repeatedly
running installcheck:

do
$$
begin
  loop
    perform pg_buffercache_evict(bufferid)
    from pg_buffercache
    where random() <= 0.25;
  end loop;
end;
$$;

The only ill-effect was a hot laptop.

Thoughts, objections, etc?

Very simple example of use:

create or replace function uncache_relation(name text)
returns boolean
begin atomic;
  select bool_and(pg_buffercache_evict(bufferid))
  from pg_buffercache
  where reldatabase = (select oid from pg_database
                       where datname = current_database())
    and relfilenode = pg_relation_filenode(name);
end;

More interesting for those of us hacking on streaming I/O stuff was
the ability to evict just parts of things and see how the I/O merging
and I/O depth react.
v5-0001-Add-pg_buffercache_evict-function-for-testing.patch Description: Binary data
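(To make that last point concrete, here is one way to evict only part of a relation; pg_buffercache's relblocknumber and relfilenode columns already exist, while pg_buffercache_evict() is the function proposed in this thread, and 'my_table' plus the block cutoff are arbitrary.)

```sql
-- Sketch: evict only the first 16 blocks of one table, then rescan it to
-- observe how I/O merging and I/O depth respond to the holes.
SELECT count(*) FILTER (WHERE pg_buffercache_evict(bufferid)) AS evicted
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('my_table')
  AND relblocknumber < 16;
```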
Re: Is it safe to cache data by GiST consistent function
> On 3 Apr 2024, at 19:02, Tom Lane wrote:
>
> Michał Kłeczek writes:
>
>> pg_trgm consistent caches trigrams but it has some logic to make sure
>> cached values are recalculated:
>
>>    cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
>>    if (cache == NULL ||
>>        cache->strategy != strategy ||
>>        VARSIZE(cache->query) != querysize ||
>>        memcmp((char *) cache->query, (char *) query, querysize) != 0)
>
>> What I don't understand is if it is necessary or it is enough to check
>> fn_extra == NULL.
>
> Ah, I didn't think to search contrib. Yes, you need to validate the
> cache entry. In this example, a rescan could insert a new query
> value. In general, an opclass support function could get called using
> a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
> entry), so it's unwise to assume that cached data is relevant to the
> current call without checking.

This actually sounds scary - looks like there is no way to perform
cache clean-up after rescan then?

Do you think it might be useful to introduce a way for per-rescan
caching (i.e. setting up a dedicated memory context in gistrescan and
passing it to support functions)?

—
Michal
Re: On disable_cost
On Thu, 4 Apr 2024 at 10:15, David Rowley wrote:
> In short, I don't find it strange that disabling one node type results
> in considering another type that we'd otherwise not consider in cases
> where we assume that the disabled node type is always superior and
> should always be used when it is possible.

In addition to what I said earlier, I think the current
enable_indexonlyscan is implemented in a way that has the planner do
what it did before IOS was added. I think that goal makes sense with
any patch that makes the planner try something new. We want to have
some method to get the previous behaviour for the cases where the
planner makes a dumb choice or to avoid some bug in the new feature.

I think using that logic, the current scenario with enable_indexscan
and enable_indexonlyscan makes complete sense. I mean, including
enable_indexscan=0 adding disable_cost to IOS Paths.

David
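(A minimal illustration of the behaviour described above; table and index names are made up, and the chosen plans naturally depend on statistics and costs.)

```sql
CREATE TABLE t AS SELECT g AS a FROM generate_series(1, 100000) g;
CREATE INDEX ON t (a);
VACUUM ANALYZE t;

SET enable_indexonlyscan = on;
EXPLAIN (COSTS OFF) SELECT a FROM t WHERE a < 100;  -- may pick Index Only Scan

SET enable_indexonlyscan = off;
EXPLAIN (COSTS OFF) SELECT a FROM t WHERE a < 100;  -- falls back toward a plain
                                                    -- Index Scan, i.e. the
                                                    -- pre-IOS behaviour
```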
Re: Using the %m printf format more
On Fri, Mar 22, 2024 at 01:58:24PM +, Dagfinn Ilmari Mannsåker wrote:
> Here's an updated patch that adds such a comment. I'll add it to the
> commitfest later today unless someone commits it before then.

I see no problem to do that now rather than later. So, done to make
pg_regress able to use %m.

--
Michael
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On Mon, 18 Mar 2024 at 15:50, Michael Paquier wrote:
> Additionally, the RMT has set the feature freeze to be **April 8, 2024
> at 0:00 AoE** (see [1]). This is the last time to commit features for
> PostgreSQL 17. In other words, no new PostgreSQL 17 feature can be
> committed after April 8, 2024 at 0:00 AoE. As mentioned last year in
> [2], this uses the "standard" feature freeze date/time.

Someone asked me about this, so thought it might be useful to post
here. To express this as UTC, it's:

postgres=# select '2024-04-08 00:00-12:00' at time zone 'UTC';
      timezone
---------------------
 2024-04-08 12:00:00

Or, time remaining, relative to now:

select '2024-04-08 00:00-12:00' - now();

David

> [1]: https://en.wikipedia.org/wiki/Anywhere_on_Earth
> [2]: https://www.postgresql.org/message-id/9fbe60ec-fd1b-6ee0-240d-af7fc4442...@postgresql.org
Re: Refactoring of pg_resetwal/t/001_basic.pl
On Tue, Mar 26, 2024 at 02:53:35PM +0300, Svetlana Derevyanko wrote:
> What do you think?
>
> +use constant SLRU_PAGES_PER_SEGMENT => 32;

Well, I disagree with what you are doing here, adding a hardcoded
dependency between the test code and the backend code. I would suggest
to use a more dynamic approach and retrieve such values directly from
the headers. See scan_server_header() in 039_end_of_wal.pl as one
example. 7b5275eec3a5 is newer than bae868caf222, so the original
commit could have used that, as well.

--
Michael
Re: Parent/child context relation in pg_get_backend_memory_contexts()
On Thu, 4 Apr 2024 at 12:34, Michael Paquier wrote:
> I've been re-reading the patch again to remember what this is about,
> and I'm OK with having this "path" column in the catalog. However,
> I'm somewhat confused by the choice of having a temporary number that
> shows up in the catalog representation, because this may not be
> constant across multiple calls so this still requires a follow-up
> temporary ID <-> name mapping in any SQL querying this catalog. A
> second thing is that array does not show the hierarchy of the path;
> the patch relies on the order of the elements in the output array
> instead.

My view on this is that there are a couple of things with the patch
which could be considered separately:

1. Should we have a context_id in the view?
2. Should we also have an array of all parents?

My view is that we really need #1 as there's currently no reliable way
to determine a context's parent as the names are not unique. I do see
that Melih has mentioned this is temporary in:

+       Current context id. Note that the context id is a temporary id and may
+       change in each invocation

For #2, I'm a bit less sure about this. I know Andres would like to
see this array added, but equally WITH RECURSIVE would work. Does the
array of parents completely eliminate the need for recursive queries?
I think the array works for anything that requires all parents or some
fixed (would be) recursive level, but there might be some other
condition to stop recursion other than the recursion level that
someone needs to do. What I'm trying to get at is; do we need to
document the WITH RECURSIVE stuff anyway? and if we do, is it still
worth having the parents array?

David
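(For context, the WITH RECURSIVE approach being weighed against the array looks roughly like this today; it matches children to parents by name, which is exactly where the non-unique names bite.)

```sql
-- Roughly how one walks the context tree without a context_id/path column.
-- The join on name is ambiguous whenever two contexts share a name.
WITH RECURSIVE ctx AS (
    SELECT name, ident, parent, level, total_bytes
    FROM pg_backend_memory_contexts
    WHERE parent IS NULL                  -- TopMemoryContext
    UNION ALL
    SELECT c.name, c.ident, c.parent, c.level, c.total_bytes
    FROM pg_backend_memory_contexts c
    JOIN ctx p ON c.parent = p.name       -- unreliable: names can repeat
)
SELECT * FROM ctx ORDER BY level;
```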
Re: Add new error_action COPY ON_ERROR "log"
On Tue, Apr 02, 2024 at 09:53:57AM +0900, Masahiko Sawada wrote:
> Pushed.

Thanks for following up with this thread.

--
Michael
Re: Streaming read-ready sequential scan code
On Wed, Apr 3, 2024 at 9:08 PM David Rowley wrote:
>
> On Thu, 4 Apr 2024 at 06:03, Melanie Plageman wrote:
> > Attached v8 is rebased over current master (which now has the
> > streaming read API).
>
> I've looked at the v8-0001 patch.

Thanks for taking a look!

> I wasn't too keen on heapbuildvis() as a function name for an external
> function. Since it also does pruning work, it seemed weird to make it
> sound like it only did visibility work. Per our offline discussion
> about names, I've changed it to what you suggested which is
> heap_page_prep().

Looking at it in the code, I am wondering if we should call
heap_page_prep() heap_scan_page_prep(). Just wondering if it is clear
that it is prepping a page to be scanned. You choose whatever you
think is best.

> Aside from that, there was an outdated comment: "In pageatatime mode,
> heapgetpage() already did visibility checks," which was no longer true
> as that's done in heapbuildvis() (now heap_page_prep()).
>
> I also did a round of comment adjustments as there were a few things I
> didn't like, e.g:
>
> + * heapfetchbuf - subroutine for heapgettup()
>
> This is also used in heapgettup_pagemode(), so I thought it was a bad
> idea for a function to list places it thinks it's being called. I
> also thought it redundant to write "This routine" in the function head
> comment. I think "this routine" is implied by the context. I ended up
> with:
>
> /*
>  * heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
>  *
>  * Read the specified block of the scan relation into a buffer and pin that
>  * buffer before saving it in the scan descriptor.
>  */
>
> I'm happy with your changes to heapam_scan_sample_next_block(). I did
> adjust the comment above CHECK_FOR_INTERRUPTS() so it was effectively
> the same as the seqscan version, just with "seqscan" swapped for
> "sample scan".

That all is fine with me.

> The only other thing I adjusted there was to use "blockno" in some
> places where you were using hscan->rs_cblock. These all come after
> the "hscan->rs_cblock = blockno;" line. My thoughts here are that this
> is more likely to avoid reading the value from the struct again if the
> compiler isn't happy that the two values are still equivalent for some
> reason. Even if the compiler is happy today, it would only take a
> code change to pass hscan to some external function for the compiler
> to perhaps be uncertain if that function has made an adjustment to
> rs_cblock and go with the safe option of pulling the value out the
> struct again which is a little more expensive as it requires some
> maths to figure out the offset.
>
> I've attached v9-0001 and a delta of just my changes from v8.

All sounds good and LGTM

- Melanie
Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there
On Thu, Mar 28, 2024 at 8:02 PM Erik Wienhold wrote:
> Thanks, that sounds better. I incorporated that with some minor edits
> in the attached v3.

Looks good. You added my missing ( but dropped the comma after "i.e."

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index dc69a3f5dc..b2e9e97b93 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -251,7 +251,7 @@ WITH ( MODULUS numeric_literal, REM
       Creates a typed table, which takes its structure
       from an existing (name optionally schema-qualified) stand-alone composite
-      type (i.e. created using ) though it
+      type (i.e., created using ) though it
       still produces a new composite type as well. The table will have
       a dependency on the referenced type such that cascaded alter and drop
       actions on the type will propagate to the table.

David J.
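(A quick demonstration of the distinction the patch documents; note the ERROR text shown is the patched wording, not what current releases print.)

```sql
CREATE TABLE stuff (a int);        -- also creates a composite type "stuff",
                                   -- but not a stand-alone one
CREATE TABLE persons5 OF stuff;    -- ERROR: type stuff is not a stand-alone
                                   --        composite type
CREATE TYPE person_type AS (id int, name text);
CREATE TABLE persons OF person_type;  -- OK: type came from CREATE TYPE
```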
Re: Statistics Import and Export
On Mon, Apr 01, 2024 at 01:21:53PM -0400, Tom Lane wrote:
> I'm not sure. I think if we put our heads down we could finish
> the changes I'm suggesting and resolve the other issues this week.
> However, it is starting to feel like the sort of large, barely-ready
> patch that we often regret cramming in at the last minute. Maybe
> we should agree that the first v18 CF would be a better time to
> commit it.

There are still 4 days remaining, so there's still time, but my
overall experience on the matter with my RMT hat on is telling me that
we should not rush this patch set. Redesigning portions close to the
end of a dev cycle is not a good sign, I am afraid, especially if the
sub-parts of the design don't fit well in the global picture as that
could mean more maintenance work on stable branches in the long term.
Still, it is very good to be aware of the problems because you'd know
what to tackle to reach the goals of this patch set.

--
Michael
Re: Streaming read-ready sequential scan code
On Thu, 4 Apr 2024 at 06:03, Melanie Plageman wrote:
> Attached v8 is rebased over current master (which now has the
> streaming read API).

I've looked at the v8-0001 patch.

I wasn't too keen on heapbuildvis() as a function name for an external
function. Since it also does pruning work, it seemed weird to make it
sound like it only did visibility work. Per our offline discussion
about names, I've changed it to what you suggested which is
heap_page_prep().

Aside from that, there was an outdated comment: "In pageatatime mode,
heapgetpage() already did visibility checks," which was no longer true
as that's done in heapbuildvis() (now heap_page_prep()).

I also did a round of comment adjustments as there were a few things I
didn't like, e.g:

+ * heapfetchbuf - subroutine for heapgettup()

This is also used in heapgettup_pagemode(), so I thought it was a bad
idea for a function to list places it thinks it's being called. I
also thought it redundant to write "This routine" in the function head
comment. I think "this routine" is implied by the context. I ended up
with:

/*
 * heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
 *
 * Read the specified block of the scan relation into a buffer and pin that
 * buffer before saving it in the scan descriptor.
 */

I'm happy with your changes to heapam_scan_sample_next_block(). I did
adjust the comment above CHECK_FOR_INTERRUPTS() so it was effectively
the same as the seqscan version, just with "seqscan" swapped for
"sample scan".

The only other thing I adjusted there was to use "blockno" in some
places where you were using hscan->rs_cblock. These all come after
the "hscan->rs_cblock = blockno;" line. My thoughts here are that this
is more likely to avoid reading the value from the struct again if the
compiler isn't happy that the two values are still equivalent for some
reason. Even if the compiler is happy today, it would only take a
code change to pass hscan to some external function for the compiler
to perhaps be uncertain if that function has made an adjustment to
rs_cblock and go with the safe option of pulling the value out the
struct again which is a little more expensive as it requires some
maths to figure out the offset.

I've attached v9-0001 and a delta of just my changes from v8.

David

From 90bfc63097c556d0921d8f9165731fb07ec04167 Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v9] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing
was enabled, pruned the page and built an array of its visible tuples.
This was used for sequential and sample scans.

Future commits will add support for read streams. The streaming read
API will read in the blocks specified by a callback, but any
significant per-page processing should be done synchronously on the
buffer yielded by the streaming read API. To support this, separate
the logic in heapgetpage() to read a block into a buffer from that
which prunes the page and builds an array of the visible tuples. The
former is now heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such
as backwards sequential scans). Because this logic differs for sample
scans and sequential scans, inline the code to read the block into a
buffer for sample scans. This has the added benefit of allowing for a
bit of refactoring in heapam_scan_sample_next_block(), including
unpinning the previous buffer before invoking the callback to select
the next block.
---
 src/backend/access/heap/heapam.c         | 78 ++--
 src/backend/access/heap/heapam_handler.c | 38 
 src/include/access/heapam.h              |  2 +-
 3 files changed, 74 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9d5b109a5..6524fc44a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,17 +360,17 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }

 /*
- * heapgetpage - subroutine for heapgettup()
+ * heap_page_prep - Prepare the current scan page to be scanned in pagemode
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Preparation currently consists of 1. prune the scan's rs_cbuf page, and 2.
+ * fill the rs_vistuples array with the OffsetNumbers of visible tuples.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heap_page_prep(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block =
Re: Allow non-superuser to cancel superuser tasks.
On Tue, Apr 02, 2024 at 04:35:28PM +0500, Andrey M. Borodin wrote: > We can add tests just like [0] with injection points. > I mean replace that "sleep 1" with something like > "$node->wait_for_event('autovacuum worker', 'autocauum-runing');". > Currently we have no infrastructure to wait for autovacuum of > particular table, but I think it's doable. > Also I do not like that test is changing system-wide autovac > settings, AFAIR these settings can be set for particular table. Yeah, hardcoded sleeps are not really acceptable. On fast machines they eat into the global runtime, making the whole test run slower and impacting the CI. On slow machines, that's not going to be stable, and we have a lot of buildfarm animals starved on CPU, like the ones running valgrind or just because their environment is slow (one of my animals runs on an RPI, for example). Note that slow machines have a lot of value because they're usually better at catching race conditions. Injection points would indeed make the tests more deterministic by controlling the waits and wakeups you'd like to have in the patch's tests. eeefd4280f6e would be a good example of how to implement a test. -- Michael
Re: Reports on obsolete Postgres versions
On Tue, Apr 2, 2024 at 1:47 PM Bruce Momjian wrote: > On Tue, Apr 2, 2024 at 11:34:46AM +0200, Magnus Hagander wrote: > > Okay, I changed "superseded" to "old", and changed "latest" to > "current", patch attached. > > I took a pass at this and found a few items of note. Changes on top of Bruce's patch. diff --git a/templates/support/versioning.html b/templates/support/versioning.html index 0ed79f4f..d4762967 100644 --- a/templates/support/versioning.html +++ b/templates/support/versioning.html @@ -21,9 +21,9 @@ a release available outside of the minor release roadmap. The PostgreSQL Global Development Group supports a major version for 5 years -after its initial release. After its five year anniversary, a major version will -have one last minor release containing any fixes and will be considered -end-of-life (EOL) and no longer supported. +after its initial release. After its fifth anniversary, a major version will +have one last minor release and will be considered +end-of-life (EOL), meaning no new bug fixes will be written for it. # "fifth anniversary" seems more correct than "five year anniversary". # The fact it gets a minor release implies it receives fixes. # I've always had an issue with our use of the phrasing "no longer supported". It seems vague when practically it just means we stop writing patches for it. Whether individual community members refrain from answering questions or helping people on these older releases is not a project decision but a personal one. Also, since we already say it is supported for 5 years it seems a bit redundant to then also say "after 5 years it is unsupported". Version Numbering @@ -45,11 +45,12 @@ number, e.g. 9.5.3 to 9.5.4. Upgrading -Major versions usually change the internal format of the system tables. -These changes are often complex, so we do not maintain backward -compatibility of all stored data. A dump/reload of the database or use of the -pg_upgrade module is required -for major upgrades.
We also recommend reading the +Major versions need the data directory to be initialized so that the system tables +specific to that version can be created. There are two options to then +migrate the user data from the old directory to the new one: a dump/reload +of the database or using the +pg_upgrade module. +We also recommend reading the upgrading section of the major version you are planning to upgrade to. You can upgrade from one major version to another without upgrading to intervening versions, but we recommend reading @@ -58,14 +59,15 @@ versions prior to doing so. # This still talked about "stored data" when really that isn't the main concern and if it was pg_upgrade wouldn't work as an option. # The choice to say "data directory" here seems reasonable if arguable. But it implies the location of user data and we state it also contains version-specific system tables. It can go unsaid that they are version-specific because the collection as a whole and the layout of individual tables can and almost certainly will change between versions. -Minor release upgrades do not require a dump and restore; you simply stop +Minor release upgrades do not impact the data directory, only the binaries. +You simply stop the database server, install the updated binaries, and restart the server. -Such upgrades might require manual changes to complete so always read +However, some upgrades might require manual changes to complete so always read the release notes first. # The fact minor releases don't require dump/reload doesn't directly preclude them from needing pg_upgrade and writing "Such upgrades" seems like it could lead someone to think that. # Data Directory seems generic enough to be understood here and since I mention it in the Major Version as to why data migration is needed, mentioning here # as something not directly altered and thus precluding the data migration has a nice symmetry. The potential need for manual changes becomes clearer as well.
-Minor releases only fix frequently-encountered bugs, security issues, and data corruption problems, so such upgrades are very low risk. The community considers performing minor upgrades to be less risky than continuing to # Reality mostly boils down to trusting our judgement when it comes to bugs as each one is evaluated individually. We do not promise to leave simply buggy behavior alone in minor releases which is the only policy that would nearly eliminate upgrade risk. We back-patch 5 year old bugs quite often which by definition are not frequently encountered. I cannot think of a good adjective to put there nor does one seem necessary even if I agree it reads a bit odd otherwise. I think that has more to do with this being just the wrong structure to get our point across. Can we pick out some statistics from our long history of publishing minor releases to present an objective reality to the reader to convince them to trust our track-record in this matter? David J.
Re: Silence Meson warning on HEAD
On Mon, Apr 01, 2024 at 06:46:01PM -0500, Tristan Partin wrote: > 1. version-check.diff > > Wrap the offending line in a Meson version check. > > 2. perl-string.diff > > Pass the perl program as a string via its .path() method. > > 3. meson-bump.diff > > Bump the minimum meson version from 0.54 to 0.55, at least. Among these three options, 2) seems like the most appealing to me at this stage of the development cycle. Potential breakages and more backpatch fuzz are never really appealing even if meson is a rather new thing, particularly considering that there is a simple solution that does not require a minimal version bump. > Seems like as good an opportunity to bump Meson to 0.56 for Postgres 17, > which I have found to exist in the EPEL for RHEL 7. I don't know what > version exists in the base repo. Perhaps it is 0.55, which would still get > rid of the aforementioned warning. Perhaps we could shave even more versions for PG18 if it helps with future development? -- Michael
Re: Allow non-superuser to cancel superuser tasks.
Update - the condition should be &&:

    if (pgstat_get_backend_type(proc->backendId) == B_AUTOVAC_WORKER)
    {
        if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM) &&
            !superuser())
            return SIGNAL_BACKEND_NOAUTOVACUUM;
    }

Thanks -- Anthony Leung Amazon Web Services: https://aws.amazon.com
Re: Improve eviction algorithm in ReorderBuffer
Hi, On Thu, Apr 4, 2024 at 2:32 AM Jeff Davis wrote: > > On Wed, 2024-04-03 at 01:45 -0700, Jeff Davis wrote: > > I suggest that you add a "heap_index" field to ReorderBufferTXN that > > would point to the index into the heap's array (the same as > > bh_nodeidx_entry.index in your patch). Each time an element moves > > within the heap array, just follow the pointer to the > > ReorderBufferTXN > > object and update the heap_index -- no hash lookup required. > > It looks like my email was slightly too late, as the work was already > committed. Thank you for the suggestions! I should have mentioned it earlier. > > My suggestion is not required for 17, and so it's fine if this waits > until the next CF. If it turns out to be a win we can consider > backporting to 17 just to keep the code consistent, otherwise it can go > in 18. IIUC, with your suggestion, sift_{up|down} needs to update the heap_index field as well. Does it mean that the caller needs to pass the address of heap_index down to sift_{up|down}? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
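To make the question concrete, here is a minimal sketch of the back-pointer scheme Jeff described (hypothetical names and a deliberately simplified max-heap, not the actual ReorderBuffer/binaryheap code). The point is that the sift routine updates heap_index through the element pointer it is already moving, so the caller does not have to pass the field's address down:

```c
#include <assert.h>

/* Hypothetical stand-in for ReorderBufferTXN: the element remembers its
 * own slot in the heap array, so no hash lookup is needed to find it. */
typedef struct Txn
{
	long		size;		/* ordering key */
	int			heap_index;	/* current slot in heap->nodes[] */
} Txn;

typedef struct Heap
{
	Txn		  **nodes;
	int			count;
} Heap;

/* Place a node in a slot and keep its back-pointer in sync.  This is
 * the one update sift must perform on every move. */
static void
heap_set(Heap *h, int i, Txn *t)
{
	h->nodes[i] = t;
	t->heap_index = i;
}

/* Max-heap sift-up: every swap goes through heap_set(), so heap_index
 * stays correct without any extra argument being threaded through. */
static void
sift_up(Heap *h, int i)
{
	while (i > 0)
	{
		int			parent = (i - 1) / 2;

		if (h->nodes[parent]->size >= h->nodes[i]->size)
			break;

		Txn		   *tmp = h->nodes[parent];

		heap_set(h, parent, h->nodes[i]);
		heap_set(h, i, tmp);
		i = parent;
	}
}

static void
heap_add(Heap *h, Txn *t)
{
	heap_set(h, h->count, t);
	sift_up(h, h->count);
	h->count++;
}
```

In this sketch, at least, the answer to the question above is no: sift_up() never needs the address of heap_index, because all element movement funnels through heap_set(), which reaches the field via the element pointer the heap already holds.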
Re: Allow non-superuser to cancel superuser tasks.
> I don't think we should rely on !OidIsValid(proc->roleId) for signaling > autovacuum workers. That might not always be true, and I don't see any > need to rely on that otherwise. IMHO we should just add a few lines before > the existing code, which doesn't need to be changed at all: > > if (pg_stat_is_backend_autovac_worker(proc->backendId) && > !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM)) > return SIGNAL_BACKEND_NOAUTOVACUUM;

I tried to add them above the existing code. When I test it locally, a user without pg_signal_autovacuum will actually fail at this block, because the user is not a superuser and !OidIsValid(proc->roleId) is also true in the following:

    /*
     * Only allow superusers to signal superuser-owned backends. Any process
     * not advertising a role might have the importance of a superuser-owned
     * backend, so treat it that way.
     */
    if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
        !superuser())
        return SIGNAL_BACKEND_NOSUPERUSER;

This is what I'm planning to do: if the backend is an autovacuum worker and the user is neither a superuser nor a member of the pg_signal_autovacuum role, we return the new value and provide the relevant error message.

    /*
     * If the backend is an autovacuum worker, allow a user with privileges
     * of the pg_signal_autovacuum role to signal the backend.
     */
    if (pgstat_get_backend_type(proc->backendId) == B_AUTOVAC_WORKER)
    {
        if (!has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_AUTOVACUUM) ||
            !superuser())
            return SIGNAL_BACKEND_NOAUTOVACUUM;
    }

    /*
     * Only allow superusers to signal superuser-owned backends. Any process
     * not advertising a role might have the importance of a superuser-owned
     * backend, so treat it that way.
     */
    else if ((!OidIsValid(proc->roleId) || superuser_arg(proc->roleId)) &&
             !superuser())
    {
        return SIGNAL_BACKEND_NOSUPERUSER;
    }

    /* Users can signal backends they have role membership in. */
    else if (!has_privs_of_role(GetUserId(), proc->roleId) &&
             !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
    {
        return SIGNAL_BACKEND_NOPERMISSION;
    }

> We can add tests just like [0] with injection points. > I mean replace that "sleep 1" with something like > "$node->wait_for_event('autovacuum worker', 'autocauum-runing');". > Currently we have no infrastructure to wait for autovacuum of particular > table, but I think it's doable. > Also I do not like that test is changing system-wide autovac settings, AFAIR > these settings can be set for particular table.

Thanks for the suggestion. I will take a look at this. Let me also separate the test into a different patch file. -- Anthony Leung Amazon Web Services: https://aws.amazon.com
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Thu, Apr 4, 2024 at 11:51 AM Peter Eisentraut wrote: > On 30.03.24 22:27, Thomas Munro wrote: > > Hmm, OK so it doesn't have 3 available in parallel from base repos. > > But it's also about to reach end of "full support" in 2 months[1], so > > if we applied the policies we discussed in the LLVM-vacuuming thread > > (to wit: build farm - EOL'd OSes), then... One question I'm unclear > > on is whether v17 will be packaged for RHEL8. > > The rest of the thread talks about the end of support of RHEL 7, but you > are here talking about RHEL 8. It is true that "full support" for RHEL > 8 ended in May 2024, but that is not the one we are tracking. We > are tracking the 10-year one, which I suppose is now called "maintenance > support". I might have confused myself with the two EOLs and some wishful thinking. I am a lot less worked up about this general topic now that RHEL has moved to "rolling" LLVM updates in minor releases, removing a physical-pain-inducing 10-year vacuuming horizon (that's 20 LLVM major releases and they only fix bugs in one...). I will leave openssl discussions to those more knowledgeable about that. > So if the above package list is correct, then we ought to keep > supporting openssl 1.1.* until 2029. That's a shame. But it sounds like the developer burden isn't so different from 1.1.1 to 3.x, so maybe it's not such a big deal from our point of view. (I have no opinion on the security ramifications of upstream's EOL, but as a layman it sounds completely bonkers to use it. I wonder why the packaging community wouldn't just arrange to have a supported-by-upstream 3.x package in their RPM repo when they supply the newest PostgreSQL versions for the oldest RHEL, but again not my area so I'll shut up).
Re: Parent/child context relation in pg_get_backend_memory_contexts()
On Wed, Apr 03, 2024 at 04:20:39PM +0300, Melih Mutlu wrote: > Michael Paquier wrote on Wed, 14 Feb 2024 at 10:23: >> I was reading the patch, and using int[] as a representation of the >> path of context IDs up to the top-most parent looks a bit strange to >> me, with the relationship between each parent -> child being >> preserved, visibly, based on the order of the elements in this array >> made of temporary IDs compiled on-the-fly during the function >> execution. Am I the only one finding that a bit strange? Could it be >> better to use a different data type for this path and perhaps switch >> to the names of the contexts involved? > > Do you find having the path column strange all together? Or only using > temporary IDs to generate that column? The reason why I avoid using context > names is because there can be multiple contexts with the same name. This > makes it difficult to figure out which context, among those with that > particular name, is actually included in the path. I couldn't find any > other information that is unique to each context. I've been re-reading the patch again to remember what this is about, and I'm OK with having this "path" column in the catalog. However, I'm somewhat confused by the choice of having a temporary number that shows up in the catalog representation, because this may not be constant across multiple calls, so this still requires a follow-up temporary ID <-> name mapping in any SQL querying this catalog. A second thing is that the array does not show the hierarchy of the path; the patch relies on the order of the elements in the output array instead. -- Michael
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, Apr 03, 2024 at 01:38:50PM -0400, Tom Lane wrote: > The discussion we had last year concluded that we were OK with > dropping 1.0.1 support when RHEL6 goes out of extended support > (June 2024 per this thread, I didn't check it). Seems like we > should have the same policy for RHEL7. Also, calling Photon 3 > dead because it went EOL three days ago seems over-hasty. Yeah. A bunch of users of Photon are VMware (or you could say Broadcom) product appliances, and I'd suspect that quite a lot of them rely on Photon 3 for their base OS image. Upgrading that stuff is not easy work in my experience because they need to cope with a bunch of embedded services. > Bottom line for me is that pulling 1.0.1 support now is OK, > but I think pulling 1.0.2 is premature. Yeah, I guess so. At least that seems like the safest conclusion currently here. The build-time check on X509_get_signature_info() would still be required. I'd love being able to rip out the internal locking logic currently in libpq as LibreSSL has traces of CRYPTO_lock(), as far as I've checked, and we rely on its existence. -- Michael
Re: Built-in CTYPE provider
On Tue, 2024-03-26 at 08:14 +0100, Peter Eisentraut wrote: > > Full vs. simple case mapping is more of a legacy compatibility > question, > in my mind. There is some expectation/precedent that C.UTF-8 uses > simple case mapping, but beyond that, I don't see a reason why > someone > would want to explicitly opt for simple case mapping, other than if > they > need length preservation or something, but if they need that, then > they > are going to be in a world of pain in Unicode anyway. I mostly agree, though there are some other purposes for the simple mapping: * a substitute for case folding: lower() with simple case mapping will work better for that purpose than lower() with full case mapping (after we have casefold(), this won't be a problem) * simple case mapping is conceptually simpler, and that's a benefit by itself in some situations -- maybe the 1:1 assumption exists other places in their application > > There's also another reason to consider it an argument rather than > > a > > collation property, which is that it might be dependent on some > > other > > field in a row. I could imagine someone wanting to do: > > > > SELECT > > UPPER(some_field, > > full => true, > > dotless_i => CASE other_field WHEN ...) > > FROM ... > > Can you index this usefully? It would only work if the user query > matches exactly this pattern? In that example, UPPER is used in the target list -- the WHERE clause might be indexable. The UPPER is just used for display purposes, and may depend on some locale settings stored in another table associated with a particular user. Regards, Jeff Davis
Re: Security lessons from liblzma - libsystemd
On 03.04.24 23:19, Magnus Hagander wrote: When the code is this simple, we should definitely consider carrying it ourselves. At least if we don't expect to need *other* functionality from the same library in the future, which I doubt we will from libsystemd. Well, I've long had it on my list to do some integration to log directly to the journal, so you can preserve metadata better. I'm not sure right now whether this would use libsystemd, but it's not like there is absolutely no other systemd-related functionality that could be added. Personally, I think this proposed change is trying to close a barndoor after a horse has bolted. There are many more interesting and scary libraries in the dependency tree of "postgres", so just picking off one right now doesn't really accomplish anything. The next release of libsystemd will drop all the compression libraries as hard dependencies, so the issue in that sense is gone anyway. Also, fun fact: liblzma is also a dependency via libxml2.
Re: pg_combinebackup --copy-file-range
Hi, Here's a much more polished and cleaned up version of the patches, fixing all the issues I've been aware of, and with various parts merged into much more cohesive parts (instead of keeping them separate to make the changes/evolution more obvious). I decided to reorder the changes like this:

1) block alignment - As I said earlier, I think we should definitely do this, even if only to make future improvements possible. After chatting about this with Robert off-list a bit, he pointed out I actually forgot to not align the headers for files without any blocks, so this version fixes that.

2) add clone/copy_file_range for the case that copies whole files. This is pretty simple, but it also adds the --clone/copy-file-range options, and similar infrastructure. The one slightly annoying bit is that we now have the ifdef stuff in two places - when parsing the options, and then in the copy_file_XXX methods, and the pg_fatal() calls should not be reachable in practice. But that doesn't seem harmful, and might be a useful protection against someone calling a function that does nothing. This also merges the original two parts, where the first one only did this cloning/CoW stuff when checksum did not need to be calculated, and the second extended it to those cases too (by also reading the data, but still doing the copy the old way). I think this is the right way - if that's not desirable, it's easy to either add --no-manifest or not use the CoW options. Or just not change the checksum type. There's no other way.

3) add copy_file_range to write_reconstructed_file, by using roughly the same logic/reasoning as (2). If --copy-file-range is specified and a checksum should be calculated, the data is read for the checksum, but the copy is done using copy_file_range.
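For readers unfamiliar with the primitive behind the --copy-file-range option, the whole-file case in (2) essentially reduces to the loop below. This is an illustrative sketch of copy_file_range(2) usage, not code from the patch; the call is Linux-specific (hence the ifdefs mentioned above), and a robust version also needs a plain read()/write() fallback for kernels and filesystems that reject it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Copy the whole of src_fd (from its current position, assumed to be 0)
 * into dst_fd using copy_file_range(), which lets the kernel do the copy
 * and, on CoW filesystems such as btrfs or XFS with reflink, simply
 * share the underlying blocks.  Sketch only: returns 0 on success and
 * -1 on any error; real code would fall back to a read/write loop on
 * errors like EXDEV or ENOSYS.
 */
static int
copy_whole_file(int src_fd, int dst_fd)
{
	struct stat st;
	off_t		remaining;

	if (fstat(src_fd, &st) < 0)
		return -1;
	remaining = st.st_size;

	while (remaining > 0)
	{
		/* NULL offsets: the kernel advances both file positions */
		ssize_t		nbytes = copy_file_range(src_fd, NULL, dst_fd, NULL,
											 (size_t) remaining, 0);

		if (nbytes <= 0)
			return -1;
		remaining -= nbytes;
	}
	return 0;
}
```

The checksum-calculating variant in (3) still reads the data to feed the checksum, but issues the actual copy through this same call, which is what preserves the block sharing and the space savings.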
I did rework the write_reconstructed_file() flow a bit, because tracking what exactly needs to be read/written in each of the cases (which copy method, zeroed block, checksum calculated) made the flow really difficult to follow. Instead I introduced functions to read and write a block, and call them from different places. I think this is much better, and it also makes the following part dealing with batches of blocks a much easier / smaller change.

4) prefetching - This is mostly unrelated to the CoW stuff, but it has tremendous benefits, especially for ZFS. I haven't been able to tune ZFS to get decent performance, and ISTM it'd be rather unusable for backup purposes without this.

5) batching in write_reconstructed_file - This benefits especially the copy_file_range case, where I've seen it yield massive speedups (see the message from Monday for better data).

6) batching for prefetch - Similar logic to (5), but for fadvise. This is the only part where I'm not entirely sure whether it actually helps or not. Needs some more analysis, but I'm including it for completeness.

I do think the parts (1)-(4) are in pretty good shape, good enough to make them committable in a day or two. I see it mostly a matter of testing and comment improvements rather than code changes. (5) is in pretty good shape too, but I'd like to maybe simplify and refine the write_reconstructed_file changes a bit more. I don't think it's broken, but it feels a bit too cumbersome. Not sure about (6) yet. I changed how I think about this a bit - I don't really see the CoW copy methods as necessarily faster than the regular copy (even though it can be with (5)). I think the main benefits are the space savings, enabled by patches (1)-(3). If (4) and (5) get in, that's a bonus, but even without that I don't think the performance is an issue - everything has a cost.
On 4/3/24 15:39, Jakub Wartak wrote: > On Mon, Apr 1, 2024 at 9:46 PM Tomas Vondra > wrote: >> >> Hi, >> >> I've been running some benchmarks and experimenting with various stuff, >> trying to improve the poor performance on ZFS, and the regression on XFS >> when using copy_file_range. And oh boy, did I find interesting stuff ... > > [..] > > Congratulations on great results! > >> 4) after each pg_combinebackup run to pg_verifybackup, start the cluster >> to finish recovery, run pg_checksums --check (to check the patches don't >> produce something broken) > > I've performed some follow-up small testing on all patches mentioned > here (1..7), with the earlier developed nano-incremental-backup-tests > that helped detect some issues for Robert earlier during original > development. They all went fine in both cases: > - no special options when using pg_combinebackup > - using pg_combinebackup --copy-file-range --manifest-checksums=NONE > > Those were: > test_across_wallevelminimal.sh > test_full_pri__incr_stby__restore_on_pri.sh > test_full_pri__incr_stby__restore_on_stby.sh > test_full_stby__incr_stby__restore_on_pri.sh > test_full_stby__incr_stby__restore_on_stby.sh > test_incr_after_timelineincrease.sh > test_incr_on_standby_after_promote.sh > test_many_incrementals_dbcreate_duplicateOID.sh >
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On 30.03.24 22:27, Thomas Munro wrote: On Sun, Mar 31, 2024 at 9:59 AM Tom Lane wrote: Thomas Munro writes: I was reminded of this thread by ambient security paranoia. As it stands, we require 1.0.2 (but we very much hope that package maintainers and others in control of builds don't decide to use it). Should we skip 1.1.1 and move to requiring 3 for v17? I'd be kind of sad if I couldn't test SSL stuff anymore on my primary workstation, which has $ rpm -q openssl openssl-1.1.1k-12.el8_9.x86_64 I think it's probably true that <=1.0.2 is not in any distro that we still need to pay attention to, but I reject the contention that RHEL8 is not in that set. Hmm, OK so it doesn't have 3 available in parallel from base repos. But it's also about to reach end of "full support" in 2 months[1], so if we applied the policies we discussed in the LLVM-vacuuming thread (to wit: build farm - EOL'd OSes), then... One question I'm unclear on is whether v17 will be packaged for RHEL8. The rest of the thread talks about the end of support of RHEL 7, but you are here talking about RHEL 8. It is true that "full support" for RHEL 8 ended in May 2024, but that is not the one we are tracking. We are tracking the 10-year one, which I suppose is now called "maintenance support". So if the above package list is correct, then we ought to keep supporting openssl 1.1.* until 2029.
Re: Popcount optimization using AVX512
On Tue, Apr 02, 2024 at 11:30:39PM +0300, Ants Aasma wrote: > On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote: >> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote: >> > What about using the masking capabilities of AVX-512 to handle the >> > tail in the same code path? Masked out portions of a load instruction >> > will not generate an exception. To allow byte level granularity >> > masking, -mavx512bw is needed. Based on wikipedia this will only >> > disable this fast path on Knights Mill (Xeon Phi), in all other cases >> > VPOPCNTQ implies availability of BW. >> >> Sounds promising. IMHO we should really be sure that these kinds of loads >> won't generate segfaults and the like due to the masked-out portions. I >> searched around a little bit but haven't found anything that seemed >> definitive. > > After sleeping on the problem, I think we can avoid this question > altogether while making the code faster by using aligned accesses. > Loads that straddle cache line boundaries run internally as 2 load > operations. Gut feel says that there are enough out-of-order resources > available to make it not matter in most cases. But even so, not doing > the extra work is surely better. Attached is another approach that > does aligned accesses, and thereby avoids going outside bounds. > > Would be interesting to see how well that fares in the small use case. > Anything that fits into one aligned cache line should be constant > speed, and there is only one branch, but the mask setup and folding > the separate popcounts together should add up to about 20-ish cycles > of overhead. I tested your patch in comparison to v25 and saw the following:

  bytes    v25       v25+ants
      2    1108.205  1033.132
      4    1311.227  1289.373
      8    1927.954  2360.113
     16    2281.091  2365.408
     32    3856.992  2390.688
     64    3648.72   3242.498
    128    4108.549  3607.148
    256    4910.076  4496.852

For 2 bytes and 4 bytes, the inlining should take effect, so any difference there is likely just noise.
At 8 bytes, we are calling the function pointer, and there is a small regression with the masking approach. However, by 16 bytes, the masking approach is on par with v25, and it wins for all larger buffers, although the gains seem to taper off a bit. If we can verify this approach won't cause segfaults and can stomach the regression between 8 and 16 bytes, I'd happily pivot to this approach so that we can avoid the function call dance that I have in v25. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
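The safety question about masked loads aside, the shape of the tail handling is easy to show with portable code. Below is a scalar sketch using GCC/Clang's __builtin_popcountll rather than the AVX-512 intrinsics under discussion: the final partial word is handled by copying only the valid bytes into a zeroed 64-bit word, which gives the same effect as a byte-granular AVX-512 write mask without ever reading past the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count set bits in a byte buffer, 64 bits at a time.  The 1..7 byte
 * tail is folded into the same popcount path by loading only the
 * remaining bytes into a zero-initialized word -- a portable stand-in
 * for the masked AVX-512 load discussed above, with the same property
 * that no bytes outside [buf, buf + len) are ever touched.
 */
static int
popcount_bytes(const unsigned char *buf, size_t len)
{
	int			cnt = 0;
	size_t		i = 0;

	/* full 64-bit words */
	for (; i + 8 <= len; i += 8)
	{
		uint64_t	word;

		memcpy(&word, buf + i, 8);		/* safe unaligned load */
		cnt += __builtin_popcountll(word);
	}

	/* tail: "masked" load via a partial memcpy into a zeroed word */
	if (i < len)
	{
		uint64_t	word = 0;

		memcpy(&word, buf + i, len - i);
		cnt += __builtin_popcountll(word);
	}
	return cnt;
}
```

The aligned variant Ants proposes goes one step further: by rounding the first load down to an aligned boundary and masking off the leading bytes as well, every access stays within a single aligned cache line, which is what makes the small-size case run at constant speed.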
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 4 Apr 2024, at 00:06, Tom Lane wrote: > > Daniel Gustafsson writes: >> On 3 Apr 2024, at 19:38, Tom Lane wrote: >>> Bottom line for me is that pulling 1.0.1 support now is OK, >>> but I think pulling 1.0.2 is premature. > >> Is Red Hat building and shipping v17 packages for RHEL7 ELS customers? >> If >> not then it seems mostly academic to tie our dependencies to RHEL ELS >> unless >> I'm missing something. > > True, they won't be doing that, and neither will Devrim. So maybe > we can leave RHEL7 out of the discussion, in which case there's > not a lot of reason to keep 1.0.2 support. We'll need to notify > buildfarm owners to adjust their configurations. The patch will also need to be adjusted to work with LibreSSL, but I know Jacob was looking into that so ideally we should have something to review before the weekend. -- Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 3 Apr 2024, at 21:48, Andrew Dunstan wrote: > On 2024-04-03 We 15:12, Daniel Gustafsson wrote: >> The >> fact that very few animals run the ssl tests is a pet peeve of mine, it would >> be nice if we could get broader coverage there. > > Well, the only reason for that is that the SSL tests need to be listed in > PG_TEST_EXTRA, and the only reason for that is that there's a possible > hazard on multi-user servers. But I bet precious few buildfarm animals run in > such an environment. Mine don't - I'm the only user. > > Maybe we could send out an email to the buildfarm-owners list asking people > to consider allowing the ssl tests. I think that sounds like a good idea. -- Daniel Gustafsson
Re: Security lessons from liblzma - libsystemd
Hi, On 2024-04-03 17:58:55 -0400, Tom Lane wrote: > Magnus Hagander writes: > > On Wed, Apr 3, 2024 at 7:57 PM Andres Freund wrote: > >> Openssh has now integrated [1] a patch to remove the dependency on > >> libsystemd > >> for triggering service manager readyness notifications, by inlining the > >> necessary function. That's not hard, the protocol is pretty simple. > >> I suspect we should do the same. We're not even close to being a target as > >> attractive as openssh, but still, it seems unnecessary. > > > +1. > > I didn't read the patch, but if it's short and stable enough then this > seems like a good idea. It's basically just checking for an env var, opening the unix socket indicated by that, writing a string to it and closing the socket again. > (If openssh and we are using such a patch, that will probably be a big > enough stake in the ground to prevent somebody deciding to change the > protocol ...) One version of the openssh patch to remove liblzma was submitted by one of the core systemd devs, so I think they agree that it's a stable API. The current protocol supports adding more information by adding attributes, so it should be extensible enough anyway. > >> An argument could be made to instead just remove support, but I think it's > >> quite valuable to have intra service dependencies that can rely on the > >> server actually having started up. > > > If we remove support we're basically just asking most of our linux > > packagers to add it back in, and they will add it back in the same way we > > did it. I think we do everybody a disservice if we do that. It's useful > > functionality. > > Yeah, that idea seems particularly silly in view of the desire > expressed earlier in this thread to reduce the number of patches > carried by packagers. People packaging for systemd-using distros > will not consider that this functionality is optional. Yep. Greetings, Andres Freund
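For anyone curious what "inlining the necessary function" amounts to, the readiness half of the protocol Andres describes is roughly the following. This is a sketch, not the openssh code or a proposed patch; the '@' prefix marks Linux abstract-namespace sockets, whose leading byte must be replaced with a NUL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Tell the service manager we are ready: send "READY=1" as a datagram
 * to the unix socket named by the NOTIFY_SOCKET environment variable.
 * Returns 0 on success, -1 if the variable is unset or on error.
 * Sketch only; real code would likely also want SOCK_CLOEXEC and
 * better error reporting.
 */
static int
notify_ready(void)
{
	const char *path = getenv("NOTIFY_SOCKET");
	struct sockaddr_un addr;
	socklen_t	addrlen;
	int			fd;
	int			rc = -1;

	if (path == NULL || path[0] == '\0' ||
		strlen(path) >= sizeof(addr.sun_path))
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);
	if (path[0] == '@')			/* abstract socket namespace */
		addr.sun_path[0] = '\0';
	addrlen = offsetof(struct sockaddr_un, sun_path) + strlen(path);

	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	if (sendto(fd, "READY=1", strlen("READY=1"), 0,
			   (struct sockaddr *) &addr, addrlen) >= 0)
		rc = 0;
	close(fd);
	return rc;
}
```

The unsupported case degrades gracefully: when NOTIFY_SOCKET is unset, the function is a cheap no-op, which is part of what makes inlining this preferable to linking all of libsystemd.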
Re: Streaming read-ready sequential scan code
Hi, On 2024-04-04 09:27:35 +1300, Thomas Munro wrote: > Anecdotally by all reports I've seen so far, all-in-cache seems to be > consistently a touch faster than master if anything, for streaming seq > scan and streaming ANALYZE. That's great!, but it doesn't seem to be > due to intentional changes. No efficiency is coming from batching > anything. Perhaps it has to do with CPU pipelining effects: though > it's doing the same work as ReadBuffer()-when-it-gets-a-hit, the work > itself is cut up into stages in a kind of pipeline: > read_stream_next_buffer() chooses page n + 2, pins page n + 1 and > returns page n each time you call it, so maybe we get more CPU > parallelism due to spreading the data dependencies out? Another theory is that it's due to the plain ReadBuffer() path needing to do RelationGetSmgr(reln) on every call, whereas the streaming read path doesn't need to. > > On the topic of BAS_BULKREAD buffer access strategy, I think the least > > we could do is add an assert like this to read_stream_begin_relation() > > after calculating max_pinned_buffers. > > > > Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers); > > > > Perhaps we should do more? I think with a ring size of 16 MB, large > > SELECTs are safe for now. But I know future developers will make > > changes and it would be good not to assume they will understand that > > pinning more buffers than the size of the ring effectively invalidates > > the ring. > > Yeah I think we should clamp max_pinned_buffers if we see a strategy. > What do you think about: > > if (strategy) > { > int strategy_buffers = GetAccessStrategyBufferCount(strategy); > max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers); > } > I just don't know where to get that '2'. The idea would be to > hopefully never actually be constrained by it in practice, except in > tiny/toy setups, so we can't go too wrong with our number '2' there. 
The / 2 is to avoid causing unnecessarily frequent WAL flushes, right? If so, should we apply that only if the strategy doesn't use the StrategyRejectBuffer() logic? I think it's fine to add a handwavy justification for the 2, saying that we want to balance readahead distance and reducing WAL write frequency, and that at some point more sophisticated logic will be needed. > Then we should increase the default ring sizes for BAS_BULKREAD and > BAS_VACUUM to make the previous statement true. I'm not sure it's right tying them together. The concerns for both are fairly different. > The size of main memory and L2 cache have increased dramatically since those > strategies were invented. I think we should at least double them, and more > likely quadruple them. I realise you already made them configurable per > command in commit 1cbbee033, but I mean the hard coded default 256 in > freelist.c. It's not only to get more space for our prefetching plans, it's > also to give the system more chance of flushing WAL asynchronously/in some > other backend before you crash into dirty data; as you discovered, > prefetching accidentally makes that effect worse in a BAS_VACUUM strategy, > by taking away space that is effectively deferring WAL flushes, so I'd at > least like to double the size if we use "/ 2" above. I think for VACUUM we should probably go a bit further. There's no comparable L1/L2 issue, because the per-buffer processing + WAL insertion is a lot more expensive, compared to a seqscan. I'd go for at least 4x-8x. Greetings, Andres Freund
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Daniel Gustafsson writes: > On 3 Apr 2024, at 19:38, Tom Lane wrote: >> Bottom line for me is that pulling 1.0.1 support now is OK, >> but I think pulling 1.0.2 is premature. > Is Red Hat building and shipping v17 packages for RHEL7 ELS customers? If > not then it seems mostly academic to tie our dependencies to RHEL ELS unless > I'm missing something. True, they won't be doing that, and neither will Devrim. So maybe we can leave RHEL7 out of the discussion, in which case there's not a lot of reason to keep 1.0.2 support. We'll need to notify buildfarm owners to adjust their configurations. regards, tom lane
Re: Security lessons from liblzma - libsystemd
Magnus Hagander writes: > On Wed, Apr 3, 2024 at 7:57 PM Andres Freund wrote: >> Openssh has now integrated [1] a patch to remove the dependency on >> libsystemd >> for triggering service manager readiness notifications, by inlining the >> necessary function. That's not hard, the protocol is pretty simple. >> I suspect we should do the same. We're not even close to being a target as >> attractive as openssh, but still, it seems unnecessary. > +1. I didn't read the patch, but if it's short and stable enough then this seems like a good idea. (If openssh and we are using such a patch, that will probably be a big enough stake in the ground to prevent somebody deciding to change the protocol ...) >> An argument could be made to instead just remove support, but I think it's >> quite valuable to have intra service dependencies that can rely on the >> server actually having started up. > If we remove support we're basically just asking most of our linux > packagers to add it back in, and they will add it back in the same way we > did it. I think we do everybody a disservice if we do that. It's useful > functionality. Yeah, that idea seems particularly silly in view of the desire expressed earlier in this thread to reduce the number of patches carried by packagers. People packaging for systemd-using distros will not consider that this functionality is optional. regards, tom lane
Re: Detoasting optionally to make Explain-Analyze less misleading
Matthias van de Meent writes: > On Tue, 2 Apr 2024 at 17:47, Tom Lane wrote: >> IIUC, it's not possible to use the SERIALIZE option when explaining >> CREATE TABLE AS, because you can't install the instrumentation tuple >> receiver when the IntoRel one is needed. I think that's fine because >> no serialization would happen in that case anyway, but should we >> throw an error if that combination is requested? Blindly reporting >> that zero bytes were serialized seems like it'd confuse people. > I think it's easily explained as no rows were transferred to the > client. If there is actual confusion, we can document it, but > confusing disk with network is not a case I'd protect against. See > also: EXPLAIN (ANALYZE, SERIALIZE) INSERT without the RETURNING > clause. Fair enough. There were a couple of spots in the code where I thought this was important to comment about, though. >> However, isn't the bottom half of serializeAnalyzeStartup doing >> exactly what the comment above it says we don't do, that is accounting >> for the RowDescription message? Frankly I agree with the comment that >> it's not worth troubling over, so I'd just drop that code rather than >> finding a solution for the magic-number problem. > I'm not sure I agree with not including the size of RowDescription > itself though, as wide results can have a very large RowDescription > overhead; up to several times the returned data in cases where few > rows are returned. Meh --- if we're rounding off to kilobytes, you're unlikely to see it. In any case, if we start counting overhead messages, where shall we stop? Should we count the eventual CommandComplete or ReadyForQuery, for instance? I'm content to say that this measures data only; that seems to jibe with other aspects of EXPLAIN's behavior. > Removed. I've instead added buffer usage, as I realised that wasn't > covered yet, and quite important to detect excessive detoasting (it's > not covered at the top-level scan). Duh, good catch.
I've pushed this after a good deal of cosmetic polishing -- for example, I spent some effort on making serializeAnalyzeReceive look as much like printtup as possible, in hopes of making it easier to keep the two functions in sync in future. regards, tom lane
Re: Streaming read-ready sequential scan code
On Wed, Apr 3, 2024 at 4:28 PM Thomas Munro wrote: > > On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman > wrote: > > On the topic of BAS_BULKREAD buffer access strategy, I think the least > > we could do is add an assert like this to read_stream_begin_relation() > > after calculating max_pinned_buffers. > > > > Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers); > > > > Perhaps we should do more? I think with a ring size of 16 MB, large > > SELECTs are safe for now. But I know future developers will make > > changes and it would be good not to assume they will understand that > > pinning more buffers than the size of the ring effectively invalidates > > the ring. > > Yeah I think we should clamp max_pinned_buffers if we see a strategy. > What do you think about: > > if (strategy) > { > int strategy_buffers = GetAccessStrategyBufferCount(strategy); > max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers); > } > > I just don't know where to get that '2'. The idea would be to > hopefully never actually be constrained by it in practice, except in > tiny/toy setups, so we can't go too wrong with our number '2' there. Yea, I don't actually understand why not just use strategy_buffers - 1 or something. 1/2 seems like a big limiting factor for those strategies with small rings. I don't really think it will come up for SELECT queries since they rely on readahead and not prefetching. It does seem like it could easily come up for analyze. But I am on board with the idea of clamping for sequential scan/TID range scan. For vacuum, we might have to think harder. If the user specifies buffer_usage_limit and io_combine_limit and they are never reaching IOs of io_combine_limit because of their buffer_usage_limit value, then we should probably inform them. - Melanie
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Wed, Apr 3, 2024 at 10:04 PM Pavel Borisov wrote: > On Wed, 3 Apr 2024 at 22:18, Alexander Korotkov wrote: >> >> On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera >> wrote: >> > >> > On 2024-Apr-03, Alexander Korotkov wrote: >> > >> > > Regarding the shmem data structure for LSN waiters. I didn't pick >> > > LWLock or ConditionVariable, because I needed the ability to wake up >> > > only those waiters whose LSN is already replayed. In my experience >> > > waking up a process is way slower than scanning a short flat array. >> > >> > I agree, but I think that's unrelated to what I was saying, which is >> > just the patch I attach here. >> >> Oh, sorry for the confusion. I'd re-read your message. Indeed you >> meant this very clearly! >> >> I'm good with the patch. Attached revision contains a bit of a commit >> message. >> >> > > However, I agree that when the number of waiters is very high, a flat >> > > array may become a problem. It seems that the pairing heap is not >> > > hard to use for shmem structures. The only memory allocation call in >> > > pairingheap.c is in pairingheap_allocate(). So, it's only needed to >> > > be able to initialize the pairing heap in-place, and it will be fine >> > > for shmem. >> > >> > Ok. >> > >> > With the code as it stands today, everything in WaitLSNState apart from >> > the pairing heap is accessed without any locking. I think this is at >> > least partly OK because each backend only accesses its own entry; but it >> > deserves a comment. Or maybe something more, because WaitLSNSetLatches >> > does modify the entry for other backends. (Admittedly, this could only >> > happen for backends that are already sleeping, and it only happens >> > with the lock acquired, so it's probably okay. But clearly it deserves >> > a comment.) >> >> Please, check 0002 patch attached. I found it easier to move two >> assignments we previously moved out of lock, into the lock; then claim >> WaitLSNState.procInfos is also protected by WaitLSNLock. 
> > Could you re-attach 0002. Seems it failed to attach to the previous message. I actually forgot both! -- Regards, Alexander Korotkov v2-0001-Use-an-LWLock-instead-of-a-spinlock-in-waitlsn.c.patch Description: Binary data v2-0002-Clarify-what-is-protected-by-WaitLSNLock.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alvaro Herrera writes: > Great, thanks for looking. Pushed now, I'll be closing the commitfest > entry shortly. On my machine, headerscheck does not like this: $ src/tools/pginclude/headerscheck --cplusplus In file included from /tmp/headerscheck.4gTaW5/test.cpp:3: ./src/include/libpq/libpq-be-fe-helpers.h: In function 'char* libpqsrv_cancel(PGconn*, TimestampTz)': ./src/include/libpq/libpq-be-fe-helpers.h:393:10: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings] return "out of memory"; ^~~ ./src/include/libpq/libpq-be-fe-helpers.h:421:13: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings] error = "cancel request timed out"; ^~ The second part of that could easily be fixed by declaring "error" as "const char *". As for the first part, can we redefine the whole function as returning "const char *"? (If not, this coding is very questionable anyway.) regards, tom lane
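The -Wwrite-strings complaint above comes from binding string literals to a plain char pointer, which ISO C++ forbids. A minimal sketch of the suggested fix, with hypothetical names rather than the actual libpq-be-fe-helpers.h code: declare both the local error variable and the function's return type as const char *.

```c
#include <stddef.h>

/*
 * Illustrative reduction of the warning and its fix.  Returning string
 * literals is fine once both the return type and the local variable are
 * "const char *"; with plain "char *" a C++ compiler warns as quoted above.
 */
static const char *
cancel_result(int out_of_memory, int timed_out)
{
	const char *error = NULL;	/* was "char *error", triggering the warning */

	if (out_of_memory)
		return "out of memory";
	if (timed_out)
		error = "cancel request timed out";
	return error;				/* NULL means success in this sketch */
}
```

Callers that previously expected a mutable char * result would need the matching const added at each call site, which is presumably why the question of redefining the whole function arises.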
Re: Security lessons from liblzma - libsystemd
On Wed, Apr 3, 2024 at 7:57 PM Andres Freund wrote: > Hi, > > As most will know by now, the way the xz debacle was able to make sshd > vulnerable > was through a dependency from sshd to libsystemd and then from libsystemd > to > liblzma. One lesson from this is that unnecessary dependencies can still > increase risk. > Yeah, I think that's something to consider for every dependency added. I think we're fairly often protected against "adding too many libraries" because many libraries simply don't exist for all the platforms we want to build on. But it's nevertheless something to think about each time. It's worth noting that we have an optional dependency on libsystemd as well. > > Openssh has now integrated [1] a patch to remove the dependency on > libsystemd > for triggering service manager readiness notifications, by inlining the > necessary function. That's not hard, the protocol is pretty simple. > > I suspect we should do the same. We're not even close to being a target as > attractive as openssh, but still, it seems unnecessary. > +1. When the code is this simple, we should definitely consider carrying it ourselves. At least if we don't expect to need *other* functionality from the same library in the future, which I doubt we will from libsystemd. An argument could be made to instead just remove support, but I think it's > quite valuable to have intra service dependencies that can rely on the > server > actually having started up. > > If we remove support we're basically just asking most of our linux packagers to add it back in, and they will add it back in the same way we did it. I think we do everybody a disservice if we do that. It's useful functionality. //Magnus
Re: On disable_cost
On Thu, 4 Apr 2024 at 08:21, Robert Haas wrote: > I wanted to further explore the idea of just not generating plans of > types that are currently disabled. I looked into doing this for > enable_indexscan and enable_indexonlyscan. As a first step, I > investigated how those settings work now, and was horrified. I don't > know whether I just wasn't paying attention back when the original > index-only scan work was done -- I remember discussing > enable_indexonlyscan with you at the time -- or whether it got changed > subsequently. Anyway, the current behavior is: > > [A] enable_indexscan=false adds disable_cost to the cost of every > Index Scan path *and also* every Index-Only Scan path. So disabling > index-scans also in effect discourages the use of index-only scans, > which would make sense if we didn't have a separate setting called > enable_indexonlyscan, but we do. Given that, I think this is > completely and utterly wrong. > > [b] enable_indexonlyscan=false causes index-only scan paths not to be > generated at all, but instead, we generate index-scan paths to do the > same thing that we would not have generated otherwise. FWIW, I think changing this is a bad idea and I don't think the behaviour that's in your patch is useful. With your patch, if I SET enable_indexonlyscan=false, any index that *can* support an IOS for my query will now not be considered at all! With your patch applied, I see: -- default enable_* GUC values. postgres=# explain select oid from pg_class order by oid; QUERY PLAN --- Index Only Scan using pg_class_oid_index on pg_class (cost=0.27..22.50 rows=415 width=4) (1 row) postgres=# set enable_indexonlyscan=0; -- no index scan? SET postgres=# explain select oid from pg_class order by oid; QUERY PLAN - Sort (cost=36.20..37.23 rows=415 width=4) Sort Key: oid -> Seq Scan on pg_class (cost=0.00..18.15 rows=415 width=4) (3 rows) postgres=# set enable_seqscan=0; -- still no index scan! 
SET postgres=# explain select oid from pg_class order by oid; QUERY PLAN Sort (cost=136.20..137.23 rows=415 width=4) Sort Key: oid -> Seq Scan on pg_class (cost=100.00..118.15 rows=415 width=4) (3 rows) postgres=# explain select oid from pg_class order by oid,relname; -- now an index scan?! QUERY PLAN - Incremental Sort (cost=0.43..79.50 rows=415 width=68) Sort Key: oid, relname Presorted Key: oid -> Index Scan using pg_class_oid_index on pg_class (cost=0.27..60.82 rows=415 width=68) (4 rows) I don't think this is good as pg_class_oid_index effectively won't be used as long as the particular query could use that index with an index *only* scan. You can see above that as soon as I adjust the query slightly so that IOS isn't possible, the index can be used again. I think an Index Scan would have been a much better option for the 2nd query than the seq scan and sort. I think if I do SET enable_indexonlyscan=0; the index should still be used with an Index Scan and it definitely shouldn't result in Index Scan also being disabled if that index happens to contain all the columns required to support an IOS. FWIW, I'm fine with the current behaviour. It looks like we've assumed that, when possible, IOS are always superior to Index Scan, so there's no point in generating an Index Scan path when we can generate an IOS path. I think this makes sense. For that not to be true, checking the all visible flag would have to be more costly than visiting the heap. Perhaps that could be true if the visibility map page had to come from disk and the heap page was cached and the disk was slow, but I don't think that scenario is worthy of considering both Index scan and IOS path types when IOS is possible. We've no way to accurately cost that anyway. This all seems similar to enable_sort vs enable_incremental_sort. 
For a while, we did consider both an incremental sort and a sort when an incremental sort was possible, but it seemed to me that an incremental sort would always be better when it was possible, so I changed that in 4a29eabd1. I've not seen anyone complain. I made it so that when an incremental sort is possible but is disabled, we do a sort instead. That seems fairly similar to how master handles enable_indexonlyscan=false. In short, I don't find it strange that disabling one node type results in considering another type that we'd otherwise not consider in cases where we assume that the disabled node type is always superior and should always be used when it is possible. I do
Re: Use streaming read API in ANALYZE
On Wed, Apr 03, 2024 at 10:25:01PM +0300, Nazir Bilal Yavuz wrote: > > I realized the same error while working on Jakub's benchmarking results. > > Cause: I was using the nblocks variable to check how many blocks will > be returned from the streaming API. But I realized that sometimes the > number returned from BlockSampler_Init() is not equal to the number of > blocks that BlockSampler_Next() will return as BlockSampling algorithm > decides how many blocks to return on the fly by using some random > seeds. > > There are a couple of solutions I thought of: > > 1- Use BlockSampler_HasMore() instead of nblocks in the main loop in > the acquire_sample_rows(): > > Streaming API uses this function to prefetch block numbers. > BlockSampler_HasMore() will reach to the end first as it is used while > prefetching, so it will start to return false while there are still > buffers to return from the streaming API. That will cause some buffers > at the end to not be processed. > > 2- Expose something (function, variable etc.) from the streaming API > to understand if the read is finished and there is no buffer to > return: > > I think this works but I am not sure if the streaming API allows > something like that. > > 3- Check every buffer returned from the streaming API, if it is > invalid stop the main loop in the acquire_sample_rows(): > > This solves the problem but there will be two if checks for each > buffer returned, > - in heapam_scan_analyze_next_block() to check if the returned buffer is > invalid > - to break main loop in acquire_sample_rows() if > heapam_scan_analyze_next_block() returns false > One of the if cases can be bypassed by moving > heapam_scan_analyze_next_block()'s code to the main loop in the > acquire_sample_rows(). > > I implemented the third solution, here is v6. I've reviewed the patches inline below and attached a patch that has some of my ideas on top of your patch. 
> From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz > Date: Wed, 3 Apr 2024 15:14:15 +0300 > Subject: [PATCH v6] Use streaming read API in ANALYZE > > ANALYZE command gets random tuples using BlockSampler algorithm. Use > streaming reads to get these tuples by using BlockSampler algorithm in > streaming read API prefetch logic. > --- > src/include/access/heapam.h | 6 +- > src/backend/access/heap/heapam_handler.c | 22 +++--- > src/backend/commands/analyze.c | 85 > 3 files changed, 42 insertions(+), 71 deletions(-) > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > index a307fb5f245..633caee9d95 100644 > --- a/src/include/access/heapam.h > +++ b/src/include/access/heapam.h > @@ -25,6 +25,7 @@ > #include "storage/bufpage.h" > #include "storage/dsm.h" > #include "storage/lockdefs.h" > +#include "storage/read_stream.h" > #include "storage/shm_toc.h" > #include "utils/relcache.h" > #include "utils/snapshot.h" > @@ -388,9 +389,8 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup, > struct > GlobalVisState *vistest); > > /* in heap/heapam_handler.c*/ > -extern void heapam_scan_analyze_next_block(TableScanDesc scan, > - >BlockNumber blockno, > - >BufferAccessStrategy bstrategy); > +extern bool heapam_scan_analyze_next_block(TableScanDesc scan, > + >ReadStream *stream); > extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan, > >TransactionId OldestXmin, > >double *liverows, double *deadrows, > diff --git a/src/backend/access/heap/heapam_handler.c > b/src/backend/access/heap/heapam_handler.c > index 0952d4a98eb..d83fbbe6af3 100644 > --- a/src/backend/access/heap/heapam_handler.c > +++ b/src/backend/access/heap/heapam_handler.c > @@ -1054,16 +1054,16 @@ heapam_relation_copy_for_cluster(Relation OldHeap, > Relation NewHeap, > } > > /* > - * Prepare to analyze block `blockno` of `scan`. The scan has been started > - * with SO_TYPE_ANALYZE option. 
> + * Prepare to analyze block returned from streaming object. If the block > returned > + * from streaming object is valid, true is returned; otherwise false is > returned. > + * The scan has been started with SO_TYPE_ANALYZE option. > * > * This routine holds a buffer pin and lock on the heap page. They are held > * until heapam_scan_analyze_next_tuple() returns false. That is until all > the > * items of the heap page are analyzed. > */ > -void > -heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > -
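The stop condition described as "solution 3" earlier in this thread (break the acquire_sample_rows() loop when the stream hands back an invalid buffer, instead of trusting a precomputed nblocks) can be illustrated with a toy model. All names here are illustrative stand-ins; the real code uses read_stream_next_buffer() and InvalidBuffer from PostgreSQL's streaming read API.

```c
#include <stddef.h>

/* Toy stand-in for a read stream whose true length is only known as it
 * is consumed, mimicking BlockSampler deciding block counts on the fly. */
#define INVALID_BUFFER 0

typedef struct
{
	int			next;
	int			nblocks;		/* blocks the sampler actually yields */
} ToyStream;

static int
toy_stream_next_buffer(ToyStream *stream)
{
	if (stream->next >= stream->nblocks)
		return INVALID_BUFFER;
	return ++stream->next;		/* buffer ids start at 1 */
}

/* Consume until exhaustion: the invalid buffer, not a counter, ends the
 * loop, so no trailing blocks are silently skipped. */
static int
count_sampled_blocks(ToyStream *stream)
{
	int			processed = 0;

	for (;;)
	{
		int			buf = toy_stream_next_buffer(stream);

		if (buf == INVALID_BUFFER)
			break;				/* stream exhausted */
		processed++;			/* ... analyze the page here ... */
	}
	return processed;
}
```

The cost this pattern accepts, as noted above, is one validity check per returned buffer; merging the per-block setup into the main loop avoids paying that check twice.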
RE: Can't compile PG 17 (master) from git under Msys2 autoconf
> Hi Regina, > > On 2024-Mar-27, Regina Obe wrote: > > > The error is > > > > rm -f '../../src/include/storage/lwlocknames.h' > > cp -pR ../../backend/storage/lmgr/lwlocknames.h > > '../../src/include/storage/lwlocknames.h' > > cp: cannot stat '../../backend/storage/lmgr/lwlocknames.h': No such > > file or directory > > make[1]: *** [Makefile:143: ../../src/include/storage/lwlocknames.h] > > Error 1 > > make[1]: Leaving directory '/projects/postgresql/postgresql- git/src/backend' > > make: *** [../../src/Makefile.global:382: submake-generated-headers] > > Error 2 > > Hmm, I changed these rules again in commit da952b415f44, maybe your > problem is with that one? I wonder if changing the references to > ../include/storage/lwlocklist.h to $(topdir)/src/include/storage/lwlocklist.h > (and similar things in > src/backend/storage/lmgr/Makefile) would fix it. > > Do you run configure in the source directory or a separate build one? > > -- > Álvaro Herrera, Breisgau, Deutschland — > https://www.EnterpriseDB.com/ > "If it is not right, do not do it. > If it is not true, do not say it." (Marcus Aurelius, Meditations) I run in the source directory, but tried doing in a separate build directory and ran into the same issue. Let me look at that commit and get back to you if that makes a difference. Thanks, Regina
Re: Streaming read-ready sequential scan code
On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman wrote: > On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas wrote: > > On 01/04/2024 22:58, Melanie Plageman wrote: > > > Attached v7 has version 14 of the streaming read API as well as a few > > > small tweaks to comments and code. > > > > I saw benchmarks in this thread to show that there's no regression when > > the data is in cache, but I didn't see any benchmarks demonstrating the > > benefit of this. So I ran this quick test: > > Good point! It would be good to show why we would actually want this > patch. Attached v8 is rebased over current master (which now has the > streaming read API). Anecdotally by all reports I've seen so far, all-in-cache seems to be consistently a touch faster than master if anything, for streaming seq scan and streaming ANALYZE. That's great!, but it doesn't seem to be due to intentional changes. No efficiency is coming from batching anything. Perhaps it has to do with CPU pipelining effects: though it's doing the same work as ReadBuffer()-when-it-gets-a-hit, the work itself is cut up into stages in a kind of pipeline: read_stream_next_buffer() chooses page n + 2, pins page n + 1 and returns page n each time you call it, so maybe we get more CPU parallelism due to spreading the data dependencies out? (Makes me wonder what happens if you insert a memory prefetch for the page header into that production line, and if there are more opportunities to spread dependencies out eg hashing the buffer tag ahead of time.) > On the topic of BAS_BULKREAD buffer access strategy, I think the least > we could do is add an assert like this to read_stream_begin_relation() > after calculating max_pinned_buffers. > > Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers); > > Perhaps we should do more? I think with a ring size of 16 MB, large > SELECTs are safe for now. 
But I know future developers will make > changes and it would be good not to assume they will understand that > pinning more buffers than the size of the ring effectively invalidates > the ring. Yeah I think we should clamp max_pinned_buffers if we see a strategy. What do you think about: if (strategy) { int strategy_buffers = GetAccessStrategyBufferCount(strategy); max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers); } I just don't know where to get that '2'. The idea would be to hopefully never actually be constrained by it in practice, except in tiny/toy setups, so we can't go too wrong with our number '2' there. Then we should increase the default ring sizes for BAS_BULKREAD and BAS_VACUUM to make the previous statement true. The size of main memory and L2 cache have increased dramatically since those strategies were invented. I think we should at least double them, and more likely quadruple them. I realise you already made them configurable per command in commit 1cbbee033, but I mean the hard coded default 256 in freelist.c. It's not only to get more space for our prefetching plans, it's also to give the system more chance of flushing WAL asynchronously/in some other backend before you crash into dirty data; as you discovered, prefetching accidentally makes that effect worse in a BAS_VACUUM strategy, by taking away space that is effectively deferring WAL flushes, so I'd at least like to double the size if we use "/ 2" above.
Re: LogwrtResult contended spinlock
On Wed, 2024-04-03 at 13:19 +0200, Alvaro Herrera wrote: > So what I do in the attached 0001 is stop using the XLogwrtResult > struct > in XLogCtl and replace it with separate Write and Flush values, and > add > the macro XLogUpdateLocalLogwrtResult() that copies the values of > Write > and Flush from the shared XLogCtl to the local variable given as > macro > argument. +1 > 0002 then adds pg_atomic_monotonic_advance_u64. (I don't add the > _u32 > variant, because I don't think it's a great idea to add dead code. > If > later we see a need for it we can put it in.) It also changes the > two > new members to be atomics, changes the macro to use atomic read, and > XLogWrite now uses monotonic increment. A couple of other places can > move the macro calls to occur outside the spinlock. Also, XLogWrite > gains the invariant checking that involves Write and Flush. To maintain the invariant that Write >= Flush, I believe you need to always store to Write first, then Flush; and load from Flush first, then from Write. So RefreshXLogWriteResult should load Flush then load Write, and the same for the Assert code. And in 0003, loading from Copy should happen last. Also, should pg_atomic_monotonic_advance_u64() return currval? I don't think it's important for the current patches, but I could imagine a caller wanting the most up-to-date value possible, even if it's beyond what the caller requested. Returning void seems slightly wasteful. Other than that, it looks good to me. > Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and > Flush, and updates WALReadFromBuffers to test that instead of the > Write > pointer, and adds in XLogWrite the invariant checks that involve the > Copy pointer. Looks good to me. Regards, Jeff Davis
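The pg_atomic_monotonic_advance_u64() primitive discussed above can be illustrated in miniature. This sketch uses C11 stdatomic rather than PostgreSQL's pg_atomic API, and it returns the resulting value as suggested in the review, which costs nothing extra since the compare-exchange loop already has it in hand:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Advance *ptr to target unless it is already at or beyond it, and return
 * the value *ptr holds afterward.  On CAS failure the current value is
 * reloaded into cur, so a concurrent advance past target ends the loop.
 */
static uint64_t
monotonic_advance_u64(_Atomic uint64_t *ptr, uint64_t target)
{
	uint64_t	cur = atomic_load(ptr);

	while (cur < target)
	{
		if (atomic_compare_exchange_weak(ptr, &cur, target))
			return target;		/* we performed the advance */
	}
	return cur;					/* someone else got there first, or no-op */
}
```

A caller maintaining the Write >= Flush invariant mentioned above would still need to order its two advances carefully (advance Write before Flush); the primitive itself only guarantees that each individual pointer never moves backward.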
Re: Popcount optimization using AVX512
On Wed, Apr 03, 2024 at 12:41:27PM -0500, Nathan Bossart wrote: > I committed v23-0001. Here is a rebased version of the remaining patches. > I intend to test the masking idea from Ants next. 0002 was missing a cast that is needed for the 32-bit builds. I've fixed that in v25. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From fe001e38b3f209c2fe615a2c4c64109d5e4d3da1 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 27 Mar 2024 16:39:24 -0500 Subject: [PATCH v25 1/2] AVX512 popcount support --- config/c-compiler.m4 | 58 ++ configure| 252 +++ configure.ac | 51 ++ meson.build | 87 + src/Makefile.global.in | 5 + src/include/pg_config.h.in | 12 ++ src/include/port/pg_bitutils.h | 15 ++ src/makefiles/meson.build| 4 +- src/port/Makefile| 11 ++ src/port/meson.build | 6 +- src/port/pg_bitutils.c | 29 ++- src/port/pg_popcount_avx512.c| 49 ++ src/port/pg_popcount_avx512_choose.c | 71 src/test/regress/expected/bit.out| 24 +++ src/test/regress/sql/bit.sql | 4 + 15 files changed, 673 insertions(+), 5 deletions(-) create mode 100644 src/port/pg_popcount_avx512.c create mode 100644 src/port/pg_popcount_avx512_choose.c diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 3268a780bb..5fb60775ca 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then fi undefine([Ac_cachevar])dnl ])# PGAC_LOONGARCH_CRC32C_INTRINSICS + +# PGAC_XSAVE_INTRINSICS +# - +# Check if the compiler supports the XSAVE instructions using the _xgetbv +# intrinsic function. +# +# An optional compiler flag can be passed as argument (e.g., -mxsave). If the +# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE. 
+AC_DEFUN([PGAC_XSAVE_INTRINSICS], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl +AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar], +[pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS $1" +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ], + [return _xgetbv(0) & 0xe0;])], + [Ac_cachevar=yes], + [Ac_cachevar=no]) +CFLAGS="$pgac_save_CFLAGS"]) +if test x"$Ac_cachevar" = x"yes"; then + CFLAGS_XSAVE="$1" + pgac_xsave_intrinsics=yes +fi +undefine([Ac_cachevar])dnl +])# PGAC_XSAVE_INTRINSICS + +# PGAC_AVX512_POPCNT_INTRINSICS +# - +# Check if the compiler supports the AVX512 POPCNT instructions using the +# _mm512_setzero_si512, _mm512_loadu_si512, _mm512_popcnt_epi64, +# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions. +# +# An optional compiler flag can be passed as argument +# (e.g., -mavx512vpopcntdq). If the intrinsics are supported, sets +# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT. +AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl +AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar], +[pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS $1" +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ], + [const char buf@<:@sizeof(__m512i)@:>@; + PG_INT64_TYPE popcnt = 0; + __m512i accum = _mm512_setzero_si512(); + const __m512i val = _mm512_loadu_si512((const __m512i *) buf); + const __m512i cnt = _mm512_popcnt_epi64(val); + accum = _mm512_add_epi64(accum, cnt); + popcnt = _mm512_reduce_add_epi64(accum); + /* return computed value, to prevent the above being optimized away */ + return popcnt == 0;])], + [Ac_cachevar=yes], + [Ac_cachevar=no]) +CFLAGS="$pgac_save_CFLAGS"]) +if test x"$Ac_cachevar" = x"yes"; then + CFLAGS_POPCNT="$1" + pgac_avx512_popcnt_intrinsics=yes +fi +undefine([Ac_cachevar])dnl +])# PGAC_AVX512_POPCNT_INTRINSICS diff --git a/configure b/configure index 36feeafbb2..b48ed7f271 100755 --- a/configure +++ b/configure @@ 
-647,6 +647,9 @@ MSGFMT_FLAGS
 MSGFMT
 PG_CRC32C_OBJS
 CFLAGS_CRC
+PG_POPCNT_OBJS
+CFLAGS_POPCNT
+CFLAGS_XSAVE
 LIBOBJS
 OPENSSL
 ZSTD
@@ -17404,6 +17407,40 @@ $as_echo "#define HAVE__GET_CPUID 1" >>confdefs.h
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __get_cpuid_count" >&5
+$as_echo_n "checking for __get_cpuid_count... " >&6; }
+if ${pgac_cv__get_cpuid_count+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <cpuid.h>
+int
+main ()
+{
+unsigned int exx[4] = {0, 0, 0, 0};
+  __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__get_cpuid_count="yes"
+else
+  pgac_cv__get_cpuid_count="no"
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result:
Re: Opportunistically pruning page before update
On Fri, Jan 26, 2024 at 8:33 PM James Coleman wrote:
>
> On Tue, Jan 23, 2024 at 2:46 AM Dilip Kumar wrote:
> >
> > On Tue, Jan 23, 2024 at 7:18 AM James Coleman wrote:
> > >
> > > On Mon, Jan 22, 2024 at 8:21 PM James Coleman wrote:
> > > >
> > > > See rebased patch attached.
> > >
> > > I just realized I left a change in during the rebase that wasn't
> > > necessary.
> > >
> > > v4 attached.
> >
> > I have noticed that you are performing the opportunistic pruning after
> > we decided that the updated tuple can not fit in the current page and
> > then we are performing the pruning on the new target page. Why don't
> > we first perform the pruning on the existing page of the tuple itself?
> > Or this is already being done before this patch? I could not find
> > such existing pruning so got this question because such pruning can
> > convert many non-hot updates to the HOT update right?
>
> First off I noticed that I accidentally sent a different version of
> the patch I'd originally worked on. Here's the one from the proper
> branch. It's still similar, but I want to make sure the right one is
> being reviewed.

I finally got back around to looking at this. Sorry for the delay.

I don't feel confident enough to say at a high level whether or not it is
a good idea in the general case to try pruning every block
RelationGetBufferForTuple() considers as a target block. But, I did have a
few thoughts on the implementation:

heap_page_prune_opt() checks PageGetHeapFreeSpace() twice. You will have
already done that in RelationGetBufferForTuple(). And you don't even need
to do it both of those times because you have a lock (which is why
heap_page_prune_opt() does it twice). This all seems a bit wasteful. And
then, you do it again after pruning.

This made me think: vacuum cares how much space heap_page_prune() (now
heap_page_prune_and_freeze()) freed up.
Now if we add another caller who cares how much space pruning freed up, perhaps it is worth calculating this while pruning and returning it. I know PageGetHeapFreeSpace() isn't the most expensive function in the world, but it also seems like a useful, relevant piece of information to inform the caller of. You don't have to implement the above, it was just something I was thinking about. Looking at the code also made me wonder if it is worth changing RelationGetBufferForTuple() to call PageGetHeapFreeSpace() before taking a lock (which won't give a totally accurate result, but that's probably okay) and then call heap_page_prune_opt() without a lock when PageGetHeapFreeSpace() says there isn't enough space. Also do we want to do GetVisibilityMapPins() before calling heap_page_prune_opt()? I don't quite get why we do that before knowing if we are going to actually use the target block in the current code. Anyway, I'm not sure I like just adding a parameter to heap_page_prune_opt() to indicate it already has an exclusive lock. It does a bunch of stuff that was always done without a lock and now you are doing it with an exclusive lock. - Melanie
Re: On disable_cost
On Wed, Apr 3, 2024 at 3:21 PM Robert Haas wrote:
> It's also pretty clear to me that the fact that enable_indexscan
> and enable_indexonlyscan work completely differently from each other
> is surprising at best, wrong at worst, but here again, what this patch
> does about that is not above reproach.

Yes, that is wrong; surely there is a reason we have two vars.

Thanks for digging into this: if nothing else, the code will be better for
this discussion, even if we do nothing for now with disable_cost.

Cheers,
Greg
Re: Statistics Import and Export
Corey Huinker writes:
>> As far as that goes, it shouldn't be that hard to deal with, at least
>> not for "soft" errors which hopefully cover most input-function
>> failures these days. You should be invoking array_in via
>> InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext.
>> (Look at pg_input_error_info() for useful precedent.)

> Ah, my understanding may be out of date. I was under the impression that
> that mechanism relied on the cooperation of the per-element input
> function, so even if we got all the builtin datatypes to play nice with
> *Safe(), we were always going to be at risk with a user-defined input
> function.

That's correct, but it's silly not to do what we can. Also, I imagine
that there is going to be high evolutionary pressure on UDTs to support
soft error mode for COPY, so over time the problem will decrease --- as
long as we invoke the soft error mode.

> 1. NULL input => Return NULL. (because strict).
> 2. Actual error (permissions, cache lookup not found, etc) => Raise ERROR
> (thus ruining binary upgrade)
> 3. Call values are so bad (examples: attname not found, required stat
> missing) that nothing can recover => WARN, return FALSE.
> 4. At least one stakind-stat is wonky (impossible for datatype, missing
> stat pair, wrong type on input parameter), but that's the worst of it => 1
> to N WARNs, write stats that do make sense, return TRUE.
> 5. Hunky-dory. => No warns. Write all stats. return TRUE.

> Which of those seem like good ereturn candidates to you?

I'm good with all those behaviors. On reflection, the design I was
vaguely imagining wouldn't cope with case 4 (multiple WARNs per call) so
never mind that.

regards, tom lane
Re: psql not responding to SIGINT upon db reconnection
On Wed, Apr 3, 2024 at 11:17 AM Tristan Partin wrote: > I think patch 2 makes it worse. The value in -Wswitch is that when new > enum variants are added, the developer knows the locations to update. > Adding a default case makes -Wswitch pointless. > > Patch 1 is still good. The comment change in patch 2 is good too! It seems to me that 0001 should either remove the pg_unreachable() call or change the break to a return, but not both. The commit message tries to justify doing both by saying that the pg_unreachable() call doesn't have much value, but there's not much value in omitting pg_unreachable() from unreachable places, either, so I'm not convinced. I agree with Tristan's analysis of 0002. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On 2024-04-03 We 15:12, Daniel Gustafsson wrote: The fact that very few animals run the ssl tests is a pet peeve of mine, it would be nice if we could get broader coverage there. Well, the only reason for that is that the SSL tests need to be listed in PG_TEST_EXTRA, and the only reason for that is that there's a possible hazard on multi-user servers. But I bet precious few buildfarm animals run in such an environment. Mine don't - I'm the only user. Maybe we could send out an email to the buildfarm-owners list asking people to consider allowing the ssl tests. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Security lessons from liblzma
> On 3 Apr 2024, at 20:09, Peter Eisentraut wrote:
>
> On 30.03.24 00:14, Daniel Gustafsson wrote:
>> One take-away for me is how important it is to ship recipes for regenerating
>> any testdata which is included in generated/compiled/binary format. Kind of
>> how we in our tree ship the config for test TLS certificates and keys which
>> can
>> be manually inspected, and used to rebuild the testdata (although the risk
>> for
>> injections in this particular case seems low). Bad things can still be
>> injected, but formats which allow manual review at least goes some way
>> towards
>> lowering risk.
>
> I actually find the situation with the test TLS files quite unsatisfactory.
> While we have build rules, the output files are not reproducible, both
> because of some inherent randomness in the generation, and because different
> OpenSSL versions produce different details.

This testdata is by nature not reproducible, and twisting arms to make it
so will only result in testing against synthetic data which risks hiding
bugs IMO.

> So you just have to "trust" that what's there now makes sense.

Not entirely, you can review the input files which are used to generate
the test data and verify that those make sense..

> Of course, you can use openssl tools to unpack these files, but that is
> difficult and unreliable unless you know exactly what you are looking for.

..and check like you mention above, but it's as you say not entirely
trivial. It would be nice to improve this but I'm not sure how. Document
how to inspect the data in src/test/ssl/README perhaps?

> It would be better if we created the required test files as part of the test
> run. (Why not? Too slow?)

The make sslfiles step requires OpenSSL 1.1.1, which is higher than what
we require to be installed to build postgres. The higher-than-base
requirement is due to it building test data only used when running 1.1.1
or higher, so technically it could be made to work if anyone is interested
in investing time in 1.0.2.
Time is one aspect: on my crusty old laptop it takes ~2 seconds to
regenerate the files. That in itself isn't that much, but we've rejected
test-time additions far smaller than that. We could however make CI and
the Buildfarm run the regeneration and leave it up to each developer to
decide locally? Or remove them and regenerate on the first SSL test run
and then use the cached ones after that?

On top of time, I have a feeling the regeneration won't run on Windows.
When it's converted to use Meson then that might be fixed though.

--
Daniel Gustafsson
Re: Use streaming read API in ANALYZE
Hi, Thank you for looking into this! On Wed, 3 Apr 2024 at 20:17, Heikki Linnakangas wrote: > > On 03/04/2024 13:31, Nazir Bilal Yavuz wrote: > > Streaming API has been committed but the committed version has a minor > > change, the read_stream_begin_relation function takes Relation instead > > of BufferManagerRelation now. So, here is a v5 which addresses this > > change. > > I'm getting a repeatable segfault / assertion failure with this: > > postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10); > CREATE TABLE > postgres=# insert into tengiga select g, repeat('x', 900) from > generate_series(1, 140) g; > INSERT 0 140 > postgres=# set default_statistics_target = 10; ANALYZE tengiga; > SET > ANALYZE > postgres=# set default_statistics_target = 100; ANALYZE tengiga; > SET > ANALYZE > postgres=# set default_statistics_target =1000; ANALYZE tengiga; > SET > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. 
> > TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File: > "heapam_handler.c", Line: 1079, PID: 262232 > postgres: heikki postgres [local] > ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8] > postgres: heikki postgres [local] > ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34] > postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34] > postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a] > postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9] > postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0] > postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe] > postgres: heikki postgres [local] > ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9] > postgres: heikki postgres [local] > ANALYZE(ProcessUtility+0x136)[0x564889f0afb1] > postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8] > postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b] > postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015] > postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6] > postgres: heikki postgres [local] > ANALYZE(PostgresMain+0x80c)[0x564889f06fd7] > postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876] > postgres: heikki postgres [local] > ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3] > postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e] > postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0] > postgres: heikki postgres [local] > ANALYZE(PostmasterMain+0x152b)[0x564889e2214d] > postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4] > /lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305] > postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61] > 2024-04-03 20:15:49.157 EEST [262101] LOG: server process (PID 262232) > was terminated by signal 6: Aborted I 
realized the same error while working on Jakub's benchmarking results.

Cause: I was using the nblocks variable to check how many blocks will be
returned from the streaming API. But I realized that sometimes the number
returned from BlockSampler_Init() is not equal to the number of blocks
that BlockSampler_Next() will return, as the block sampling algorithm
decides how many blocks to return on the fly by using some random seeds.

There are a couple of solutions I thought of:

1- Use BlockSampler_HasMore() instead of nblocks in the main loop in
acquire_sample_rows(): The streaming API uses this function to prefetch
block numbers. BlockSampler_HasMore() will reach the end first as it is
used while prefetching, so it will start to return false while there are
still buffers to return from the streaming API. That will cause some
buffers at the end to not be processed.

2- Expose something (function, variable etc.) from the streaming API to
understand if the read is finished and there is no buffer to return: I
think this works but I am not sure if the streaming API allows something
like that.

3- Check every buffer returned from the streaming API; if it is invalid,
stop the main loop in acquire_sample_rows(): This solves the problem but
there will be two if checks for each buffer returned,
- in heapam_scan_analyze_next_block() to check if the returned buffer is
invalid
- to break the main loop in acquire_sample_rows() if
heapam_scan_analyze_next_block() returns false
One of the if cases can be bypassed by moving
heapam_scan_analyze_next_block()'s code to the main loop in
acquire_sample_rows().

I implemented the third solution, here is v6.

--
Regards,
Nazir Bilal Yavuz
Microsoft

From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Wed, 3 Apr 2024 15:14:15 +0300
Subject: [PATCH v6] Use streaming read API in ANALYZE

ANALYZE command gets random tuples using
Re: On disable_cost
On Tue, Apr 2, 2024 at 12:58 PM Tom Lane wrote:
> Not really. But if you had, say, a join of a promoted path to a
> disabled path, that would be treated as on-par with a join of two
> regular paths, which seems like it'd lead to odd choices. Maybe
> it'd be fine, but my gut says it'd likely not act nicely. As you
> say, it's a lot easier to believe that only-promoted or only-disabled
> situations would behave sanely.

Makes sense.

I wanted to further explore the idea of just not generating plans of
types that are currently disabled. I looked into doing this for
enable_indexscan and enable_indexonlyscan. As a first step, I
investigated how those settings work now, and was horrified. I don't know
whether I just wasn't paying attention back when the original index-only
scan work was done -- I remember discussing enable_indexonlyscan with you
at the time -- or whether it got changed subsequently. Anyway, the
current behavior is:

[A] enable_indexscan=false adds disable_cost to the cost of every Index
Scan path *and also* every Index-Only Scan path. So disabling index-scans
also in effect discourages the use of index-only scans, which would make
sense if we didn't have a separate setting called enable_indexonlyscan,
but we do. Given that, I think this is completely and utterly wrong.

[B] enable_indexonlyscan=false causes index-only scan paths not to be
generated at all, but instead, we generate index-scan paths to do the
same thing that we would not have generated otherwise. This is weird
because it means that disabling one plan type causes us to consider
additional plans of another type, which seems like a thing that a user
might not expect. It's more defensible than [A], though, because you
could argue that we only omit the index scan path as an optimization, on
the theory that it will always lose to the index-only scan path, and thus
if the index-only scan path is not generated, there's a point to
generating the index scan path after all, so we should.
However, it seems unlikely to me that someone reading the one line of
documentation that we have about this parameter would be able to guess
that it works this way. Here's an example of how the current system
behaves:

robert.haas=# explain select count(*) from pgbench_accounts;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2854.29..2854.30 rows=1 width=8)
   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..2604.29 rows=10 width=0)
(2 rows)

robert.haas=# set enable_indexscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=2890.00..2890.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2640.00 rows=10 width=0)
(2 rows)

robert.haas=# set enable_seqscan=false;
SET
robert.haas=# set enable_bitmapscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1002854.29..1002854.30 rows=1 width=8)
   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=100.29..1002604.29 rows=10 width=0)
(2 rows)

robert.haas=# set enable_indexonlyscan=false;
SET
robert.haas=# explain select count(*) from pgbench_accounts;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=1002890.00..1002890.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts  (cost=100.00..1002640.00 rows=10 width=0)
(2 rows)

The first time we run the query, it picks an index-only scan because it's
the cheapest. When index scans are disabled, the query now picks a
sequential scan, even though it wasn't using an index scan, because the
index-only scan that it was using is perceived to have become very
expensive. When we then shut off sequential scans and bitmap scans, it
switches back to an index-only scan, because setting
enable_indexscan=false didn't completely disable index-only scans, but
just made them expensive. But now everything looks expensive, so we go
back to the same plan we had initially, except with the cost increased by
a bazillion.
Finally, when we disable index-only scans, that removes that plan from the pool, so now we pick the second-cheapest plan overall, which in this case is a sequential scan. So just to see what would happen, I wrote a patch to make enable_indexscan and enable_indexonlyscan do exactly what they say on the tin: when you set one of them to false, paths of that type are not generated,
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 3 Apr 2024, at 17:29, Tom Lane wrote: > > Jacob Champion writes: >> As far as I can tell, no versions of LibreSSL so far provide >> X509_get_signature_info(), so this patch is probably a bit too >> aggressive. > > Another problem with cutting support is how many buildfarm members > will we lose. I scraped recent configure logs and got the attached > results. I count 3 machines running 1.0.1, Support for 1.0.1 was removed with 8e278b657664 in July 2023 so those are not building with OpenSSL enabled already. > 18 running some flavor of 1.0.2, massasauga and snakefly run the ssl_passphrase_callback-check test but none of these run the ssl-check tests AFAICT, so we have very low coverage as is. The fact that very few animals run the ssl tests is a pet peeve of mine, it would be nice if we could get broader coverage there. Worth noting is that the OpenSSL check in configure.ac only reports what the version of the OpenSSL binary in $PATH is, not which version of the library that we build against (using --with-libs/--with-includes etc). -- Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 3 Apr 2024, at 19:38, Tom Lane wrote:
>
> Jacob Champion writes:
>> The RHEL7-alikes are the biggest set, but that's already discussed
>> above. Looks like SUSE 12 goes EOL later this year (October 2024), and
>> it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu
>> 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March
>> 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of
>> which appear to have newer versions of OpenSSL shipped and selectable.
>
> The discussion we had last year concluded that we were OK with
> dropping 1.0.1 support when RHEL6 goes out of extended support
> (June 2024 per this thread, I didn't check it). Seems like we
> should have the same policy for RHEL7. Also, calling Photon 3
> dead because it went EOL three days ago seems over-hasty.
>
> Bottom line for me is that pulling 1.0.1 support now is OK,
> but I think pulling 1.0.2 is premature.

Is Red Hat building and shipping v17 packages for RHEL7 ELS customers? If
not then it seems mostly academic to tie our dependencies to RHEL ELS
unless I'm missing something.

--
Daniel Gustafsson
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi, Alexander! On Wed, 3 Apr 2024 at 22:18, Alexander Korotkov wrote: > On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera > wrote: > > > > On 2024-Apr-03, Alexander Korotkov wrote: > > > > > Regarding the shmem data structure for LSN waiters. I didn't pick > > > LWLock or ConditionVariable, because I needed the ability to wake up > > > only those waiters whose LSN is already replayed. In my experience > > > waking up a process is way slower than scanning a short flat array. > > > > I agree, but I think that's unrelated to what I was saying, which is > > just the patch I attach here. > > Oh, sorry for the confusion. I'd re-read your message. Indeed you > meant this very clearly! > > I'm good with the patch. Attached revision contains a bit of a commit > message. > > > > However, I agree that when the number of waiters is very high and flat > > > array may become a problem. It seems that the pairing heap is not > > > hard to use for shmem structures. The only memory allocation call in > > > paritingheap.c is in pairingheap_allocate(). So, it's only needed to > > > be able to initialize the pairing heap in-place, and it will be fine > > > for shmem. > > > > Ok. > > > > With the code as it stands today, everything in WaitLSNState apart from > > the pairing heap is accessed without any locking. I think this is at > > least partly OK because each backend only accesses its own entry; but it > > deserves a comment. Or maybe something more, because WaitLSNSetLatches > > does modify the entry for other backends. (Admittedly, this could only > > happens for backends that are already sleeping, and it only happens > > with the lock acquired, so it's probably okay. But clearly it deserves > > a comment.) > > Please, check 0002 patch attached. I found it easier to move two > assignments we previously moved out of lock, into the lock; then claim > WaitLSNState.procInfos is also protected by WaitLSNLock. > Could you re-attach 0002. Seems it failed to attach to the previous message. 
Regards, Pavel
Re: Use streaming read API in ANALYZE
Hi Jakub, Thank you for looking into this and doing a performance analysis. On Wed, 3 Apr 2024 at 11:42, Jakub Wartak wrote: > > On Tue, Apr 2, 2024 at 9:24 AM Nazir Bilal Yavuz wrote: > [..] > > v4 is rebased on top of v14 streaming read API changes. > > Hi Nazir, so with streaming API committed, I gave a try to this patch. > With autovacuum=off and 30GB table on NVMe (with standard readahead of > 256kb and ext4, Debian 12, kernel 6.1.0, shared_buffers = 128MB > default) created using: create table t as select repeat('a', 100) || i > || repeat('b', 500) as filler from generate_series(1, 4500) as i; > > on master, effect of mainteance_io_concurency [default 10] is like > that (when resetting the fs cache after each ANALYZE): > > m_io_c = 0: > Time: 3137.914 ms (00:03.138) > Time: 3094.540 ms (00:03.095) > Time: 3452.513 ms (00:03.453) > > m_io_c = 1: > Time: 2972.751 ms (00:02.973) > Time: 2939.551 ms (00:02.940) > Time: 2904.428 ms (00:02.904) > > m_io_c = 2: > Time: 1580.260 ms (00:01.580) > Time: 1572.132 ms (00:01.572) > Time: 1558.334 ms (00:01.558) > > m_io_c = 4: > Time: 938.304 ms > Time: 931.772 ms > Time: 920.044 ms > > m_io_c = 8: > Time: 666.025 ms > Time: 660.241 ms > Time: 648.848 ms > > m_io_c = 16: > Time: 542.450 ms > Time: 561.155 ms > Time: 539.683 ms > > m_io_c = 32: > Time: 538.487 ms > Time: 541.705 ms > Time: 538.101 ms > > with patch applied: > > m_io_c = 0: > Time: 3106.469 ms (00:03.106) > Time: 3140.343 ms (00:03.140) > Time: 3044.133 ms (00:03.044) > > m_io_c = 1: > Time: 2959.817 ms (00:02.960) > Time: 2920.265 ms (00:02.920) > Time: 2911.745 ms (00:02.912) > > m_io_c = 2: > Time: 1581.912 ms (00:01.582) > Time: 1561.444 ms (00:01.561) > Time: 1558.251 ms (00:01.558) > > m_io_c = 4: > Time: 908.116 ms > Time: 901.245 ms > Time: 901.071 ms > > m_io_c = 8: > Time: 619.870 ms > Time: 620.327 ms > Time: 614.266 ms > > m_io_c = 16: > Time: 529.885 ms > Time: 526.958 ms > Time: 528.474 ms > > m_io_c = 32: > Time: 521.185 ms > Time: 520.713 ms 
> Time: 517.729 ms > > No difference to me, which seems to be good. I've double checked and > patch used the new way > > acquire_sample_rows -> heapam_scan_analyze_next_block -> > ReadBufferExtended -> ReadBuffer_common (inlined) -> WaitReadBuffers > -> mdreadv -> FileReadV -> pg_preadv (inlined) > acquire_sample_rows -> heapam_scan_analyze_next_block -> > ReadBufferExtended -> ReadBuffer_common (inlined) -> StartReadBuffer > -> ... > > I gave also io_combine_limit to 32 (max, 256kB) a try and got those > slightly better results: > > [..] > m_io_c = 16: > Time: 494.599 ms > Time: 496.345 ms > Time: 973.500 ms > > m_io_c = 32: > Time: 461.031 ms > Time: 449.037 ms > Time: 443.375 ms > > and that (last one) apparently was able to push it to ~50-60k still > random IOPS range, the rareq-sz was still ~8 (9.9) kB as analyze was > still reading random , so I assume no merging was done: > > Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz > w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s > drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util > nvme0n1 61212.00591.82 0.00 0.000.10 9.90 > 2.00 0.02 0.00 0.000.0012.000.00 0.00 > 0.00 0.000.00 0.000.000.006.28 85.20 > > So in short it looks good to me. My results are similar to yours, also I realized a bug while working on your benchmarking cases. I will share the cause and the fix soon. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Combine Prune and Freeze records emitted by vacuum
On Wed, Apr 3, 2024 at 1:04 PM Melanie Plageman wrote: > Thanks! And thanks for updating the commitfest entry! Nice work! -- Peter Geoghegan
Re: Parent/child context relation in pg_get_backend_memory_contexts()
Hi,

On 2024-02-14 16:23:38 +0900, Michael Paquier wrote:
> It is possible to retrieve this information with some WITH RECURSIVE as
> well, as mentioned upthread. Perhaps we could consider documenting these
> tricks?

I think it's sufficiently hard that it's not a reasonable way to do this.

Greetings,

Andres Freund
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, Apr 3, 2024 at 11:13 AM Tom Lane wrote: > wikipedia says that RHEL7 ends ELS as of June 2026 [1]. I may have misunderstood something in here then: https://www.redhat.com/en/blog/announcing-4-years-extended-life-cycle-support-els-red-hat-enterprise-linux-7 > ELS for RHEL 7 is now available for 4 years, starting on July 1, 2024. Am I missing something? --Jacob
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera wrote:
>
> On 2024-Apr-03, Alexander Korotkov wrote:
>
> > Regarding the shmem data structure for LSN waiters. I didn't pick
> > LWLock or ConditionVariable, because I needed the ability to wake up
> > only those waiters whose LSN is already replayed. In my experience
> > waking up a process is way slower than scanning a short flat array.
>
> I agree, but I think that's unrelated to what I was saying, which is
> just the patch I attach here.

Oh, sorry for the confusion. I'd re-read your message. Indeed you meant
this very clearly!

I'm good with the patch. Attached revision contains a bit of a commit
message.

> > However, I agree that when the number of waiters is very high and flat
> > array may become a problem. It seems that the pairing heap is not
> > hard to use for shmem structures. The only memory allocation call in
> > pairingheap.c is in pairingheap_allocate(). So, it's only needed to
> > be able to initialize the pairing heap in-place, and it will be fine
> > for shmem.
>
> Ok.
>
> With the code as it stands today, everything in WaitLSNState apart from
> the pairing heap is accessed without any locking. I think this is at
> least partly OK because each backend only accesses its own entry; but it
> deserves a comment. Or maybe something more, because WaitLSNSetLatches
> does modify the entry for other backends. (Admittedly, this could only
> happens for backends that are already sleeping, and it only happens
> with the lock acquired, so it's probably okay. But clearly it deserves
> a comment.)

Please, check 0002 patch attached. I found it easier to move two
assignments we previously moved out of lock, into the lock; then claim
WaitLSNState.procInfos is also protected by WaitLSNLock.

> Don't we need to WaitLSNCleanup() during error recovery or something?

Yes, there is WaitLSNCleanup().
It's currently only called from one place, given that waiting for LSN can't be invoked from background workers or inside the transaction. -- Regards, Alexander Korotkov
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Jacob Champion writes: > On Wed, Apr 3, 2024 at 10:38 AM Tom Lane wrote: >> Bottom line for me is that pulling 1.0.1 support now is OK, >> but I think pulling 1.0.2 is premature. > Okay, but IIUC, waiting for it to drop out of extended support means > we deal with it for four more years. That seems excessive. wikipedia says that RHEL7 ends ELS as of June 2026 [1]. regards, tom lane [1] https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Product_life_cycle
Re: Statistics Import and Export
>
>
> As far as that goes, it shouldn't be that hard to deal with, at least
> not for "soft" errors which hopefully cover most input-function
> failures these days. You should be invoking array_in via
> InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext.
> (Look at pg_input_error_info() for useful precedent.)

Ah, my understanding may be out of date. I was under the impression that
that mechanism relied on the cooperation of the per-element input
function, so even if we got all the builtin datatypes to play nice with
*Safe(), we were always going to be at risk with a user-defined input
function.

> There might be something to be said for handling all the error
> cases via an ErrorSaveContext and use of ereturn() instead of
> ereport(). Not sure if it's worth the trouble or not.

It would help us tailor the user experience. Right now we have several
endgames. To recap:

1. NULL input => Return NULL. (because strict).
2. Actual error (permissions, cache lookup not found, etc) => Raise ERROR
(thus ruining binary upgrade)
3. Call values are so bad (examples: attname not found, required stat
missing) that nothing can recover => WARN, return FALSE.
4. At least one stakind-stat is wonky (impossible for datatype, missing
stat pair, wrong type on input parameter), but that's the worst of it => 1
to N WARNs, write stats that do make sense, return TRUE.
5. Hunky-dory. => No warns. Write all stats. return TRUE.

Which of those seem like good ereturn candidates to you?
Re: Security lessons from liblzma
On 30.03.24 00:14, Daniel Gustafsson wrote: One take-away for me is how important it is to ship recipes for regenerating any testdata which is included in generated/compiled/binary format. Kind of how we in our tree ship the config for test TLS certificates and keys which can be manually inspected, and used to rebuild the testdata (although the risk for injections in this particular case seems low). Bad things can still be injected, but formats which allow manual review at least go some way towards lowering risk. I actually find the situation with the test TLS files quite unsatisfactory. While we have build rules, the output files are not reproducible, both because of some inherent randomness in the generation, and because different OpenSSL versions produce different details. So you just have to "trust" that what's there now makes sense. Of course, you can use openssl tools to unpack these files, but that is difficult and unreliable unless you know exactly what you are looking for. Also, for example, do we even know whether all the files that are there now are used by any tests? A few years ago, some guy on the internet sent in a purported update to one of the files [0], which I ended up committing, but I remember that that situation gave me quite some pause at the time. It would be better if we created the required test files as part of the test run. (Why not? Too slow?) Alternatively, I have been thinking that maybe we could make the output more reproducible by messing with whatever random seed OpenSSL uses. Or maybe use a Python library to create the files. Some things to think about. [0]: https://www.postgresql.org/message-id/FEF81714-D479-4512-839B-C769D2605F8A%40yesql.se
Re: LogwrtResult contended spinlock
On 2024-Apr-03, Bharath Rupireddy wrote: > On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera wrote: > > So what I do in the attached 0001 is stop using the XLogwrtResult struct > > in XLogCtl and replace it with separate Write and Flush values, and add > > the macro XLogUpdateLocalLogwrtResult() that copies the values of Write > > and Flush from the shared XLogCtl to the local variable given as macro > > argument. (I also added our idiomatic do {} while(0) to the macro > > definition, for safety). The new members are XLogCtl->logWriteResult > > and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so > > essentially identical semantics as the previous code. No atomic access > > yet! > > +1. > Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to > RefreshXLogWriteResult(). Okay, I have pushed 0001 with the name change, will see about getting the others in tomorrow. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Security lessons from liblzma - libsystemd
Hi, As most will know by now, the way the xz debacle was able to make sshd vulnerable was through a dependency from sshd to libsystemd and then from libsystemd to liblzma. One lesson from this is that unnecessary dependencies can still increase risk. It's worth noting that we have an optional dependency on libsystemd as well. Openssh has now integrated [1] a patch to remove the dependency on libsystemd for triggering service manager readiness notifications, by inlining the necessary function. That's not hard, the protocol is pretty simple. I suspect we should do the same. We're not even close to being as attractive a target as openssh, but still, it seems unnecessary. An intro to the protocol is at [2], with real content and an outline of the relevant code at [3]. An argument could be made to instead just remove support, but I think it's quite valuable to have intra-service dependencies that can rely on the server actually having started up. Greetings, Andres Freund [1] https://bugzilla.mindrot.org/show_bug.cgi?id=2641 [2] https://www.freedesktop.org/software/systemd/man/devel/systemd.html#Readiness%20Protocol [3] https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, Apr 3, 2024 at 10:38 AM Tom Lane wrote: > Also, calling Photon 3 > dead because it went EOL three days ago seems over-hasty. Well, March 1, but either way I thought "dead" for the purposes of this thread meant "you can't build the very latest version of Postgres on it", not "we've forgotten it exists". Back branches will continue to need support and testing. > Bottom line for me is that pulling 1.0.1 support now is OK, > but I think pulling 1.0.2 is premature. Okay, but IIUC, waiting for it to drop out of extended support means we deal with it for four more years. That seems excessive. --Jacob
Re: Can't compile PG 17 (master) from git under Msys2 autoconf
Hi Regina, On 2024-Mar-27, Regina Obe wrote: > The error is > > rm -f '../../src/include/storage/lwlocknames.h' > cp -pR ../../backend/storage/lmgr/lwlocknames.h > '../../src/include/storage/lwlocknames.h' > cp: cannot stat '../../backend/storage/lmgr/lwlocknames.h': No such file or > directory > make[1]: *** [Makefile:143: ../../src/include/storage/lwlocknames.h] Error 1 > make[1]: Leaving directory '/projects/postgresql/postgresql-git/src/backend' > make: *** [../../src/Makefile.global:382: submake-generated-headers] Error 2 Hmm, I changed these rules again in commit da952b415f44, maybe your problem is with that one? I wonder if changing the references to ../include/storage/lwlocklist.h to $(topdir)/src/include/storage/lwlocklist.h (and similar things in src/backend/storage/lmgr/Makefile) would fix it. Do you run configure in the source directory or a separate build one? -- Álvaro Herrera    Breisgau, Deutschland — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations)
Re: Popcount optimization using AVX512
I committed v23-0001. Here is a rebased version of the remaining patches. I intend to test the masking idea from Ants next. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 295b03530de5f42fe876b4489191da2f8dc83194 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 27 Mar 2024 16:39:24 -0500 Subject: [PATCH v24 1/2] AVX512 popcount support --- config/c-compiler.m4 | 58 ++ configure| 252 +++ configure.ac | 51 ++ meson.build | 87 + src/Makefile.global.in | 5 + src/include/pg_config.h.in | 12 ++ src/include/port/pg_bitutils.h | 15 ++ src/makefiles/meson.build| 4 +- src/port/Makefile| 11 ++ src/port/meson.build | 6 +- src/port/pg_bitutils.c | 29 ++- src/port/pg_popcount_avx512.c| 49 ++ src/port/pg_popcount_avx512_choose.c | 71 src/test/regress/expected/bit.out| 24 +++ src/test/regress/sql/bit.sql | 4 + 15 files changed, 673 insertions(+), 5 deletions(-) create mode 100644 src/port/pg_popcount_avx512.c create mode 100644 src/port/pg_popcount_avx512_choose.c diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 3268a780bb..5fb60775ca 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -694,3 +694,61 @@ if test x"$Ac_cachevar" = x"yes"; then fi undefine([Ac_cachevar])dnl ])# PGAC_LOONGARCH_CRC32C_INTRINSICS + +# PGAC_XSAVE_INTRINSICS +# - +# Check if the compiler supports the XSAVE instructions using the _xgetbv +# intrinsic function. +# +# An optional compiler flag can be passed as argument (e.g., -mxsave). If the +# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE. 
+AC_DEFUN([PGAC_XSAVE_INTRINSICS], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl +AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar], +[pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS $1" +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>], + [return _xgetbv(0) & 0xe0;])], + [Ac_cachevar=yes], + [Ac_cachevar=no]) +CFLAGS="$pgac_save_CFLAGS"]) +if test x"$Ac_cachevar" = x"yes"; then + CFLAGS_XSAVE="$1" + pgac_xsave_intrinsics=yes +fi +undefine([Ac_cachevar])dnl +])# PGAC_XSAVE_INTRINSICS + +# PGAC_AVX512_POPCNT_INTRINSICS +# - +# Check if the compiler supports the AVX512 POPCNT instructions using the +# _mm512_setzero_si512, _mm512_loadu_si512, _mm512_popcnt_epi64, +# _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions. +# +# An optional compiler flag can be passed as argument +# (e.g., -mavx512vpopcntdq). If the intrinsics are supported, sets +# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT. +AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl +AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar], +[pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS $1" +AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <immintrin.h>], + [const char buf@<:@sizeof(__m512i)@:>@; + PG_INT64_TYPE popcnt = 0; + __m512i accum = _mm512_setzero_si512(); + const __m512i val = _mm512_loadu_si512((const __m512i *) buf); + const __m512i cnt = _mm512_popcnt_epi64(val); + accum = _mm512_add_epi64(accum, cnt); + popcnt = _mm512_reduce_add_epi64(accum); + /* return computed value, to prevent the above being optimized away */ + return popcnt == 0;])], + [Ac_cachevar=yes], + [Ac_cachevar=no]) +CFLAGS="$pgac_save_CFLAGS"]) +if test x"$Ac_cachevar" = x"yes"; then + CFLAGS_POPCNT="$1" + pgac_avx512_popcnt_intrinsics=yes +fi +undefine([Ac_cachevar])dnl +])# PGAC_AVX512_POPCNT_INTRINSICS diff --git a/configure b/configure index 36feeafbb2..b48ed7f271 100755 --- a/configure +++ b/configure @@ 
-647,6 +647,9 @@ MSGFMT_FLAGS MSGFMT PG_CRC32C_OBJS CFLAGS_CRC +PG_POPCNT_OBJS +CFLAGS_POPCNT +CFLAGS_XSAVE LIBOBJS OPENSSL ZSTD @@ -17404,6 +17407,40 @@ $as_echo "#define HAVE__GET_CPUID 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __get_cpuid_count" >&5 +$as_echo_n "checking for __get_cpuid_count... " >&6; } +if ${pgac_cv__get_cpuid_count+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include <cpuid.h> +int +main () +{ +unsigned int exx[4] = {0, 0, 0, 0}; + __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]); + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + pgac_cv__get_cpuid_count="yes" +else + pgac_cv__get_cpuid_count="no" +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__get_cpuid_count" >&5 +$as_echo "$pgac_cv__get_cpuid_count" >&6; } +if test x"$pgac_cv__get_cpuid_count" = x"yes"; then + +$as_echo "#define
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Jacob Champion writes: > The RHEL7-alikes are the biggest set, but that's already discussed > above. Looks like SUSE 12 goes EOL later this year (October 2024), and > it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu > 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March > 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of > which appear to have newer versions of OpenSSL shipped and selectable. The discussion we had last year concluded that we were OK with dropping 1.0.1 support when RHEL6 goes out of extended support (June 2024 per this thread, I didn't check it). Seems like we should have the same policy for RHEL7. Also, calling Photon 3 dead because it went EOL three days ago seems over-hasty. Bottom line for me is that pulling 1.0.1 support now is OK, but I think pulling 1.0.2 is premature. regards, tom lane
Re: Psql meta-command conninfo+
Building the docs fails for v26. The error is: ref/psql-ref.sgml:1042: element member: validity error : Element term is not declared in member list of possible children ^ I am able to build up to v24 before the was replaced with I tested building with a modified structure as below; the is a that has a within it. Each field is a simple list member and the name of the fields should be . \conninfo[+] Outputs a string displaying information about the current database connection. When + is appended, more details about the connection are displayed in table format: Database: The database name of the connection. Authenticated User: The authenticated database user of the connection. Current User: The user name of the current execution context; see the current_user() function in for more details. Regards, Sami
Re: Improve eviction algorithm in ReorderBuffer
On Wed, 2024-04-03 at 01:45 -0700, Jeff Davis wrote: > I suggest that you add a "heap_index" field to ReorderBufferTXN that > would point to the index into the heap's array (the same as > bh_nodeidx_entry.index in your patch). Each time an element moves > within the heap array, just follow the pointer to the > ReorderBufferTXN > object and update the heap_index -- no hash lookup required. It looks like my email was slightly too late, as the work was already committed. My suggestion is not required for 17, and so it's fine if this waits until the next CF. If it turns out to be a win we can consider backporting to 17 just to keep the code consistent, otherwise it can go in 18. Regards, Jeff Davis
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, Apr 3, 2024 at 8:29 AM Tom Lane wrote: > I count 3 machines running 1.0.1, 18 running some flavor > of 1.0.2, and 7 running various LibreSSL versions. I don't know all the tradeoffs with buildfarm wrangling, but IMO all those 1.0.2 installations are the most problematic, so I dug in a bit: arowana CentOS 7 batfish Ubuntu 16.04.3 boa RHEL 7 buri CentOS 7 butterflyfish Photon 2.0 clam RHEL 7.1 cuon Ubuntu 16.04 dhole CentOS 7.4 hake OpenIndiana hipster mantid CentOS 7.9 margay Solaris 11.4.42 massasauga Amazon Linux 2 myna Photon 3.0 parula Amazon Linux 2 rhinoceros CentOS 7.1 shelduck SUSE 12SP5 siskin RHEL 7.9 snakefly Amazon Linux 2 The RHEL7-alikes are the biggest set, but that's already discussed above. Looks like SUSE 12 goes EOL later this year (October 2024), and it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of which appear to have newer versions of OpenSSL shipped and selectable. --Jacob
Re: Statistics Import and Export
Corey Huinker writes: > - functions strive to not ERROR unless absolutely necessary. The biggest > exposure is the call to array_in(). As far as that goes, it shouldn't be that hard to deal with, at least not for "soft" errors which hopefully cover most input-function failures these days. You should be invoking array_in via InputFunctionCallSafe and passing a suitably-set-up ErrorSaveContext. (Look at pg_input_error_info() for useful precedent.) There might be something to be said for handling all the error cases via an ErrorSaveContext and use of ereturn() instead of ereport(). Not sure if it's worth the trouble or not. regards, tom lane
Re: Use streaming read API in ANALYZE
On 03/04/2024 13:31, Nazir Bilal Yavuz wrote: Streaming API has been committed but the committed version has a minor change, the read_stream_begin_relation function takes Relation instead of BufferManagerRelation now. So, here is a v5 which addresses this change. I'm getting a repeatable segfault / assertion failure with this: postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10); CREATE TABLE postgres=# insert into tengiga select g, repeat('x', 900) from generate_series(1, 140) g; INSERT 0 140 postgres=# set default_statistics_target = 10; ANALYZE tengiga; SET ANALYZE postgres=# set default_statistics_target = 100; ANALYZE tengiga; SET ANALYZE postgres=# set default_statistics_target =1000; ANALYZE tengiga; SET server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File: "heapam_handler.c", Line: 1079, PID: 262232 postgres: heikki postgres [local] ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8] postgres: heikki postgres [local] ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34] postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34] postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a] postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9] postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0] postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe] postgres: heikki postgres [local] ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9] postgres: heikki postgres [local] ANALYZE(ProcessUtility+0x136)[0x564889f0afb1] postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8] postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b] postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015] postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6] postgres: heikki 
postgres [local] ANALYZE(PostgresMain+0x80c)[0x564889f06fd7] postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876] postgres: heikki postgres [local] ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3] postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e] postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0] postgres: heikki postgres [local] ANALYZE(PostmasterMain+0x152b)[0x564889e2214d] postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4] /lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305] postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61] 2024-04-03 20:15:49.157 EEST [262101] LOG: server process (PID 262232) was terminated by signal 6: Aborted -- Heikki Linnakangas Neon (https://neon.tech)
Re: Adding comments to help understand psql hidden queries
I got Greg's blessing on squashing the commits down, and now including a v4 with additional improvements on the output formatting front. Main changes: - all generated comments are the same width - width has been bumped to 80 - removed _() functions for consumers of the new output functions This patch adds two new helper functions, OutputComment() and OutputCommentStars() to output and wrap the comments to the appropriate widths. Everything should continue to work just fine if we want to adjust the width by just adjusting the MAX_COMMENT_WIDTH symbol. I removed _() in the output of the query/stars since there'd be no sensible existing translations for the constructed string, which included the query string itself. If we need it for the "QUERY" string, this could be added fairly easily, but the existing piece would have been nonsensical and never used in practice. Thanks, David v4-0001-Improve-SQL-comments-on-echo-hidden-output.patch Description: Binary data
Re: Combine Prune and Freeze records emitted by vacuum
On Wed, Apr 3, 2024 at 12:34 PM Heikki Linnakangas wrote: > > On 03/04/2024 17:18, Melanie Plageman wrote: > > I noticed you didn't make the comment updates I suggested in my > > version 13 here [1]. A few of them are outdated references to > > heap_page_prune() and one to a now deleted variable name > > (all_visible_except_removable). > > > > I applied them to your v13 and attached the diff. > > Applied those changes, and committed. Thank you! Thanks! And thanks for updating the commitfest entry! - Melanie
Re: RFC: Additional Directory for Extensions
> Fwiw, I wrote this patch to solve the problem of testing extensions at > build-time where the build process does not have write access to > PGSHAREDIR. It solves that problem quite well, almost all PG > extensions have build-time test coverage now (where there was > basically 0 before). Also, it's called extension "destdir" because it behaves like DESTDIR in Makefiles: It prepends the given path to the path that PG is trying to open when set. So it doesn't allow arbitrary new locations as of now, just /home/build/foo-1/debian/foo/usr/share/postgresql/17/extension in addition to /usr/share/postgresql/17/extension. (That is what the Debian package build process needs, so that restriction/design choice made sense.) > Security is not a concern at this point as everything is running as > the same user, and the test cluster will be wiped right after the > test. I figured marking the setting as "super user" only was enough > security at that point, but I would recommend another audit before > using it together with "trusted" extensions and other things in > production. That's also included in the current GUC description: This directory is prepended to paths when loading extensions (control and SQL files), and to the '$libdir' directive when loading modules that back functions. The location is made configurable to allow build-time testing of extensions that have not been installed to their proper location yet. Perhaps I should have included a more verbose "NOT FOR PRODUCTION" there. As for compatibility, the patch has been part of PG 9.5..17 now for several years, and I'm very happy with the extra test coverage it provides, especially on the Debian architectures that don't have "autopkgtest" runners yet. Christoph
Re: Streaming read-ready sequential scan code
On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas wrote: > > On 01/04/2024 22:58, Melanie Plageman wrote: > > Attached v7 has version 14 of the streaming read API as well as a few > > small tweaks to comments and code. > > I saw benchmarks in this thread to show that there's no regression when > the data is in cache, but I didn't see any benchmarks demonstrating the > benefit of this. So I ran this quick test: Good point! It would be good to show why we would actually want this patch. Attached v8 is rebased over current master (which now has the streaming read API). On the topic of BAS_BULKREAD buffer access strategy, I think the least we could do is add an assert like this to read_stream_begin_relation() after calculating max_pinned_buffers. Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers); Perhaps we should do more? I think with a ring size of 16 MB, large SELECTs are safe for now. But I know future developers will make changes and it would be good not to assume they will understand that pinning more buffers than the size of the ring effectively invalidates the ring. - Melanie From cfccafec650a77c53b1d78180b52db31742181ff Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Wed, 27 Mar 2024 20:25:06 -0400 Subject: [PATCH v8 3/3] Sequential scans and TID range scans stream reads Implementing streaming read support for heap sequential scans and TID range scans includes three parts: Allocate the read stream object in heap_beginscan(). On rescan, reset the stream by releasing all pinned buffers and resetting the prefetch block. Implement a callback returning the next block to prefetch to the read stream infrastructure. Invoke the read stream API when a new page is needed. When the scan direction changes, reset the stream. 
--- src/backend/access/heap/heapam.c | 94 src/include/access/heapam.h | 15 + 2 files changed, 97 insertions(+), 12 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 6b26f5bf8af..3546f637c13 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -221,6 +221,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] = * */ +static BlockNumber +heap_scan_stream_read_next(ReadStream *pgsr, void *private_data, + void *per_buffer_data) +{ + HeapScanDesc scan = (HeapScanDesc) private_data; + + if (unlikely(!scan->rs_inited)) + { + scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir); + scan->rs_inited = true; + } + else + scan->rs_prefetch_block = heapgettup_advance_block(scan, + scan->rs_prefetch_block, + scan->rs_dir); + + return scan->rs_prefetch_block; +} + /* * initscan - scan code common to heap_beginscan and heap_rescan * @@ -323,6 +342,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) scan->rs_cbuf = InvalidBuffer; scan->rs_cblock = InvalidBlockNumber; + /* + * Initialize to ForwardScanDirection because it is most common and heap + * scans usually must go forwards before going backward. + */ + scan->rs_dir = ForwardScanDirection; + scan->rs_prefetch_block = InvalidBlockNumber; + /* page-at-a-time fields are always invalid when not rs_inited */ /* @@ -459,12 +485,14 @@ heapbuildvis(TableScanDesc sscan) /* * heapfetchbuf - subroutine for heapgettup() * - * This routine reads the next block of the relation into a buffer and returns - * with that pinned buffer saved in the scan descriptor. + * This routine gets the next block of the relation from the read stream + * and saves that pinned buffer in the scan descriptor. 
*/ static inline void heapfetchbuf(HeapScanDesc scan, ScanDirection dir) { + Assert(scan->rs_read_stream); + /* release previous scan buffer, if any */ if (BufferIsValid(scan->rs_cbuf)) { @@ -479,19 +507,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir) */ CHECK_FOR_INTERRUPTS(); - if (unlikely(!scan->rs_inited)) + /* + * If the scan direction is changing, reset the prefetch block to the + * current block. Otherwise, we will incorrectly prefetch the blocks + * between the prefetch block and the current block again before + * prefetching blocks in the new, correct scan direction. + */ + if (unlikely(scan->rs_dir != dir)) { - scan->rs_cblock = heapgettup_initial_block(scan, dir); - Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf)); - scan->rs_inited = true; + scan->rs_prefetch_block = scan->rs_cblock; + read_stream_reset(scan->rs_read_stream); } - else - scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir); - /* read block if valid */ - if (BlockNumberIsValid(scan->rs_cblock)) - scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, - scan->rs_cblock, RBM_NORMAL, scan->rs_strategy); +
Re: Is it safe to cache data by GiST consistent function
Michał Kłeczek writes: > On 3 Apr 2024, at 16:27, Tom Lane wrote: >> AFAIK it works. I don't see any of the in-core ones doing so, >> but at least range_gist_consistent and multirange_gist_consistent >> are missing a bet by repeating their cache search every time. > pg_trgm consistent caches trigrams but it has some logic to make sure cached > values are recalculated: > cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra; > if (cache == NULL || > cache->strategy != strategy || > VARSIZE(cache->query) != querysize || > memcmp((char *) cache->query, (char *) query, querysize) != 0) > What I don't understand is if it is necessary or it is enough to check > fn_extra==NULL. Ah, I didn't think to search contrib. Yes, you need to validate the cache entry. In this example, a rescan could insert a new query value. In general, an opclass support function could get called using a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache entry), so it's unwise to assume that cached data is relevant to the current call without checking. regards, tom lane
Re: LogwrtResult contended spinlock
On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera wrote: > > Thanks for keeping this moving forward. I gave your proposed patches a > look. One thing I didn't like much is that we're adding a new member > (Copy) to XLogwrtAtomic -- but this struct is supposed to be a mirror of > XLogwrtResult for use with atomic access. Since this new member is not > added to XLogwrtResult (because it's not needed there), the whole idea > of there being symmetry between those two structs crumbles down. > Because we later stop using struct-assign anyway, meaning we no longer > need the structs to match, we can instead spell out the members in > XLogCtl and call it a day. Hm, I have no objection to having separate variables in XLogCtl. > So what I do in the attached 0001 is stop using the XLogwrtResult struct > in XLogCtl and replace it with separate Write and Flush values, and add > the macro XLogUpdateLocalLogwrtResult() that copies the values of Write > and Flush from the shared XLogCtl to the local variable given as macro > argument. (I also added our idiomatic do {} while(0) to the macro > definition, for safety). The new members are XLogCtl->logWriteResult > and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so > essentially identical semantics as the previous code. No atomic access > yet! +1. > 0002 then adds pg_atomic_monotonic_advance_u64. (I don't add the _u32 > variant, because I don't think it's a great idea to add dead code. If > later we see a need for it we can put it in.) It also changes the two > new members to be atomics, changes the macro to use atomic read, and > XLogWrite now uses monotonic increment. A couple of other places can > move the macro calls to occur outside the spinlock. Also, XLogWrite > gains the invariant checking that involves Write and Flush. I'm fine with not having the 32 bit variant of pg_atomic_monotonic_advance. 
However, a recent commit bd5132db5 added both 32 and 64 bit versions of pg_atomic_read_membarrier even though 32 bit isn't being used. > Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and > Flush, and updates WALReadFromBuffers to test that instead of the Write > pointer, and adds in XLogWrite the invariant checks that involve the > Copy pointer. +1. The attached patches look good to me. Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to RefreshXLogWriteResult(). > I haven't rerun Bharath test loop yet; will do so shortly. I ran it a 100 times [1] on top of all the 3 patches, it looks fine. [1] for i in {1..100}; do make check PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The command failed on iteration $i"; break; fi; done ./configure --prefix=$PWD/pg17/ --enable-debug --enable-tap-tests --enable-cassert CC=/usr/bin/clang-14 > install.log && make -j 8 install > install.log 2>&1 & -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Detoasting optionally to make Explain-Analyze less misleading
On Tue, 2 Apr 2024 at 17:47, Tom Lane wrote: > > Matthias van de Meent writes: > > Attached is v9, which is rebased on master to handle 24eebc65's > > changed signature of pq_sendcountedtext. > > It now also includes autocompletion, and a second patch which adds > > documentation to give users insights into this new addition to > > EXPLAIN. > > I took a quick look through this. Some comments in no particular > order: Thanks! > Documentation is not optional, so I don't really see the point of > splitting this into two patches. I've seen the inverse several times, but I've merged them in the attached version 10. > IIUC, it's not possible to use the SERIALIZE option when explaining > CREATE TABLE AS, because you can't install the instrumentation tuple > receiver when the IntoRel one is needed. I think that's fine because > no serialization would happen in that case anyway, but should we > throw an error if that combination is requested? Blindly reporting > that zero bytes were serialized seems like it'd confuse people. I think it's easily explained as no rows were transfered to the client. If there is actual confusion, we can document it, but confusing disk with network is not a case I'd protect against. See also: EXPLAIN (ANALYZE, SERIALIZE) INSERT without the RETURNING clause. > I'd lose the stuff about measuring memory consumption. Nobody asked > for that and the total is completely misleading, because in reality > we'll reclaim the memory used after each row. It would allow cutting > the text-mode output down to one line, too, instead of having your > own format that's not like anything else. Done. > I thought the upthread agreement was to display the amount of > data sent rounded to kilobytes, so why is the code displaying > an exact byte count? Probably it was because the other explain code I referenced was using bytes in the json/yaml format. Fixed. > I don't especially care for magic numbers like these: > > + /* see printtup.h why we add 18 bytes here. 
These are the > infos > +* needed for each attribute plus the attribute's name */ > + receiver->metrics.bytesSent += (int64) namelen + 1 + 18; > > If the protocol is ever changed in a way that invalidates this, > there's about 0 chance that somebody would remember to touch > this code. > However, isn't the bottom half of serializeAnalyzeStartup doing > exactly what the comment above it says we don't do, that is accounting > for the RowDescription message? Frankly I agree with the comment that > it's not worth troubling over, so I'd just drop that code rather than > finding a solution for the magic-number problem. In the comment above I intended to explain that it takes negligible time to serialize the RowDescription message (when compared to all other tasks of explain), so skipping the actual writing of the message would be fine. I'm not sure I agree with not including the size of RowDescription itself though, as wide results can have a very large RowDescription overhead; up to several times the returned data in cases where few rows are returned. Either way, I've removed that part of the code. > Don't bother with duplicating valgrind-related logic in > serializeAnalyzeReceive --- that's irrelevant to actual users. Removed. I've instead added buffer usage, as I realised that wasn't covered yet, and quite important to detect excessive detoasting (it's not covered at the top-level scan). > This seems like cowboy coding: > > + self->destRecevier.mydest = DestNone; > > You should define a new value of the CommandDest enum and > integrate this receiver type into the support functions > in dest.c. Done. > BTW, "destRecevier" is misspelled... Thanks, fixed. Kind regards, Matthias van de Meent. v10-0001-Explain-Add-SERIALIZE-option.patch Description: Binary data
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hello Alexander,

On 2024-Apr-03, Alexander Korotkov wrote:

> Regarding the shmem data structure for LSN waiters.  I didn't pick
> LWLock or ConditionVariable, because I needed the ability to wake up
> only those waiters whose LSN is already replayed.  In my experience
> waking up a process is way slower than scanning a short flat array.

I agree, but I think that's unrelated to what I was saying, which is
just the patch I attach here.

> However, I agree that when the number of waiters is very high, a flat
> array may become a problem.  It seems that the pairing heap is not
> hard to use for shmem structures.  The only memory allocation call in
> pairingheap.c is in pairingheap_allocate().  So, it's only needed to
> be able to initialize the pairing heap in-place, and it will be fine
> for shmem.

Ok.

With the code as it stands today, everything in WaitLSNState apart from
the pairing heap is accessed without any locking.  I think this is at
least partly OK because each backend only accesses its own entry; but
it deserves a comment.  Or maybe something more, because
WaitLSNSetLatches does modify the entry for other backends.
(Admittedly, this could only happen for backends that are already
sleeping, and it only happens with the lock acquired, so it's probably
okay.  But clearly it deserves a comment.)

Don't we need to WaitLSNCleanup() during error recovery or something?
-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"        (Andrew Morton)

>From 4079dc6a6a6893055b32dee75f178b324bbaef77 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 3 Apr 2024 18:35:15 +0200
Subject: [PATCH] Use an LWLock instead of a spinlock in waitlsn.c

---
 src/backend/commands/waitlsn.c                  | 15 +++
 src/backend/utils/activity/wait_event_names.txt |  1 +
 src/include/commands/waitlsn.h                  |  5 +
 src/include/storage/lwlocklist.h                |  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 51a34d422e..a57b818a2d 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -58,7 +58,6 @@ WaitLSNShmemInit(void)
 						   );
 	if (!found)
 	{
-		SpinLockInit(&waitLSN->waitersHeapMutex);
 		pg_atomic_init_u64(&waitLSN->minWaitedLSN, PG_UINT64_MAX);
 		pairingheap_initialize(&waitLSN->waitersHeap, lsn_cmp, NULL);
 		memset(&waitLSN->procInfos, 0, MaxBackends * sizeof(WaitLSNProcInfo));
@@ -115,13 +114,13 @@ addLSNWaiter(XLogRecPtr lsn)
 	procInfo->procnum = MyProcNumber;
 	procInfo->waitLSN = lsn;
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	pairingheap_add(&waitLSN->waitersHeap, &procInfo->phNode);
 	procInfo->inHeap = true;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -132,11 +131,11 @@ deleteLSNWaiter(void)
 {
 	WaitLSNProcInfo *procInfo = &waitLSN->procInfos[MyProcNumber];
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	if (!procInfo->inHeap)
 	{
-		SpinLockRelease(&waitLSN->waitersHeapMutex);
+		LWLockRelease(WaitLSNLock);
 		return;
 	}
 
@@ -144,7 +143,7 @@ deleteLSNWaiter(void)
 	procInfo->inHeap = false;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -160,7 +159,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	wakeUpProcNums = palloc(sizeof(int) * MaxBackends);
 
-	SpinLockAcquire(&waitLSN->waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	/*
 	 * Iterate the pairing heap of waiting processes till we find LSN not yet
@@ -182,7 +181,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	updateMinWaitedLSN();
 
-	SpinLockRelease(&waitLSN->waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 
 	/*
 	 * Set latches for processes, whose waited LSNs are already replayed. This
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0d288d6b3d..ec43a78d60 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -330,6 +330,7 @@ WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared pg_serial state."
+WaitLSN	"Waiting to read or update shared Wait-for-LSN state."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index 0d80248682..b3d9eed64d 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -55,13 +55,10 @@ typedef struct WaitLSNState
 
 	/*
 	 * A pairing heap of waiting processes order by LSN values (least LSN is
-	 * on top).
+	 * on top).  Protected by WaitLSNLock.
 	 */
 	pairingheap
Re: Is it safe to cache data by GiST consistent function
Thanks for taking the time to answer. Not sure if I understand though.

> On 3 Apr 2024, at 16:27, Tom Lane wrote:
>
> Michał Kłeczek writes:
>> When implementing a GiST consistent function I found the need to
>> cache pre-processed query across invocations.
>> I am not sure if it is safe to do (or I need to perform some steps
>> to make sure cached info is not leaked between rescans).
>
> AFAIK it works.  I don't see any of the in-core ones doing so,
> but at least range_gist_consistent and multirange_gist_consistent
> are missing a bet by repeating their cache search every time.

pg_trgm consistent caches trigrams but it has some logic to make sure
cached values are recalculated:

	cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;

	if (cache == NULL ||
		cache->strategy != strategy ||
		VARSIZE(cache->query) != querysize ||
		memcmp((char *) cache->query, (char *) query, querysize) != 0)

What I don't understand is if it is necessary or it is enough to check
fn_extra == NULL.

>> The comment in gistrescan says:
>
>> 	/*
>> 	 * If this isn't the first time through, preserve the fn_extra
>> 	 * pointers, so that if the consistentFns are using them to cache
>> 	 * data, that data is not leaked across a rescan.
>> 	 */
>
>> which seems to me self-contradictory as fn_extra is preserved between
>> rescans (so leaks are indeed possible).
>
> I think you're reading it wrong.  If we cleared fn_extra during
> rescan, access to the old extra value would be lost so a new one
> would have to be created, leaking the old value for the rest of
> the query.

I understand that but not sure what "that data is not leaked across a
rescan" means.

—
Michal
Re: RFC: Additional Directory for Extensions
Re: David E. Wheeler > > Yes, I like the suggestion to make it require a restart, which lets the > > sysadmin control it and not limited to whatever the person who compiled it > > thought would make sense. > > Here’s a revision of the Debian patch that requires a server start. Thanks for bringing this up, I should have submitted this years ago. (The patch is originally from September 2020.) I designed the patch to require a superuser to set it, so it doesn't matter very much by which mechanism it gets updated. There should be little reason to vary it at run-time, so I'd be fine with requiring a restart, but otoh, why restrict the superuser from reloading it if they know what they are doing? > However, in studying the patch, it appears that the `extension_directory` is > searched for *all* shared libraries, not just those being loaded for an > extension. Am I reading the `expand_dynamic_library_name()` function right? > > If so, this seems like a good way for a bad actor to muck with things, by > putting an exploited libpgtypes library into the extension directory, where > it would be loaded in preference to the core libpgtypes library, if they > couldn’t exploit the original. > > I’m thinking it would be better to have the dynamic library lookup for > extension libraries (and LOAD libraries?) separate, so that the > `extension_directory` would not be used for core libraries. I'm not sure the concept of "core libraries" exists. PG happens to dlopen things at run time, and it doesn't know/care if they were installed by users or by the original PG server. Also, an exploited libpgtypes library is not worse than any other untrusted "user" library, so you really don't want to allow users to provide their own .so files, no matter by what mechanism. 
> This would also allow the lookup of extension libraries prefixed by the > directory field from the control file, which would enable much tidier > extension installation: The control file, SQL scripts, and DSOs could all be > in a single directory for an extension. Nice idea, but that would mean installing .so files into PGSHAREDIR. Perhaps the whole extension stuff would have to move to PKGLIBDIR instead. Fwiw, I wrote this patch to solve the problem of testing extensions at build-time where the build process does not have write access to PGSHAREDIR. It solves that problem quite well, almost all PG extensions have build-time test coverage now (where there was basically 0 before). Security is not a concern at this point as everything is running as the same user, and the test cluster will be wiped right after the test. I figured marking the setting as "super user" only was enough security at that point, but I would recommend another audit before using it together with "trusted" extensions and other things in production. Christoph
Re: Combine Prune and Freeze records emitted by vacuum
On 03/04/2024 17:18, Melanie Plageman wrote: I noticed you didn't make the comment updates I suggested in my version 13 here [1]. A few of them are outdated references to heap_page_prune() and one to a now deleted variable name (all_visible_except_removable). I applied them to your v13 and attached the diff. Applied those changes, and committed. Thank you! Off-list, Melanie reported that there is a small regression with the benchmark script she posted yesterday, after all, but I'm not able to reproduce that. Actually, I think it was noise. Ok, phew. -- Heikki Linnakangas Neon (https://neon.tech)
Re: [PATCH] Modify pg_ctl to detect presence of geek user
> On 3 Apr 2024, at 18:17, Panda Developpeur > wrote: > > Thank you for considering my contribution. Looks interesting! + usleep(50); Don't we need to make system 500ms faster instead? Let's change it to + usleep(-50); Thanks! Best regards, Andrey Borodin.
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote: > On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot > wrote: > > > > Just one comment on v32-0001: > > > > +# Synced slot on the standby must get its own inactive_since. > > +is( $standby1->safe_psql( > > + 'postgres', > > + "SELECT '$inactive_since_on_primary'::timestamptz <= > > '$inactive_since_on_standby'::timestamptz AND > > + '$inactive_since_on_standby'::timestamptz <= > > '$slot_sync_time'::timestamptz;" > > + ), > > + "t", > > + 'synchronized slot has got its own inactive_since'); > > + > > > > By using <= we are not testing that it must get its own inactive_since (as > > we > > allow them to be equal in the test). I think we should just add some > > usleep() > > where appropriate and deny equality during the tests on inactive_since. > > > Except for the above, v32-0001 LGTM. > > Thanks. Please see the attached v33-0001 patch after removing equality > on inactive_since TAP tests. Thanks! v33-0001 LGTM. > On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot > wrote: > > Some comments regarding v31-0002: > > > > T2 === > > > > In case the slot is invalidated on the primary, > > > > primary: > > > > postgres=# select slot_name, inactive_since, invalidation_reason from > > pg_replication_slots where slot_name = 's1'; > > slot_name |inactive_since | invalidation_reason > > ---+---+- > > s1| 2024-04-03 06:56:28.075637+00 | inactive_timeout > > > > then on the standby we get: > > > > standby: > > > > postgres=# select slot_name, inactive_since, invalidation_reason from > > pg_replication_slots where slot_name = 's1'; > > slot_name |inactive_since| invalidation_reason > > ---+--+- > > s1| 2024-04-03 07:06:43.37486+00 | inactive_timeout > > > > shouldn't the slot be dropped/recreated instead of updating inactive_since? > > The sync slots that are invalidated on the primary aren't dropped and > recreated on the standby. Yeah, right (I was confused with synced slots that are invalidated locally). 
> However, I
> found that the synced slot is acquired and released unnecessarily
> after the invalidation_reason is synced from the primary. I added a
> skip check in synchronize_one_slot to skip acquiring and releasing the
> slot if it's locally found inactive. With this, inactive_since won't
> get updated for invalidated sync slots on the standby as we don't
> acquire and release the slot.

CR1 ===

Yeah, I can see:

@@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 							" name slot \"%s\" already exists on the standby",
 							remote_slot->name));
 
+	/*
+	 * Skip the sync if the local slot is already invalidated. We do this
+	 * beforehand to save on slot acquire and release.
+	 */
+	if (slot->data.invalidated != RS_INVAL_NONE)
+		return false;

Thanks to the drop_local_obsolete_slots() call I think we are not
missing the case where the slot has been invalidated on the primary,
invalidation reason has been synced on the standby and later the slot
is dropped/recreated manually on the primary (then it should be
dropped/recreated on the standby too). Also it seems we are not missing
the case where a sync slot is invalidated locally due to wal removal
(it should be dropped/recreated).

> > CR5 ===
> >
> > + /*
> > +  * This function isn't expected to be called for inactive timeout
> > +  * based invalidation. A separate function
> > +  * InvalidateInactiveReplicationSlot is to be used for that.
> >
> > Do you think it's worth to explain why?
>
> Hm, I just wanted to point out the actual function here. I modified it
> to something like the following, if others feel we don't need that, I
> can remove it.

Sorry if I was not clear, but I meant to say "Do you think it's worth
explaining why we decided to create a dedicated function?" (currently
we "just" explain that we created one).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On 2024-Apr-03, Dilip Kumar wrote: > Yeah, we missed acquiring the bank lock w.r.t. intervening pages, > thanks for reporting. Your fix looks correct to me. Thanks for the quick review! And thanks to Alexander for the report. Pushed the fix. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No hay hombre que no aspire a la plenitud, es decir, la suma de experiencias de que un hombre es capaz"
Re: [PATCH] Modify pg_ctl to detect presence of geek user
Yeah, sorry for the delay, it took me some time to understand how to build, modify, and test the modification.
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Jacob Champion writes:
> As far as I can tell, no versions of LibreSSL so far provide
> X509_get_signature_info(), so this patch is probably a bit too
> aggressive.

Another problem with cutting support is how many buildfarm members will
we lose.  I scraped recent configure logs and got the attached results.
I count 3 machines running 1.0.1, 18 running some flavor of 1.0.2, and
7 running various LibreSSL versions.  We could probably retire or
update the 1.0.1 installations, but the rest would represent a heavier
lift.  Notably, it seems that what macOS is shipping is LibreSSL.

			regards, tom lane

    sysname    |                                   l
---------------+--------------------------------------------------------------------------
 alabio        | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 alimoche      | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 arowana       | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 avocet        | configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 SUSE release 150400.7.60.2
 ayu           | configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 babbler       | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 basilisk      | configure: using openssl: OpenSSL 3.1.4 24 Oct 2023 (Library: OpenSSL 3.1.4 24 Oct 2023)
 batfish       | configure: using openssl: OpenSSL 1.0.2g  1 Mar 2016
 batta         | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 blackneck     | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 boa           | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 boomslang     | configure: using openssl: OpenSSL 1.1.1k  25 Mar 2021
 broadbill     | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 bulbul        | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 buri          | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 bushmaster    | configure: using openssl: OpenSSL 3.1.5 30 Jan 2024 (Library: OpenSSL 3.1.5 30 Jan 2024)
 butterflyfish | configure: using openssl: OpenSSL 1.0.2p-fips  14 Aug 2018
 caiman        | configure: using openssl: OpenSSL 3.2.1 30 Jan 2024 (Library: OpenSSL 3.2.1 30 Jan 2024)
 canebrake     | configure: using openssl: OpenSSL 3.1.5 30 Jan 2024 (Library: OpenSSL 3.1.5 30 Jan 2024)
 cascabel      | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 (Library: OpenSSL 1.1.1w  11 Sep 2023)
 cavefish      | configure: using openssl: OpenSSL 1.1.1  11 Sep 2018
 chevrotain    | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 (Library: OpenSSL 1.1.1w  11 Sep 2023)
 chimaera      | configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 chipmunk      | configure: using openssl: OpenSSL 1.0.1t  3 May 2016
 cisticola     | configure: using openssl: OpenSSL 1.1.1g FIPS  21 Apr 2020
 clam          | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 conchuela     | configure: using openssl: LibreSSL 3.2.5
 copperhead    | configure: using openssl: OpenSSL 1.1.1k  25 Mar 2021
 culpeo        | configure: using openssl: OpenSSL 3.0.11 19 Sep 2023 (Library: OpenSSL 3.0.11 19 Sep 2023)
 cuon          | configure: using openssl: OpenSSL 1.0.2g  1 Mar 2016
 demoiselle    | configure: using openssl: OpenSSL 1.1.0h-fips  27 Mar 2018
 desman        | configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: OpenSSL 3.0.9 30 May 2023)
 dhole         | configure: using openssl: OpenSSL 1.0.2k-fips  26 Jan 2017
 dikkop        | configure: using openssl: OpenSSL 3.0.10 1 Aug 2023 (Library: OpenSSL 3.0.10 1 Aug 2023)
 elasmobranch  | configure: using openssl: OpenSSL 1.1.0h-fips  27 Mar 2018
 gokiburi      | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 grison        | configure: using openssl: OpenSSL 1.1.0l  10 Sep 2019
 grison        | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022
 guaibasaurus  | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 gull          | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 habu          | configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: OpenSSL 3.0.9 30 May 2023)
 hachi         | configure: using openssl: OpenSSL 1.1.1w  11 Sep 2023
 hake          | configure: using openssl: OpenSSL 1.0.2u  20 Dec 2019
 hippopotamus  | configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 SUSE release 150400.7.60.2
 indri         | configure: using openssl: OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023)
 jackdaw       | configure: using openssl: OpenSSL 1.1.1n  15 Mar 2022 (Library: OpenSSL 1.1.1w  11 Sep 2023)
 jay           | configure: using openssl: OpenSSL 1.1.1l-fips  24 Aug 2021 SUSE release 150400.7.60.2
 kingsnake     | configure: using openssl: OpenSSL 3.0.9 30 May 2023 (Library: OpenSSL 3.0.9 30 May 2023)
 krait         | configure: using openssl: OpenSSL 1.1.1k  FIPS 25 Mar 2021
 lancehead     | configure: using