Re: index prefetching

2025-08-12 Thread Andres Freund
0.00 0.000.00 0.000.000.000.69 63.80 Note the different read sizes... > I did look into pg_aios, but there's only 8kB requests in both cases. I > didn't have time to look closer yet. That's what we'd expect, right? There's nothing on master that'd perform read combining for index scans... Greetings, Andres Freund

Re: Update LDAP Protocol in fe-connect.c to v3

2025-08-12 Thread Andres Freund
e failure, or at least the cfbot is showing a red > column at the moment. See https://postgr.es/m/CAN55FZ1RuBhJmPWs3Oi%3D9UoezDfrtO-VaU67db5%2B0_uy19uF%2BA%40mail.gmail.com Greetings, Andres Freund

Re: Annoying warning in SerializeClientConnectionInfo

2025-08-12 Thread Andres Freund
Hi, On 2025-08-11 16:30:30 -0700, Jacob Champion wrote: > On Mon, Aug 11, 2025 at 3:52 PM Andres Freund wrote: > > And the warning is right. Not sure why a new compiler is needed, IIRC this > > warning is present in other cases with older compilers too. > > Probably >

Re: Adding basic NUMA awareness

2025-08-12 Thread Andres Freund
reate > PGPROC partitions only for those)? I suppose that requires literally > walking all the nodes. I didn't think of numa_node_of_cpu(). As long as numa_node_of_cpu() returns *something* I think it may be good enough. Nobody uses an RPi for high-throughput postgres workloads with a lot of memory. Slightly sub-optimal mappings should really not matter. I'm kinda wondering if we should deal with such fake numa systems by detecting them and disabling our numa support. Greetings, Andres Freund

Re: Making type Datum be 8 bytes everywhere

2025-08-12 Thread Andres Freund
────┴────┘ (3 rows) Greetings, Andres Freund

Annoying warning in SerializeClientConnectionInfo

2025-08-11 Thread Andres Freund
iler is needed, IIRC this warning is present in other cases with older compilers too. The most obvious fix is to slap on a PG_USED_FOR_ASSERTS_ONLY. However, we so far don't seem to have used it for function parameters... But I don't see a problem with starting to do so. Greetings, Andres Freund
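A minimal sketch of what marking a function parameter this way could look like, written as standalone C rather than PostgreSQL code. The macro USED_FOR_ASSERTS_ONLY and the function serialize_blob() are hypothetical stand-ins; the assumption is that the real macro expands to an "unused" attribute only when assertions are compiled out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for PG_USED_FOR_ASSERTS_ONLY: expands to the
 * compiler's "unused" attribute only when assertions are disabled.
 */
#if defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define USED_FOR_ASSERTS_ONLY __attribute__((unused))
#else
#define USED_FOR_ASSERTS_ONLY
#endif

/*
 * Copy len bytes from src into dst and return the number of bytes written.
 * maxsize is only consulted by the assertion, so a build with NDEBUG would
 * otherwise report it as an unused parameter.
 */
static inline size_t
serialize_blob(char *dst, size_t maxsize USED_FOR_ASSERTS_ONLY,
			   const char *src, size_t len)
{
	assert(len <= maxsize);
	memcpy(dst, src, len);
	return len;
}
```

The attribute sits after the parameter name, which is where GCC and Clang accept it for parameters; other compilers would need their own spelling.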

Re: Some ExecSeqScan optimizations

2025-08-11 Thread Andres Freund
Hi, On 2025-07-11 11:22:36 +0900, Amit Langote wrote: > On Fri, Jul 11, 2025 at 5:55 AM Andres Freund wrote: > > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund wrote: > > > > The performance gain unsurprisingl

Re: meson: add and use stamp files for generated headers

2025-08-11 Thread Andres Freund
Hi, On 2025-08-11 14:40:40 +0300, Nazir Bilal Yavuz wrote: > Thank you for working on this! Thanks for the review - pushed. Greetings, Andres Freund

Re: Adding basic NUMA awareness

2025-08-08 Thread Andres Freund
yc too, if "L3 LLC as NUMA" is enabled. > I'm not sure what to do about this (or how getcpu() or libnuma handle this). I don't immediately see any libnuma functions that would care? I also am somewhat curious about what getcpu() returns for the current node... Greetings, Andres Freund

meson: add and use stamp files for generated headers

2025-08-08 Thread Andres Freund
l is somewhat expensive. Greetings, Andres Freund From d845c0d56a0357730a7ec398cd77c6a1ada392fa Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 8 Aug 2025 19:49:23 -0400 Subject: [PATCH v2] meson: add and use stamp files for generated headers Without using stamp files, meson lists the g

Re: index prefetching

2025-08-08 Thread Andres Freund
"strange" combinations of parameters, looking for > weird behaviors like that. I'm just catching up: Isn't it a bit early to focus this much on testing? ISMT that the patchsets for both approaches currently have some known architectural issues and that addressing them seems likely to change their performance characteristics. Greetings, Andres Freund

Re: headerscheck warnings with late-model gcc

2025-08-08 Thread Andres Freund
It's possible to do this by globbing for files at configure time, but that wouldn't detect adding new headers (which would need to trigger a re-configure). Whether that's an issue worth caring about I'm a bit on the fence about. Greetings, Andres Freund

Re: Enhance statistics reset functions to return reset timestamp

2025-08-08 Thread Andres Freund
istake to introduce support for granular resets, we shouldn't bury ourselves deeper. If anything we should rip out everything other than 1) a global reset 2) a per-database reset. Leaving that aside, I just don't see a convincing use case for returning the timestamp here. Greetings, Andres Freund

Re: Kernel AIO on FreeBSD, macOS and a couple of other Unixen

2025-08-08 Thread Andres Freund
Hi, On 2025-08-08 18:28:09 -0400, Andres Freund wrote: > > From 6574ac9267fe9938f59ed67c8f0282716d8c28f3 Mon Sep 17 00:00:00 2001 > > From: Thomas Munro > > Date: Sun, 3 Aug 2025 00:15:01 +1200 > > Subject: [PATCH v1 3/4] aio: Support I/O methods without true vectore

Re: Kernel AIO on FreeBSD, macOS and a couple of other Unixen

2025-08-08 Thread Andres Freund
_completion_queue() to give up
> + * early since this backend can process its own queue promptly and efficiently.
> + */
> +static void
> +pgaio_posix_aio_ipc_acquire_own_completion_lock(PgAioPosixAioContext *context)
> +{
> + Assert(context == pgaio_my_posix_aio_context);
> + Assert(!LWLockHeldByMe(&context->completion_lock));
> +
> + if (!LWLockConditionalAcquire(&context->completion_lock, LW_EXCLUSIVE))
> + {
> + ProcNumber procno;
> +
> + procno = pg_atomic_exchange_u32(&context->ipc_procno, MyProcNumber);
> + if (procno != INVALID_PROC_NUMBER)
> + SetLatch(&GetPGProcByNumber(procno)->procLatch);
> +
> + LWLockAcquire(&context->completion_lock, LW_EXCLUSIVE);
> + pg_atomic_write_u32(&context->ipc_procno, INVALID_PROC_NUMBER);
> + }
> +}
Is the "command pgaio_posix_aio_ipc_drain_completion_queue() to give up" path frequent enough to be worth the complexity? I somewhat doubt so?
Greetings, Andres Freund

Re: Broken ./configure checks for __cpuid() and __cpuidex()

2025-08-08 Thread Andres Freund
README should do the trick, I'll go > investigate that. FWIW, you can trigger manual tasks in the cirrus-ci web-interface. Greetings, Andres Freund

Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)

2025-08-08 Thread Andres Freund
A large portion of the cases I've seen where toast ID assignments were a problem were when the global OID counter wrapped around due to activity on *other* tables (and/or temporary table creation). If you instead had a per-toast-table sequence for assigning chunk IDs, that problem would largely vanish. With 64bit toast IDs we shouldn't need to search the index for a non-conflicting toast ID, there can't be wraparounds (we'd hit wraparound of LSNs well before that and that's not practically reachable). Greetings, Andres Freund

Re: Custom pgstat support performance regression for simple queries

2025-08-08 Thread Andres Freund
On 2025-07-28 08:18:01 +0900, Michael Paquier wrote: > I have used that and applied it down to v18, closing the open item. Thanks!

Re: Datum as struct

2025-08-08 Thread Andres Freund
LTRUE(entry->key))
> + else if (!LTG_ISALLTRUE(entry->key.value))
This should be DatumGet*(), no?
> diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
> index 996ce174454..5d57563ecb7 100644
> --- a/contrib/sepgsql/label.c
> +++ b/contrib/sepgsql/label.c
> @@ -330,7 +330,7 @@ sepgsql_fmgr_hook(FmgrHookEventType event,
> stack = palloc(sizeof(*stack));
> stack->old_label = NULL;
> stack->new_label = sepgsql_avc_trusted_proc(flinfo->fn_oid);
> - stack->next_private = 0;
> + stack->next_private.value = 0;
>
> MemoryContextSwitchTo(oldcxt);
Probably should use DummyDatum.
Greetings, Andres Freund

Re: pgaio_io_get_id() type (was Re: Datum as struct)

2025-08-08 Thread Andres Freund
Hi, On 2025-08-05 19:20:20 +0200, Peter Eisentraut wrote: > On 31.07.25 19:17, Tom Lane wrote: > > Also I see a "// XXX" in pg_get_aios, which I guess is a note > > to confirm the data type to use for ioh_id? > > Yes, the stuff returned from pgaio_io_get_id() should be int, but some code > uses u

Re: Improve error reporting in 027_stream_regress test

2025-07-28 Thread Andres Freund
of days before > getting down to it. I don't really get what the point of designing that mechanism is before we have a usecase. If we need it, we can expand it at that time. Greetings, Andres Freund

Re: Explicitly enable meson features in CI

2025-07-28 Thread Andres Freund
cific changes, so I guess "... all good" covers it... Greetings, Andres Freund

Re: trivial grammar refactor

2025-07-23 Thread Andres Freund
or VERBOSE and once without. That's not exactly a free lunch... Greetings, Andres Freund

Re: Making type Datum be 8 bytes everywhere

2025-07-23 Thread Andres Freund
Hi, On 2025-07-18 13:24:32 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2025-07-17 20:09:57 -0400, Tom Lane wrote: > >> I made it just as a proof-of-concept that this can work. It compiled > >> cleanly and passed check-world for me on a 32-bit FreeBSD im

Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-07-23 Thread Andres Freund
te and IndexScanInstrumentation seems to be pre-destined for that information. But it seems a bit too much memory to just keep a BufferUsage around even when analyze isn't used. Greetings, Andres Freund PS: Another thing that I think we ought to track is the number of fetches from the table

Re: Parallel heap vacuum

2025-07-23 Thread Andres Freund
? Yes, that might make sense. But wiring it up via tableam doesn't make sense. Greetings, Andres Freund

Re: Custom pgstat support performance regression for simple queries

2025-07-23 Thread Andres Freund
Hi, On 2025-07-23 09:54:12 +0900, Michael Paquier wrote: > On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote: > > It seems rather unsurprising that that causes a slowdown. > > > > The pre-check is there to: > > /* Don't expend a clock check if n

Re: Custom pgstat support performance regression for simple queries

2025-07-23 Thread Andres Freund
> pending anymore (when flushing) without saying "all the stats have nothing > pending" (while some may still have pending stats)? I don't think that's a problem - reset that global flag after checking it at the start of pgstat_report_stat() and set it to true if partial_flush is true at the end of pgstat_report_stat(). Greetings, Andres Freund
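The flag handling described above can be sketched in standalone C. This is an illustration under assumptions, not code from the thread or from pgstat.c: stats_report_pending, pending_entries, note_pending_entry() and report_stats() are hypothetical names, and max_flush stands in for "entries that could be flushed without waiting on a lock".

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not PostgreSQL identifiers. */
static bool stats_report_pending = false;	/* the proposed global flag */
static int	pending_entries = 0;	/* entries waiting to be flushed */

/* Called whenever some stats kind accumulates pending data. */
static inline void
note_pending_entry(void)
{
	pending_entries++;
	stats_report_pending = true;
}

/*
 * Flush at most max_flush entries. Returns true if the flush was only
 * partial, i.e. some entries are still pending afterwards.
 */
static inline bool
report_stats(int max_flush)
{
	bool		partial_flush;
	int			flushed;

	/* Cheap pre-check: nothing pending, so skip all further work. */
	if (!stats_report_pending)
		return false;

	/* Reset the flag right after checking it... */
	stats_report_pending = false;

	flushed = pending_entries < max_flush ? pending_entries : max_flush;
	pending_entries -= flushed;
	assert(pending_entries >= 0);
	partial_flush = pending_entries > 0;

	/* ...and set it again if some entries could not be flushed. */
	if (partial_flush)
		stats_report_pending = true;

	return partial_flush;
}
```

With this shape the flag can never claim "nothing pending" while entries remain, because a partial flush re-arms it before returning.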

Re: index prefetching

2025-07-23 Thread Andres Freund
Hi, On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote: > On 7/23/25 02:59, Andres Freund wrote: > > Hi, > > > > On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote: > >> But I don't see why would this have any effect on the prefetch distance, >

Re: index prefetching

2025-07-22 Thread Andres Freund
ld be bug, of course. But it'd be helpful to see the dataset/query. Pgbench scale 500, with the simpler query from my message. Greetings, Andres Freund

Re: index prefetching

2025-07-22 Thread Andres Freund
Hi, On 2025-07-22 19:13:23 -0400, Peter Geoghegan wrote: > On Tue, Jul 22, 2025 at 6:53 PM Andres Freund wrote: > > That may be true with local fast NVMe disks, but won't be true for networked > > storage like in common clouds. Latencies of 0.3 - 4ms leave a lot of CPU &

Re: index prefetching

2025-07-22 Thread Andres Freund
eing prefetched. Currently the behaviour in that case is to synchronously wait for IO on that buffer to complete. That obviously causes a "pipeline bubble"... Greetings, Andres Freund

Re: index prefetching

2025-07-22 Thread Andres Freund
Hi, On 2025-07-18 23:25:38 -0400, Peter Geoghegan wrote: > On Fri, Jul 18, 2025 at 10:47 PM Andres Freund wrote: > > > (Within an index AM, there is a 1:1 correspondence between batches and > > > leaf > > > pages, and batches need to hold on to a leaf page buffer

Custom pgstat support performance regression for simple queries

2025-07-22 Thread Andres Freund
kinds, the overhead goes away almost completely. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/aGKSzFlpQWSh%2F%2B2w%40ip-10-97-1-34.eu-west-3.compute.internal

Re: Adding wait events statistics

2025-07-22 Thread Andres Freund
collector, counting how often > > it sees certain wait events when sampling. > > Yeah but even if we are okay with losing "counters" by sampling, we'd still > not get > the duration. For the duration to be meaningful we also need the exact number > of counters. You don't need precise duration to see what wait events are a problem. If you see that some event is sampled a lot you know it's because there either are a *lot* of those wait events or the wait events are entered into for a long time. Greetings, Andres Freund

Re: AIO v2.5

2025-07-22 Thread Andres Freund
Hi, On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote: > On Wed, 9 Jul 2025 at 16:59, Andres Freund wrote: > > On 2025-07-09 13:26:09 +0200, Matthias van de Meent wrote: > > > I've been going through the new AIO code as an effort to rebase and > > > ad

Re: Parallel heap vacuum

2025-07-21 Thread Andres Freund
, it just doesn't make sense for those callbacks to be at the level of tableam. If you want to make vacuumparallel support parallel table vacuuming for multiple table AMs (I'm somewhat doubtful that's a good idea), you could do that by having a vacuumparallel.c specific callback struct. Greetings, Andres Freund

Re: Non-reproducible AIO failure

2025-07-21 Thread Andres Freund
n't see what else I could do. RMT, note that there were two issues in this thread, the original report by Tom has been addressed (in e9a3615a522). I guess the best thing would be to split the open items entry into two? Greetings, Andres Freund [1] Rather impressed at how stable our test

Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock

2025-07-21 Thread Andres Freund
Hi, On 2025-07-21 13:37:04 -0400, Greg Burd wrote: > On 7/18/25 13:03, Andres Freund wrote: > Hello.  Thanks again for taking the time to review the email and patch, > I think we're onto something good here. > > > > > I'd be curious if anybody wants to argue f

Re: index prefetching

2025-07-18 Thread Andres Freund
Hi, On 2025-07-18 17:44:26 -0400, Peter Geoghegan wrote: > On Fri, Jul 18, 2025 at 4:52 PM Andres Freund wrote: > > I don't agree with that. For efficiency reasons alone table AMs should get a > > whole batch of TIDs at once. If you have an ordered indexscan that retur

Re: Adding basic NUMA awareness

2025-07-18 Thread Andres Freund
Hi, On 2025-07-18 22:48:00 +0200, Tomas Vondra wrote: > On 7/18/25 18:46, Andres Freund wrote: > >> For a read-write pgbench I however saw some strange drops/increases of > >> throughput. I suspect this might be due to some thinko in the clocksweep > >> partiti

Re: index prefetching

2025-07-18 Thread Andres Freund
the way the visibilitymap, which it really has no business accessing, that's a heap specific thing. It also knows too much about different formats that can be stored by indexes, but that's kind of a separate issue. Greetings, Andres Freund

Re: libpq: Process buffered SSL read bytes to support records >8kB on async API

2025-07-18 Thread Andres Freund
Hi, On 2025-07-17 09:48:29 -0700, Jacob Champion wrote: > On Wed, Jul 16, 2025 at 4:35 PM Andres Freund wrote: > > Why do we care about not hitting the socket? We always operate the socket in > > non-blocking mode anyway? > > IIUC, that would change pqReadData() from

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-07-18 Thread Andres Freund
Hi, On 2025-06-30 19:42:51 -0400, Andres Freund wrote: > On 2025-07-01 00:52:49 +0200, Daniel Gustafsson wrote: > > > On 30 Jun 2025, at 20:33, Jacob Champion > > > wrote: > > > > > > On Mon, Jun 30, 2025 at 10:02 AM Daniel Gustafsson > > >

Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock

2025-07-18 Thread Andres Freund
:13 -0400, Greg Burd wrote: > On Fri, Jul 11, 2025, at 2:52 PM, Andres Freund wrote: > > I think we'll likely need something to replace it. > > Fair, this (v5) patch doesn't yet try to address this. > > > TBH, I'm not convinced that autoprewarm using have

Re: Adding basic NUMA awareness

2025-07-18 Thread Andres Freund
more frequently when using a foreign partition. Another way would be to have bgwriter manage this. Whenever it detects that one ring is too far ahead, it could set an "avoid this partition" bit, which would trigger backends that natively use that partition to switch to foreign partitions that don't currently have that bit set. I suspect there's a problem with that approach though, I worry that the amount of time that bgwriter spends in BgBufferSync() may sometimes be too long, leading to too much imbalance. Greetings, Andres Freund
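A rough sketch of the "avoid this partition" idea, in standalone C. Everything here is an assumption made for illustration: the SweepPartition struct, NUM_PARTITIONS, the MAX_LEAD threshold, and both functions are hypothetical, and "too far ahead" is approximated as distance from the slowest partition's sweep position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PARTITIONS 4
#define MAX_LEAD 100			/* hypothetical "too far ahead" threshold */

/* Hypothetical per-partition clock sweep state. */
typedef struct SweepPartition
{
	uint64_t	sweep_pos;		/* how far this partition's hand advanced */
	bool		avoid;			/* set by the balancer, read by backends */
} SweepPartition;

/* What a bgwriter-like process might do periodically. */
static inline void
balance_partitions(SweepPartition *parts)
{
	uint64_t	min_pos = parts[0].sweep_pos;

	for (int i = 1; i < NUM_PARTITIONS; i++)
		if (parts[i].sweep_pos < min_pos)
			min_pos = parts[i].sweep_pos;

	/* Flag every partition that ran too far ahead of the slowest one. */
	for (int i = 0; i < NUM_PARTITIONS; i++)
		parts[i].avoid = (parts[i].sweep_pos - min_pos) > MAX_LEAD;
}

/* What a backend might do when it needs a victim buffer. */
static inline int
choose_partition(const SweepPartition *parts, int home)
{
	assert(home >= 0 && home < NUM_PARTITIONS);

	if (!parts[home].avoid)
		return home;

	/* Home partition is flagged: use the next foreign one that is not. */
	for (int i = 1; i < NUM_PARTITIONS; i++)
	{
		int			cand = (home + i) % NUM_PARTITIONS;

		if (!parts[cand].avoid)
			return cand;
	}
	return home;				/* not reached: the slowest is never flagged */
}
```

The sketch also makes the stated worry concrete: the avoid bits are only as fresh as the last balance_partitions() call, so a long gap between calls lets the imbalance grow.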

Re: Making type Datum be 8 bytes everywhere

2025-07-18 Thread Andres Freund
s. We probably > should at least try to measure that, though I'm not sure what > our threshold of pain would be for deciding not to do this. From my POV the threshold would have to be rather high for backend code. Less so in libpq, but that's not affected here. Greetings, Andres Freund

Re: Adding wait events statistics

2025-07-18 Thread Andres Freund
wn c) for each callsite that is converted to the extended wait event, you either need to reason why the added overhead is ok, or do a careful experiment Personally I'd rather have an in-core sampling collector, counting how often it sees certain wait events when sampling. It then also can correlate those samples with other things on the system, e.g. by counting the number of active backends together with each sample. And eventually also memory usage etc. Greetings, Andres Freund
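To make the sampling approach concrete, here is a small standalone C sketch. It is only an illustration: WaitSampleStats, record_sample() and the integer event ids are hypothetical, and a real collector would read backend state from shared memory rather than from an array argument.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WAIT_EVENTS 3		/* hypothetical number of wait event ids */
#define NOT_WAITING (-1)		/* backend is active but not waiting */

/* Hypothetical accumulated sampling statistics. */
typedef struct WaitSampleStats
{
	uint64_t	total_samples;	/* number of sampling rounds */
	uint64_t	samples[NUM_WAIT_EVENTS];	/* times each event was seen */
	uint64_t	active_sum[NUM_WAIT_EVENTS];	/* active backends, summed
												 * over those sightings */
} WaitSampleStats;

/*
 * Record one sampling round. events[i] is the wait event of the i-th active
 * backend, or NOT_WAITING; nactive is the number of active backends.
 */
static inline void
record_sample(WaitSampleStats *stats, const int *events, int nactive)
{
	stats->total_samples++;

	for (int i = 0; i < nactive; i++)
	{
		int			ev = events[i];

		if (ev == NOT_WAITING)
			continue;
		assert(ev >= 0 && ev < NUM_WAIT_EVENTS);

		stats->samples[ev]++;
		/* Correlate the sighting with system load at that moment. */
		stats->active_sum[ev] += (uint64_t) nactive;
	}
}
```

A high samples[] count cannot distinguish many short waits from a few long ones, but either way it marks the event as one worth looking at; active_sum[ev] / samples[ev] gives the average number of active backends when the event was observed.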

Re: simple patch for discussion

2025-07-17 Thread Andres Freund
ally with every additional parallel worker, but for things like seqscans that's really not true. I've seen reasonably-close-to-linear scalability for parallel seqscans up to 48 workers (the CPUs in the system I tested on). Given that, our degree-of-parallelism logic doesn't really make sense. Greetings, Andres Freund

Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2025-07-16 Thread Andres Freund
s I still think this would be a rather awesome improvement. > Open questions I have: > - Could we rely on checking whether the TSC timesource is invariant (via > CPUID), instead of relying on Linux choosing it as a clocksource? I don't see why not? Greetings, Andres Freund

Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 18:24:33 -0400, Tom Lane wrote: > ... BTW, another resource worth looking at is src/bin/pg_test_timing/ > which we just improved a few days ago [1]. What I see on two different > Linux-on-Intel boxes is that the loop time that that reports is 16 ns > and change, and the clock re

Re: libpq: Process buffered SSL read bytes to support records >8kB on async API

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 15:25:14 -0700, Jacob Champion wrote: > On Wed, Jul 16, 2025 at 2:34 PM Andres Freund wrote: > > > Based on my understanding of [1], readahead makes this overall problem > > > much worse by opportunistically slurping bytes off the wire and doing > >

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 17:47:53 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 5:41 PM Andres Freund wrote: > > I don't mean the index tids, but how the read stream is fed block numbers. > > In > > the "complex" patch that's done by index_scan_stream

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 17:27:23 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:46 PM Andres Freund wrote: > > Maybe I'm missing something, but the current interface doesn't seem to work > > for AMs that don't have a 1:1 mapping between the block number porti

Re: libpq: Process buffered SSL read bytes to support records >8kB on async API

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 11:50:46 -0700, Jacob Champion wrote: > On Wed, Jul 16, 2025 at 11:11 AM Andres Freund wrote: > > If one modifies libpq to use openssl readahead (which does result in > > speedups, > > because otherwise openssl thinks it's useful to do lots of 5 byte

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 16:54:06 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 3:40 PM Andres Freund wrote: > > As a first thing I just wanted to get a feel for the improvements we can > > get. > > I had a scale 5 tpch already loaded, so I ran a bogus query on t

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 15:39:58 -0400, Andres Freund wrote: > Looking at the actual patches now. I just did an initial, not particularly in depth look. A few comments and questions below. For either patch, I think it's high time we split the index/table buffer stats in index scans. It&

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 2:27 PM Andres Freund wrote: > > Could you share the current version of the complex patch (happy with a git > > tree)? Afaict it hasn't been posted, which makes this pretty hard to follow >

Re: index prefetching

2025-07-16 Thread Andres Freund
t it hasn't been posted, which makes this pretty hard to follow along / provide feedback on, for others. Greetings, Andres Freund

Re: libpq: Process buffered SSL read bytes to support records >8kB on async API

2025-07-16 Thread Andres Freund
What are the limits for the maximum amount of data this could make us buffer in addition to what we are buffering right now? It's not entirely obvious to me that a loop around pqReadData() as long as there is pending data couldn't make us buffer a lot of data. Do you have a WIP patch? Greetings, Andres Freund

Re: Explicitly enable meson features in CI

2025-07-16 Thread Andres Freund
's a good idea to add the Auto bit to the name. We have several special things about various tests, if we add all of them to the task name, we'll have very long task names. This one would already be Linux - Debian Bookworm - Meson Auto Features Detection - 32 and 64 Bit build & tests - Alignment, Undefined Behaviour Sanitizer - IO method=io_uring And the task names would change a lot more, which is also a pain for things like the commitfest / cfbot web apps. But it *should* be added to the "SPECIAL:" comment. Greetings, Andres Freund

Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 18:27:45 +0300, Yura Sokolov wrote: > 16.07.2025 17:58, Andres Freund пишет: > >> Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop of > >> just ~6% under concurrent DDL. I think this strongly suggests that the > >> spinlock is

Re: Changing shared_buffers without restart

2025-07-16 Thread Andres Freund
m may change. Resizing shared_buffers is particularly important because it's becoming more important to be able to dynamically increase/decrease the resources of a running postgres instance to adjust for system load. Memory and CPUs can be hot added/removed from VMs, but we need to utilize them... Greetings, Andres Freund

Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)

2025-07-16 Thread Andres Freund
Hi, On 2025-06-25 16:41:46 +0300, Sergey Shinderuk wrote: > On 16.06.2025 17:41, Andres Freund wrote: > > TBH, I don't see a point in continuing with this thread without something > > that > > others can test. I rather doubt that the right fix here is to just change &

Re: index prefetching

2025-07-16 Thread Andres Freund
ing (not much, but it's clearly visible for cached queries). This imo isn't something worth optimizing for - if you use an io_method that actually can execute IO asynchronously this issue does not exist, as the start of the IO will already have populated the buffer entry (without BM_VALID set, of course). Thus we won't start another IO for that block. Greetings, Andres Freund

Re: Interrupts vs signals

2025-07-15 Thread Andres Freund
their own interrupt ids. For 2), I wonder if we ought to have a global mask of interrupt kinds that can be processed in some context. Instead of having INTERRUPT_CFI_MASK() compute what mask to use, we could have things like HOLD_CANCEL_INTERRUPTS be defined as something like if (InterruptHoldCount[CANCEL]++ == 0) InterruptMask &= ~CANCEL; which would allow CHECK_FOR_INTERRUPTS to just use InterruptMask to check for to-be-processed interrupts. Greetings, Andres Freund
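The hold-count-plus-global-mask idea can be sketched in standalone C as follows. The names are hypothetical and chosen for the illustration (INTR_CANCEL, INTR_DIE, hold_interrupt(), resume_interrupt(), interrupts_to_process()); unlike the shorthand in the message, the sketch separates the array index of an interrupt kind from its bit in the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt kinds; the index doubles as the bit position. */
enum
{
	INTR_CANCEL = 0,
	INTR_DIE = 1,
	INTR_NUM_KINDS
};

#define INTR_BIT(kind) ((uint32_t) 1 << (kind))

/* Global mask of interrupt kinds that may currently be processed. */
static uint32_t InterruptMask = INTR_BIT(INTR_CANCEL) | INTR_BIT(INTR_DIE);
static int	InterruptHoldCount[INTR_NUM_KINDS];
static uint32_t PendingInterrupts;	/* bits set by whoever raises them */

/* Analogue of HOLD_CANCEL_INTERRUPTS(): first hold clears the mask bit. */
static inline void
hold_interrupt(int kind)
{
	if (InterruptHoldCount[kind]++ == 0)
		InterruptMask &= ~INTR_BIT(kind);
}

/* Analogue of RESUME_CANCEL_INTERRUPTS(): last resume restores the bit. */
static inline void
resume_interrupt(int kind)
{
	assert(InterruptHoldCount[kind] > 0);
	if (--InterruptHoldCount[kind] == 0)
		InterruptMask |= INTR_BIT(kind);
}

/* What a CHECK_FOR_INTERRUPTS()-like fast path would test. */
static inline uint32_t
interrupts_to_process(void)
{
	return PendingInterrupts & InterruptMask;
}
```

The point of the design is that the check itself stays a single AND against a precomputed mask, while nesting is handled by the hold counters at hold/resume time.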

Re: Interrupts vs signals

2025-07-15 Thread Andres Freund
| INTERRUPT_GENERAL, ...);
> * }
> *
> * It's important to clear the interrupt *before* checking if there's work to
> - * do. Otherwise, if someone sets the interrupt between the check and the
> + * do. Otherwise, if someone sets the interrupt between the check and the
> * ClearInterrupt() call, you will miss it and Wait will incorrectly block.
Isn't the change to move CHECK_FOR_INTERRUPTS() before ClearInterrupt() violating what the paragraph explains?
> /*
> * Flags in the pending interrupts bitmask. Each value represents one bit in
> * the bitmask.
> */
> -typedef enum
> +typedef enum InterruptType
> {
I'm rather concerned about the number of interrupt bits we've already consumed. I'll respond about that in a separate, higher-level, email.
> /*
> - * Clear an interrupt flag.
> + * Clear an interrupt flag (or flags).
> */
> static inline void
> ClearInterrupt(uint32 interruptMask)
> {
> pg_atomic_fetch_and_u32(MyPendingInterrupts, ~interruptMask);
> + pg_write_barrier();
> }
pg_atomic_fetch_and_u32 is a full barrier, no separate barrier needed.
> #endif
> diff --git a/src/include/postmaster/startup.h b/src/include/postmaster/startup.h
> index 158f52255a6..a0316202b95 100644
> --- a/src/include/postmaster/startup.h
> +++ b/src/include/postmaster/startup.h
> @@ -25,6 +25,14 @@
>
> extern PGDLLIMPORT int log_startup_progress_interval;
>
> +/* The set of interrupts that are processed by ProcessStartupProcInterrupts */
> +#define INTERRUPT_STARTUP_PROC_MASK ( \
> + INTERRUPT_BARRIER | \
> + INTERRUPT_DIE | \
> + INTERRUPT_LOG_MEMORY_CONTEXT | \
> + INTERRUPT_CONFIG_RELOAD \
> + )
Somehow I find this name a bit confusing, the first parse attempt ends up with PROC_MASK as one of the components of the name. How about INTERRUPT_MASK_STARTUP[_PROC]?
Greetings, Andres Freund
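As an analogy for the barrier remark, the same pattern can be written with C11 atomics, where a read-modify-write with the default sequentially consistent ordering already orders surrounding accesses and needs no extra fence. This is a sketch under assumptions: it uses <stdatomic.h> rather than PostgreSQL's pg_atomic_* API, and pending_interrupts, raise_interrupt(), clear_interrupt() and interrupt_pending() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pending-interrupt bitmask shared between threads. */
static _Atomic uint32_t pending_interrupts;

/* Set one or more interrupt bits. */
static inline void
raise_interrupt(uint32_t mask)
{
	assert(mask != 0);			/* raising nothing is a caller bug */
	atomic_fetch_or(&pending_interrupts, mask);
}

/*
 * Clear one or more interrupt bits. atomic_fetch_and() defaults to
 * memory_order_seq_cst, so no separate fence follows it here.
 */
static inline void
clear_interrupt(uint32_t mask)
{
	atomic_fetch_and(&pending_interrupts, ~mask);
}

/* Check whether any of the given bits is currently set. */
static inline bool
interrupt_pending(uint32_t mask)
{
	return (atomic_load(&pending_interrupts) & mask) != 0;
}
```

In a wait loop built on these helpers, clear_interrupt() would still have to run before the check for work, for the reason the quoted comment gives.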

Re: Improving and extending int128.h to more of numeric.c

2025-07-15 Thread Andres Freund
nings the CompilerWarnings task will fail. It's "just" the 32bit build and msvc windows builds that currently don't... There was a patch adding it for the msvc build at some point, but ... Greetings, Andres Freund

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-07-15 Thread Andres Freund
not followed the development of this patch - but I continue to be concerned about the performance impact it has as-is and the amount of COPY performance improvements it forecloses. This seems to add yet another layer of indirection to a lot of hot functions like CopyGetData() etc. Greetings, Andres Freund

Re: AIO v2.5

2025-07-14 Thread Andres Freund
royed all the evidence. Besides differences in filesystem level fragmentation, another potential theory is that the SSDs were internally more fragmented. Occasionally dumping/restoring the data could allow the drive to do internal wear leveling before the new data is loaded, leading to a better layout. I found that I get more consistent benchmark performance if I delete as much of the data as possible, run fstrim -v -a and then load the data. And do another round of fstrim. Greetings, Andres Freund

Re: AIO v2.5

2025-07-14 Thread Andres Freund
st interesting thing would be some runs with cloud-ish storage (relatively high iops, very high latency)... > The repository also has branches with plots showing results with WIP > indexscan prefetching. (It's excluded from the PDFs I presented here). Hm, I looked for those, but I couldn't quickly find any plots that include them. Would I have to generate the plots from a checkout of the repo? > The conclusions are similar to what we found here - "worker" is good > with enough workers, io_uring is good too. Sync has issues for some of > the data sets, but still helps a lot. Nice. Greetings, Andres Freund

Re: Changing shared_buffers without restart

2025-07-14 Thread Andres Freund
Hi, On July 14, 2025 10:39:33 AM EDT, Dmitry Dolgov <9erthali...@gmail.com> wrote: >> On Mon, Jul 14, 2025 at 10:23:23AM -0400, Andres Freund wrote: >> > Those steps are separated in time, and I'm currently trying to understand >> > what are the consequences of

Re: Changing shared_buffers without restart

2025-07-14 Thread Andres Freund
Hi, On 2025-07-14 16:01:50 +0200, Dmitry Dolgov wrote: > > On Mon, Jul 14, 2025 at 09:42:46AM -0400, Andres Freund wrote: > > What on earth would be the point of putting a buffer on the freelist but not > > make it reachable by the clock sweep? To me that's just nonse

Re: Changing shared_buffers without restart

2025-07-14 Thread Andres Freund
Hi, On 2025-07-14 15:20:03 +0200, Dmitry Dolgov wrote: > > On Mon, Jul 14, 2025 at 09:14:26AM -0400, Andres Freund wrote: > > > > Clock sweep can find any buffer, independent of whether it's on the > > > > freelist. > > > > > > It does t

Re: Changing shared_buffers without restart

2025-07-14 Thread Andres Freund
Hi, On 2025-07-14 15:08:28 +0200, Dmitry Dolgov wrote: > > On Mon, Jul 14, 2025 at 08:56:56AM -0400, Andres Freund wrote: > > > Ah, I see what you mean folks. But I'm talking here only about buffers > > > which will be allocated after extending shared memory -- th

Re: Changing shared_buffers without restart

2025-07-14 Thread Andres Freund
Hi, On 2025-07-14 11:32:25 +0200, Dmitry Dolgov wrote: > > On Mon, Jul 14, 2025 at 10:24:50AM +0100, Thom Brown wrote: > > On Mon, 14 Jul 2025, 09:54 Dmitry Dolgov, <9erthali...@gmail.com> wrote: > > > > > > On Mon, Jul 14, 2025 at 01:55:39PM +0530, Ashutosh Bapat wrote: > > > > > You're right of

Re: patch: Use pg_assume in jsonb_util.c to fix GCC 15 warnings

2025-07-12 Thread Andres Freund
we could > do that, but after reflection I think the best way is to modify > JsonbIteratorNext to make that guarantee. I've checked that > the attached silences the warning on gcc 15.1.1 (current > Fedora 42). WFM. Greetings, Andres Freund

Re: TransactionIdIsActive() has long been unused

2025-07-12 Thread Andres Freund
Hi, On 2025-07-10 09:52:45 +0900, Michael Paquier wrote: > On Wed, Jul 09, 2025 at 03:46:43PM -0400, Tom Lane wrote: > > Andres Freund writes: > >> Seems like we should just remove TransactionIdIsActive()? > > > > +1. I wondered if any extensions might depend on

Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock

2025-07-11 Thread Andres Freund
');") -t1 (with c=1 for the single-threaded case obviously) The reason for the pg_shmem_allocations_numa is to ensure that shared_buffers is actually mapped, as otherwise the bottleneck will be the kernel zeroing out buffers. The reason for doing -t1 is that I wanted to compare freelist vs clock sweep, rather than clock sweep in general. Note that I patched EvictUnpinnedBufferInternal() to call StrategyFreeBuffer(), otherwise running this a second time won't actually measure the freelist. And the first time run after postmaster start will always be more noisy... Greetings, Andres Freund

Re: Adding basic NUMA awareness

2025-07-11 Thread Andres Freund
Hi, On 2025-07-10 14:17:21 +, Bertrand Drouvot wrote: > On Wed, Jul 09, 2025 at 03:42:26PM -0400, Andres Freund wrote: > > I wonder if we should *increase* the size of shared_buffers whenever huge > > pages are in use and there's padding space due to the huge page &g

Re: Adding basic NUMA awareness

2025-07-11 Thread Andres Freund
Hi, On 2025-07-10 17:31:45 +0200, Tomas Vondra wrote: > On 7/9/25 19:23, Andres Freund wrote: > > There's other things around this that could use some attention. It's not > > hard > > to see clock sweep be a bottleneck in concurrent workloads - partially due >

Re: Adding wait events statistics

2025-07-11 Thread Andres Freund
ion as much as possible > > Does that sound reasonable to you? That does seem like the minimum. Unfortunately I'm rather doubtful this provides enough value to be worth the cost. But I'm rather willing to be proven wrong. Greetings, Andres Freund

Re: Some ExecSeqScan optimizations

2025-07-11 Thread Andres Freund
m. That means that you can't just evaluate the whole predicate using ScanKeys. 3) ScanKey evaluation is actually sometimes *more* expensive than expression evaluation, because the columns are deformed one-by-one. Greetings, Andres Freund

Re: Some ExecSeqScan optimizations

2025-07-11 Thread Andres Freund
Hi, On 2025-07-11 11:22:36 +0900, Amit Langote wrote: > On Fri, Jul 11, 2025 at 5:55 AM Andres Freund wrote: > > On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > > > Thanks for the patch. > > > > > > +/* > > > + * Use pg_assume() for

Re: Using ASSUME in place of ASSERT in non-assert builds

2025-07-10 Thread Andres Freund
me() if the release build should be influenced. > Was this strategy considered before introducing pg_assume, or did I miss > that part of the discussion? No, it wasn't. It seemed like a rather obviously bad idea to me, and the primary motivation in this case really was to get rid of warnings like the one addressed in the subsequent commit. Greetings, Andres Freund
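To illustrate the distinction discussed in this message, here is a minimal sketch of an "assume" macro that is kept separate from plain assertions. It is not PostgreSQL's actual pg_assume definition; the names my_assume, MY_ASSERT_CHECKING and safe_div are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * Hypothetical sketch, NOT the PostgreSQL definition of pg_assume().
 * With MY_ASSERT_CHECKING defined the condition is checked at run time.
 * Otherwise the compiler is told the condition holds, which can silence
 * warnings and influence code generation in release builds.
 */
#if defined(MY_ASSERT_CHECKING)
#define my_assume(expr) assert(expr)
#elif defined(__GNUC__) || defined(__clang__)
#define my_assume(expr) \
	do { if (!(expr)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define my_assume(expr) __assume(expr)
#else
#define my_assume(expr) ((void) 0)
#endif

/* Hypothetical caller: the divisor is promised to be non-zero. */
static int
safe_div(int a, int b)
{
	my_assume(b != 0);
	return a / b;
}
```

A plain assertion compiles to nothing in a release build, while an assume-style hint stays visible to the optimizer. That is why the two are worth keeping distinct: a hint that turns out to be false is undefined behavior, not just a missed check.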

Re: Some ExecSeqScan optimizations

2025-07-10 Thread Andres Freund
Hi, On 2025-07-10 17:28:50 +0900, Amit Langote wrote: > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund wrote: > > On 2025-01-22 10:07:51 +0900, Amit Langote wrote: > > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote > > > wrote: > > > > Her

Re: AIO v2.5

2025-07-10 Thread Andres Freund
Hi, On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote: > On Wed, 9 Jul 2025 at 16:59, Andres Freund wrote: > > > 3. I noticed that there is AIO code for writev-related operations > > > (specifically, pgaio_io_start_writev is exposed, as is > > > PGAIO_OP_WR

Re: Some ExecSeqScan optimizations

2025-07-09 Thread Andres Freund
a small gain by avoiding that. Greetings, Andres Freund From a443d7dc6419a5648b10bbd900acf2fc745255b4 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 9 Jul 2025 19:27:19 -0400 Subject: [PATCH v1] Optimize seqscan code generation using pg_assume() Discussion: https://postgr.es/m/CA+

Re: gcc 15 "array subscript 0" warning at level -O3

2025-07-09 Thread Andres Freund
Hi, On 2025-07-02 22:13:17 +0800, jian he wrote: > On Thu, Jun 5, 2025 at 3:00 AM Andres Freund wrote: > > I've been once more annoyed by this warning. Here's a prototype for the > > approach outlined above. > > > > I can confirm the warning disappears when

Re: gcc 15 "array subscript 0" warning at level -O3

2025-07-09 Thread Andres Freund
Hi, On 2025-06-05 15:50:48 -0400, Tom Lane wrote: > Andres Freund writes: > >> I've been wondering about adding wrapping something like that in a > >> pg_assume(expr) or such. > > > I've been once more annoyed by this warning. Here's a prototype fo

Re: C11 / VS 2019

2025-07-09 Thread Andres Freund
(/Zc:preprocessor, introduced in VS 2019 v16.6). Which seems likely to describe precisely what we're seeing? Greetings, Andres Freund PS: Wonder if we should make the SDK version visible in meson setup...

Re: Adding basic NUMA awareness

2025-07-09 Thread Andres Freund
en we could replace the modulo with a mask, which is
> + * likely more efficient.
> + */
> + switch (numa_partition_freelist)
> + {
> +     case FREELIST_PARTITION_CPU:
> +         freelist_idx = cpu % strategy_ncpus;

As mentioned earlier, modulo is rather expensive for something executed so frequently...

> +         break;
> +
> +     case FREELIST_PARTITION_NODE:
> +         freelist_idx = node % strategy_nnodes;
> +         break;

Here we shouldn't need modulo, right?

> +
> +     case FREELIST_PARTITION_PID:
> +         freelist_idx = MyProcPid % strategy_ncpus;
> +         break;
> +
> +     default:
> +         elog(ERROR, "unknown freelist partitioning value");
> + }
> +
> + return &StrategyControl->freelists[freelist_idx];
> +}

> /* size of lookup hash table ... see comment in StrategyInitialize */
> size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS));
>
> /* size of the shared replacement strategy control block */
> - size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
> + size = add_size(size, MAXALIGN(offsetof(BufferStrategyControl, freelists)));
> +
> + /*
> + * Allocate one frelist per CPU. We might use per-node freelists, but the
> + * assumption is the number of CPUs is less than number of NUMA nodes.
> + *
> + * FIXME This assumes the we have more CPUs than NUMA nodes, which seems
> + * like a safe assumption. But maybe we should calculate how many elements
> + * we actually need, depending on the GUC? Not a huge amount of memory.

FWIW, I don't think that's a safe assumption anymore. With CXL we can get a) PCIe attached memory and b) remote memory as separate NUMA nodes, and that very well could end up as more NUMA nodes than cores.

Ugh, -ETOOLONG. Gotta schedule some other things...

Greetings, Andres Freund
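The modulo-versus-mask point raised in this review can be sketched as follows. This is only an illustration under the assumption that the number of partitions is rounded up to a power of two; next_pow2 and partition_idx_mask are hypothetical helpers, not functions from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round n up to the next power of two (1 <= n <= 2^31). */
static uint32_t
next_pow2(uint32_t n)
{
	uint32_t	p = 1;

	assert(n >= 1 && n <= (UINT32_C(1) << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper: map an id (CPU number, PID, ...) to a partition.
 * Because nparts_pow2 is a power of two, "id & (nparts_pow2 - 1)" gives
 * the same result as "id % nparts_pow2" without an integer division.
 */
static uint32_t
partition_idx_mask(uint32_t id, uint32_t nparts_pow2)
{
	assert(nparts_pow2 != 0 && (nparts_pow2 & (nparts_pow2 - 1)) == 0);
	return id & (nparts_pow2 - 1);
}
```

The trade-off is that rounding up allocates some partitions beyond the real CPU count, and whether the saved division matters would need to be measured.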

TransactionIdIsActive() has long been unused

2025-07-09 Thread Andres Freund
37:50 +0300 Fix race condition in preparing a transaction for two-phase commit. Seems like we should just remove TransactionIdIsActive()? Greetings, Andres Freund

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-09 Thread Andres Freund
argue for adding extensibility, we can do that at this stage. Trying to design this for extensibility from the get go, where that extensibility is very unlikely to be used widely, seems rather likely to just tank this entire project without getting us anything in return. Greetings, Andres Freund

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-07-09 Thread Andres Freund
Hi, On 2025-07-09 13:36:26 -0400, Tom Lane wrote: > It doesn't look like the Meson support needs such explicit tracking of > required libraries, but perhaps I'm missing something? It should be fine, -ldl is added to "os_deps" if needed, and os_deps is used for all code

Re: Improving and extending int128.h to more of numeric.c

2025-07-09 Thread Andres Freund
r testsuite... Having testcode that is not run automatically may be helpful while originally developing something, but it doesn't do anything to detect portability issues or regressions. Greetings, Andres Freund

Re: Adding basic NUMA awareness

2025-07-09 Thread Andres Freund
Hi, On 2025-07-09 12:55:51 -0400, Greg Burd wrote: > On Jul 9 2025, at 12:35 pm, Andres Freund wrote: > > > FWIW, I've started to wonder if we shouldn't just get rid of the freelist > > entirely. While clocksweep is perhaps minutely slower in a single > > t

Re: Adding basic NUMA awareness

2025-07-09 Thread Andres Freund
Hi, On 2025-07-09 12:04:00 +0200, Jakub Wartak wrote: > On Tue, Jul 8, 2025 at 2:56 PM Andres Freund wrote: > > On 2025-07-08 14:27:12 +0200, Tomas Vondra wrote: > > > On 7/8/25 05:04, Andres Freund wrote: > > > > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:

Re: Adding basic NUMA awareness

2025-07-09 Thread Andres Freund
ink it's worth favoring clock sweep. Also needing to switch between getting buffers from the freelist and the sweep makes the code more expensive. I think just having the buffer in the sweep, with a refcount / usagecount of zero would suffice. That seems particularly advantageous if w

Re: ABI Compliance Checker GSoC Project

2025-07-09 Thread Andres Freund
ce if we could get there, but it'd require annotating *all* intentionally exported functions in the backend with PGDLLIMPORT (rather than just doing that for variables). Then we could make some symbols *intentionally* not exported, which can improve the code generation (allowing more compiler and linker optimizations). Greetings, Andres Freund
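The idea of marking only intentionally exported functions can be sketched with compiler visibility attributes. This is a generic illustration, not how PGDLLIMPORT is defined in PostgreSQL; MY_EXPORT, MY_LOCAL, internal_helper and public_api are hypothetical names.

```c
/*
 * Hypothetical export markers. On GCC/Clang this is typically paired with
 * building using -fvisibility=hidden, so that only annotated symbols are
 * exported from the shared object.
 */
#if defined(_WIN32)
#define MY_EXPORT __declspec(dllexport)
#define MY_LOCAL
#elif defined(__GNUC__) || defined(__clang__)
#define MY_EXPORT __attribute__((visibility("default")))
#define MY_LOCAL __attribute__((visibility("hidden")))
#else
#define MY_EXPORT
#define MY_LOCAL
#endif

/* Not exported: the toolchain knows no other module can reference it. */
MY_LOCAL int
internal_helper(int x)
{
	return x * 2;
}

/* Intentionally exported entry point. */
MY_EXPORT int
public_api(int x)
{
	return internal_helper(x) + 1;
}
```

Hidden symbols cannot be interposed or referenced from other modules, which is what gives the compiler and linker more room for the optimizations mentioned above.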
