Note the different read sizes...
> I did look into pg_aios, but there's only 8kB requests in both cases. I
> didn't have time to look closer yet.
That's what we'd expect, right? There's nothing on master that'd perform read
combining for index scans...
Greetings,
Andres Freund
e failure, or at least the cfbot is showing a red
> column at the moment.
See
https://postgr.es/m/CAN55FZ1RuBhJmPWs3Oi%3D9UoezDfrtO-VaU67db5%2B0_uy19uF%2BA%40mail.gmail.com
Greetings,
Andres Freund
Hi,
On 2025-08-11 16:30:30 -0700, Jacob Champion wrote:
> On Mon, Aug 11, 2025 at 3:52 PM Andres Freund wrote:
> > And the warning is right. Not sure why a new compiler is needed, IIRC this
> > warning is present in other cases with older compilers too.
>
> Probably
>
reate
> PGPROC partitions only for those)? I suppose that requires literally
> walking all the nodes.
I didn't think of numa_node_of_cpu().
As long as numa_node_of_cpu() returns *something*, I think it may be good
enough. Nobody uses an RPi for high-throughput postgres workloads with a lot
of memory. Slightly sub-optimal mappings should really not matter.
I'm kinda wondering if we should deal with such fake numa systems by detecting
them and disabling our numa support.
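A rough sketch of what that detection could look like (illustrative only; the
heuristic is an assumption, not a known-good rule):

	#include <numa.h>

	static bool
	numa_looks_fake(void)
	{
		if (numa_available() < 0)
			return true;		/* no NUMA support at all */
		if (numa_num_configured_nodes() <= 1)
			return true;		/* a single node, nothing to partition */

		/* multiple nodes, but every CPU on node 0 smells like fake numa */
		for (int cpu = 0; cpu < numa_num_configured_cpus(); cpu++)
			if (numa_node_of_cpu(cpu) > 0)
				return false;
		return true;
	}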
Greetings,
Andres Freund
Greetings,
Andres Freund
iler is needed, IIRC this
warning is present in other cases with older compilers too.
The most obvious fix is to slap on a PG_USED_FOR_ASSERTS_ONLY. However, we so
far don't seem to have used it for function parameters... But I don't see a
problem with starting to do so.
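E.g. something like this (hypothetical function, just to illustrate the
placement):

	static void
	check_depth(int depth PG_USED_FOR_ASSERTS_ONLY)
	{
		Assert(depth >= 0);
	}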
Greetings,
Andres Freund
Hi,
On 2025-07-11 11:22:36 +0900, Amit Langote wrote:
> On Fri, Jul 11, 2025 at 5:55 AM Andres Freund wrote:
> > On 2025-07-10 17:28:50 +0900, Amit Langote wrote:
> > > On Thu, Jul 10, 2025 at 8:34 AM Andres Freund wrote:
> > > > The performance gain unsurprisingl
Hi,
On 2025-08-11 14:40:40 +0300, Nazir Bilal Yavuz wrote:
> Thank you for working on this!
Thanks for the review - pushed.
Greetings,
Andres Freund
yc too, if "L3 LLC as
NUMA" is enabled.
> I'm not sure what to do about this (or how getcpu() or libnuma handle this).
I don't immediately see any libnuma functions that would care?
I also am somewhat curious about what getcpu() returns for the current node...
Greetings,
Andres Freund
l is somewhat expensive.
Greetings,
Andres Freund
From d845c0d56a0357730a7ec398cd77c6a1ada392fa Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 8 Aug 2025 19:49:23 -0400
Subject: [PATCH v2] meson: add and use stamp files for generated headers
Without using stamp files, meson lists the g
"strange" combinations of parameters, looking for
> weird behaviors like that.
I'm just catching up: Isn't it a bit early to focus this much on testing? ISTM
that the patchsets for both approaches currently have some known architectural
issues, and that addressing them seems likely to change their performance
characteristics.
Greetings,
Andres Freund
It's possible to do this by globbing for files at configure time, but that
wouldn't detect adding new headers (which would need to trigger a
re-configure). Whether that's an issue worth caring about, I'm a bit on the
fence.
Greetings,
Andres Freund
istake to introduce support for granular resets, we
shouldn't bury ourselves deeper. If anything we should rip out everything
other than 1) a global reset and 2) a per-database reset.
Leaving that aside, I just don't see a convincing use case for returning the
timestamp here.
Greetings,
Andres Freund
Hi,
On 2025-08-08 18:28:09 -0400, Andres Freund wrote:
> > From 6574ac9267fe9938f59ed67c8f0282716d8c28f3 Mon Sep 17 00:00:00 2001
> > From: Thomas Munro
> > Date: Sun, 3 Aug 2025 00:15:01 +1200
> > Subject: [PATCH v1 3/4] aio: Support I/O methods without true vectore
_completion_queue() to give up
> + * early since this backend can process its own queue promptly and efficiently.
> + */
> +static void
> +pgaio_posix_aio_ipc_acquire_own_completion_lock(PgAioPosixAioContext *context)
> +{
> +	Assert(context == pgaio_my_posix_aio_context);
> +	Assert(!LWLockHeldByMe(&context->completion_lock));
> +
> +	if (!LWLockConditionalAcquire(&context->completion_lock, LW_EXCLUSIVE))
> +	{
> +		ProcNumber	procno;
> +
> +		procno = pg_atomic_exchange_u32(&context->ipc_procno, MyProcNumber);
> +		if (procno != INVALID_PROC_NUMBER)
> +			SetLatch(&GetPGProcByNumber(procno)->procLatch);
> +
> +		LWLockAcquire(&context->completion_lock, LW_EXCLUSIVE);
> +		pg_atomic_write_u32(&context->ipc_procno, INVALID_PROC_NUMBER);
> +	}
> +}
Is the "command pgaio_posix_aio_ipc_drain_completion_queue() to give up" path
frequent enough to be worth the complexity? I somewhat doubt so?
Greetings,
Andres Freund
README should do the trick, I'll go
> investigate that.
FWIW, you can trigger manual tasks in the cirrus-ci web-interface.
Greetings,
Andres Freund
A large portion of the cases I've seen where toast ID assignments were a
problem were when the global OID counter wrapped around due to activity on
*other* tables (and/or temporary table creation). If you instead had a
per-toast-table sequence for assigning chunk IDs, that problem would largely
vanish.
With 64bit toast IDs we shouldn't need to search the index for a
non-conflicting toast ID; there can't be wraparounds (we'd hit wraparound of
LSNs well before that, and that's not practically reachable).
Greetings,
Andres Freund
On 2025-07-28 08:18:01 +0900, Michael Paquier wrote:
> I have used that and applied it down to v18, closing the open item.
Thanks!
LTRUE(entry->key))
> + else if (!LTG_ISALLTRUE(entry->key.value))
This should be DatumGet*(), no?
> diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
> index 996ce174454..5d57563ecb7 100644
> --- a/contrib/sepgsql/label.c
> +++ b/contrib/sepgsql/label.c
> @@ -330,7 +330,7 @@ sepgsql_fmgr_hook(FmgrHookEventType event,
> 		stack = palloc(sizeof(*stack));
> 		stack->old_label = NULL;
> 		stack->new_label = sepgsql_avc_trusted_proc(flinfo->fn_oid);
> -		stack->next_private = 0;
> +		stack->next_private.value = 0;
>
> 		MemoryContextSwitchTo(oldcxt);
Probably should use DummyDatum.
Greetings,
Andres Freund
Hi,
On 2025-08-05 19:20:20 +0200, Peter Eisentraut wrote:
> On 31.07.25 19:17, Tom Lane wrote:
> > Also I see a "// XXX" in pg_get_aios, which I guess is a note
> > to confirm the data type to use for ioh_id?
>
> Yes, the stuff returned from pgaio_io_get_id() should be int, but some code
> uses u
of days before
> getting down to it.
I don't really see the point of designing that mechanism before we have a use
case. If we need it, we can expand it at that time.
Greetings,
Andres Freund
cific changes, so I
guess "... all good" covers it...
Greetings,
Andres Freund
or
VERBOSE and once without. That's not exactly a free lunch...
Greetings,
Andres Freund
Hi,
On 2025-07-18 13:24:32 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2025-07-17 20:09:57 -0400, Tom Lane wrote:
> >> I made it just as a proof-of-concept that this can work. It compiled
> >> cleanly and passed check-world for me on a 32-bit FreeBSD im
te and IndexScanInstrumentation seems to be
predestined for that information. But it seems a bit too much memory to
just keep a BufferUsage around even when analyze isn't used.
Greetings,
Andres Freund
PS: Another thing that I think we ought to track is the number of fetches from
the table
?
Yes, that might make sense. But wiring it up via tableam doesn't make sense.
Greetings,
Andres Freund
Hi,
On 2025-07-23 09:54:12 +0900, Michael Paquier wrote:
> On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote:
> > It seems rather unsurprising that that causes a slowdown.
> >
> > The pre-check is there to:
> > /* Don't expend a clock check if n
; pending anymore (when flushing) without saying "all the stats have nothing
> pending" (while some may still have pending stats)?
I don't think that's a problem - reset that global flag after checking it at
the start of pgstat_report_stat() and set it to true if partial_flush is true
at the end of pgstat_report_stat().
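A minimal sketch of that flow, with hypothetical names (pgstat_have_pending is
not an existing variable):

	static bool pgstat_have_pending = false;

	/* at the start of pgstat_report_stat() */
	if (!pgstat_have_pending && !force)
		return;					/* fast path: nothing pending anywhere */
	pgstat_have_pending = false;

	/* ... flush all entries, setting partial_flush if some remain ... */

	/* at the end of pgstat_report_stat() */
	if (partial_flush)
		pgstat_have_pending = true;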
Greetings,
Andres Freund
Hi,
On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote:
> On 7/23/25 02:59, Andres Freund wrote:
> > Hi,
> >
> > On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
> >> But I don't see why would this have any effect on the prefetch distance,
>
ld be a bug, of course. But it'd be helpful to see the dataset/query.
Pgbench scale 500, with the simpler query from my message.
Greetings,
Andres Freund
Hi,
On 2025-07-22 19:13:23 -0400, Peter Geoghegan wrote:
> On Tue, Jul 22, 2025 at 6:53 PM Andres Freund wrote:
> > That may be true with local fast NVMe disks, but won't be true for networked
> > storage like in common clouds. Latencies of 0.3 - 4ms leave a lot of CPU
&
eing
prefetched. Currently the behaviour in that case is to synchronously wait for
IO on that buffer to complete. That obviously causes a "pipeline bubble"...
Greetings,
Andres Freund
Hi,
On 2025-07-18 23:25:38 -0400, Peter Geoghegan wrote:
> On Fri, Jul 18, 2025 at 10:47 PM Andres Freund wrote:
> > > (Within an index AM, there is a 1:1 correspondence between batches and
> > > leaf
> > > pages, and batches need to hold on to a leaf page buffer
kinds, the overhead
goes away almost completely.
Greetings,
Andres Freund
[1]
https://www.postgresql.org/message-id/aGKSzFlpQWSh%2F%2B2w%40ip-10-97-1-34.eu-west-3.compute.internal
collector, counting how often
> > it sees certain wait events when sampling.
>
> Yeah but even if we are okay with losing "counters" by sampling, we'd still
> not get
> the duration. For the duration to be meaningful we also need the exact number
> of counters.
You don't need a precise duration to see which wait events are a problem. If
you see that some event is sampled a lot, you know it's because there either
are a *lot* of those wait events or the wait events are entered into for a
long time.
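Back-of-envelope (my arithmetic, assuming a fixed sampling interval):

	expected_samples ≈ total_wait_time / sampling_interval
	                 ≈ (number_of_events × avg_duration) / sampling_interval

so a high sample count pins down the product of count and duration, even if it
can't separate the two.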
Greetings,
Andres Freund
Hi,
On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund wrote:
> > On 2025-07-09 13:26:09 +0200, Matthias van de Meent wrote:
> > > I've been going through the new AIO code as an effort to rebase and
> > > ad
, it just doesn't make
sense for those callbacks to be at the level of tableam. If you want to make
vacuumparallel support parallel table vacuuming for multiple table AMs (I'm
somewhat doubtful that's a good idea), you could do that by having a
vacuumparallel.c specific callback struct.
Greetings,
Andres Freund
n't see what else I
could do.
RMT, note that there were two issues in this thread, the original report by
Tom has been addressed (in e9a3615a522). I guess the best thing would be to
split the open items entry into two?
Greetings,
Andres Freund
[1] Rather impressed at how stable our test
Hi,
On 2025-07-21 13:37:04 -0400, Greg Burd wrote:
> On 7/18/25 13:03, Andres Freund wrote:
> Hello. Thanks again for taking the time to review the email and patch,
> I think we're onto something good here.
>
> >
> > I'd be curious if anybody wants to argue f
Hi,
On 2025-07-18 17:44:26 -0400, Peter Geoghegan wrote:
> On Fri, Jul 18, 2025 at 4:52 PM Andres Freund wrote:
> > I don't agree with that. For efficiency reasons alone table AMs should get a
> > whole batch of TIDs at once. If you have an ordered indexscan that retur
Hi,
On 2025-07-18 22:48:00 +0200, Tomas Vondra wrote:
> On 7/18/25 18:46, Andres Freund wrote:
> >> For a read-write pgbench I however saw some strange drops/increases of
> >> throughput. I suspect this might be due to some thinko in the clocksweep
> >> partiti
the way the visibilitymap,
which it really has no business accessing, that's a heap specific thing. It
also knows too much about different formats that can be stored by indexes, but
that's kind of a separate issue.
Greetings,
Andres Freund
Hi,
On 2025-07-17 09:48:29 -0700, Jacob Champion wrote:
> On Wed, Jul 16, 2025 at 4:35 PM Andres Freund wrote:
> > Why do we care about not hitting the socket? We always operate the socket in
> > non-blocking mode anyway?
>
> IIUC, that would change pqReadData() from
Hi,
On 2025-06-30 19:42:51 -0400, Andres Freund wrote:
> On 2025-07-01 00:52:49 +0200, Daniel Gustafsson wrote:
> > > On 30 Jun 2025, at 20:33, Jacob Champion
> > > wrote:
> > >
> > > On Mon, Jun 30, 2025 at 10:02 AM Daniel Gustafsson
> > >
:13 -0400, Greg Burd wrote:
> On Fri, Jul 11, 2025, at 2:52 PM, Andres Freund wrote:
> > I think we'll likely need something to replace it.
>
> Fair, this (v5) patch doesn't yet try to address this.
>
> > TBH, I'm not convinced that autoprewarm using have
more frequently when using a foreign partition.
Another way would be to have bgwriter manage this. Whenever it detects that
one ring is too far ahead, it could set an "avoid this partition" bit, which
would trigger backends that natively use that partition to switch to foreign
partitions that don't currently have that bit set. I suspect there's a
problem with that approach though: I worry that the amount of time that
bgwriter spends in BgBufferSync() may sometimes be too long, leading to too
much imbalance.
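To make the idea concrete, a sketch with entirely hypothetical names (no such
structs exist today):

	typedef struct ClockSweepPartition
	{
		pg_atomic_uint32 avoid;	/* set by bgwriter when the ring is too far ahead */
		/* ... sweep hand, counters, etc ... */
	} ClockSweepPartition;

	static ClockSweepPartition *
	choose_partition(ClockSweepPartition *parts, int nparts, int home)
	{
		for (int i = 0; i < nparts; i++)
		{
			ClockSweepPartition *p = &parts[(home + i) % nparts];

			if (pg_atomic_read_u32(&p->avoid) == 0)
				return p;
		}
		return &parts[home];	/* everything flagged: fall back to home */
	}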
Greetings,
Andres Freund
s. We probably
> should at least try to measure that, though I'm not sure what
> our threshold of pain would be for deciding not to do this.
From my POV the threshold would have to be rather high for backend code. Less
so in libpq, but that's not affected here.
Greetings,
Andres Freund
wn
c) for each callsite that is converted to the extended wait event, you either
need to reason about why the added overhead is ok, or do a careful experiment
Personally I'd rather have an in-core sampling collector, counting how often
it sees certain wait events when sampling. It then also can correlate those
samples with other things on the system, e.g. by counting the number of active
backends together with each sample. And eventually also memory usage etc.
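A sketch of the sampling loop itself (not from any posted patch; "counts" is a
hypothetical counter table, and bucketing by the low 16 bits of
wait_event_info is a simplification):

	static void
	sample_wait_events_once(uint64 *counts)
	{
		int			nactive = 0;

		for (int i = 0; i < ProcGlobal->allProcCount; i++)
		{
			uint32		wei = ProcGlobal->allProcs[i].wait_event_info;

			if (wei != 0)
			{
				counts[wei & 0xFFFF]++;
				nactive++;
			}
		}
		/* nactive could be stored with the sample, as suggested above */
	}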
Greetings,
Andres Freund
ally with every additional parallel worker, but for things like
seqscans that's really not true. I've seen reasonably-close-to-linear
scalability for parallel seqscans up to 48 workers (the CPUs in the system I
tested on). Given that our degree-of-parallelism logic doesn't really make
sense.
Greetings,
Andres Freund
s
I still think this would be a rather awesome improvement.
> Open questions I have:
> - Could we rely on checking whether the TSC timesource is invariant (via
> CPUID), instead of relying on Linux choosing it as a clocksource?
I don't see why not?
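FWIW, the relevant bit is CPUID leaf 0x80000007 EDX bit 8 ("Invariant TSC",
same on Intel and AMD). A sketch using gcc/clang's <cpuid.h>:

	#include <cpuid.h>

	static bool
	have_invariant_tsc(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* __get_cpuid() returns 0 if the leaf isn't supported */
		if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
			return false;
		return (edx & (1 << 8)) != 0;
	}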
Greetings,
Andres Freund
Hi,
On 2025-07-16 18:24:33 -0400, Tom Lane wrote:
> ... BTW, another resource worth looking at is src/bin/pg_test_timing/
> which we just improved a few days ago [1]. What I see on two different
> Linux-on-Intel boxes is that the loop time that that reports is 16 ns
> and change, and the clock re
Hi,
On 2025-07-16 15:25:14 -0700, Jacob Champion wrote:
> On Wed, Jul 16, 2025 at 2:34 PM Andres Freund wrote:
> > > Based on my understanding of [1], readahead makes this overall problem
> > > much worse by opportunistically slurping bytes off the wire and doing
> >
Hi,
On 2025-07-16 17:47:53 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 5:41 PM Andres Freund wrote:
> > I don't mean the index tids, but how the read stream is fed block numbers.
> > In
> > the "complex" patch that's done by index_scan_stream
Hi,
On 2025-07-16 17:27:23 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 4:46 PM Andres Freund wrote:
> > Maybe I'm missing something, but the current interface doesn't seem to work
> > for AMs that don't have a 1:1 mapping between the block number porti
Hi,
On 2025-07-16 11:50:46 -0700, Jacob Champion wrote:
> On Wed, Jul 16, 2025 at 11:11 AM Andres Freund wrote:
> > If one modifies libpq to use openssl readahead (which does result in
> > speedups,
> > because otherwise openssl think it's useful to do lots of 5 byte
Hi,
On 2025-07-16 16:54:06 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 3:40 PM Andres Freund wrote:
> > As a first thing I just wanted to get a feel for the improvements we can
> > get.
> > I had a scale 5 tpch already loaded, so I ran a bogus query on t
Hi,
On 2025-07-16 15:39:58 -0400, Andres Freund wrote:
> Looking at the actual patches now.
I just did an initial, not particularly in depth look. A few comments and
questions below.
For either patch, I think it's high time we split the index/table buffer stats
in index scans. It&
Hi,
On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 2:27 PM Andres Freund wrote:
> > Could you share the current version of the complex patch (happy with a git
> > tree)? Afaict it hasn't been posted, which makes this pretty hard follow
>
t it hasn't been posted, which makes this pretty hard to follow along
with / provide feedback on, for others.
Greetings,
Andres Freund
What are the limits for the maximum amount of data this could make us buffer
in addition to what we are buffering right now? It's not entirely obvious to
me that a loop around pqReadData() as long as there is pending data couldn't
make us buffer a lot of data.
Do you have a WIP patch?
Greetings,
Andres Freund
's a good idea to add the Auto bit to the name. We have
several special things about various tests, if we add all of them to the task
name, we'll have very long task names. This one would already be
Linux - Debian Bookworm - Meson Auto Features Detection - 32 and 64 Bit build &
tests - Alignment, Undefined Behaviour Sanitizer - IO method=io_uring
And the task names would change a lot more, which is also a pain for things
like the commitfest / cfbot web apps.
But it *should* be added to the "SPECIAL:" comment.
Greetings,
Andres Freund
Hi,
On 2025-07-16 18:27:45 +0300, Yura Sokolov wrote:
> 16.07.2025 17:58, Andres Freund wrote:
> >> Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop of
> >> just ~6% under concurrent DDL. I think this strongly suggests that the
> >> spinlock is
m may change.
Resizing shared_buffers is particularly important because it's becoming more
important to be able to dynamically increase/decrease the resources of a
running postgres instance to adjust for system load. Memory and CPUs can be
hot-added/removed from VMs, but we need to utilize them...
Greetings,
Andres Freund
Hi,
On 2025-06-25 16:41:46 +0300, Sergey Shinderuk wrote:
> On 16.06.2025 17:41, Andres Freund wrote:
> > TBH, I don't see a point in continuing with this thread without something
> > that
> > others can test. I rather doubt that the right fix here is to just change
&
ing (not much, but it's clearly visible for cached queries).
This imo isn't something worth optimizing for - if you use an io_method that
actually can execute IO asynchronously this issue does not exist, as the start
of the IO will already have populated the buffer entry (without BM_VALID set,
of course). Thus we won't start another IO for that block.
Greetings,
Andres Freund
their own interrupt ids.
For 2), I wonder if we ought to have a global mask of interrupt kinds that can
be processed in some context. Instead of having INTERRUPT_CFI_MASK() compute
what mask to use, we could have things like HOLD_CANCEL_INTERRUPTS be defined
as something like
	if (InterruptHoldCount[CANCEL]++ == 0)
		InterruptMask &= ~CANCEL;
which would allow CHECK_FOR_INTERRUPTS to just use InterruptMask to check for
to-be-processed interrupts.
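Spelling out the counterpart, sketch-level only (CANCEL doubles as array index
and mask bit here, just like in the fragment above):

	#define RESUME_CANCEL_INTERRUPTS() \
		do { \
			Assert(InterruptHoldCount[CANCEL] > 0); \
			if (--InterruptHoldCount[CANCEL] == 0) \
				InterruptMask |= CANCEL; \
		} while (0)

	/* CHECK_FOR_INTERRUPTS() then is one load and one mask test: */
	if (pg_atomic_read_u32(MyPendingInterrupts) & InterruptMask)
		ProcessInterrupts();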
Greetings,
Andres Freund
| INTERRUPT_GENERAL, ...);
> * }
> *
> * It's important to clear the interrupt *before* checking if there's work to
> - * do. Otherwise, if someone sets the interrupt between the check and the
> + * do. Otherwise, if someone sets the interrupt between the check and the
> * ClearInterrupt() call, you will miss it and Wait will incorrectly block.
Isn't the change to move CHECK_FOR_INTERRUPTS() before ClearInterrupt()
violating what the paragraph explains?
> /*
> * Flags in the pending interrupts bitmask. Each value represents one bit in
> * the bitmask.
> */
> -typedef enum
> +typedef enum InterruptType
> {
I'm rather concerned about the number of interrupt bits we've already
consumed. I'll respond about that in a separate, higher-level email.
> /*
> - * Clear an interrupt flag.
> + * Clear an interrupt flag (or flags).
> */
> static inline void
> ClearInterrupt(uint32 interruptMask)
> {
> 	pg_atomic_fetch_and_u32(MyPendingInterrupts, ~interruptMask);
> +	pg_write_barrier();
> }
pg_atomic_fetch_and_u32 is a full barrier, no separate barrier needed.
> #endif
> diff --git a/src/include/postmaster/startup.h
> b/src/include/postmaster/startup.h
> index 158f52255a6..a0316202b95 100644
> --- a/src/include/postmaster/startup.h
> +++ b/src/include/postmaster/startup.h
> @@ -25,6 +25,14 @@
>
> extern PGDLLIMPORT int log_startup_progress_interval;
>
> +/* The set of interrupts that are processed by ProcessStartupProcInterrupts */
> +#define INTERRUPT_STARTUP_PROC_MASK ( \
> +	INTERRUPT_BARRIER | \
> +	INTERRUPT_DIE | \
> +	INTERRUPT_LOG_MEMORY_CONTEXT | \
> +	INTERRUPT_CONFIG_RELOAD \
> +	)
Somehow I find this name a bit confusing: the first parse attempt ends up with
PROC_MASK as one of the components of the name. How about
INTERRUPT_MASK_STARTUP[_PROC]?
Greetings,
Andres Freund
nings the CompilerWarnings task will fail. It's "just" the
32bit build and msvc windows builds that currently don't...
There was a patch adding it for the msvc build at some point, but ...
Greetings,
Andres Freund
not followed the development of this patch - but I continue to be
concerned about the performance impact it has as-is and the amount of COPY
performance improvements it forecloses.
This seems to add yet another layer of indirection to a lot of hot functions
like CopyGetData() etc.
Greetings,
Andres Freund
royed all the evidence.
Besides differences in filesystem-level fragmentation, another potential
theory is that the SSDs were internally more fragmented. Occasionally
dumping/restoring the data could allow the drive to do internal wear
leveling before the new data is loaded, leading to a better layout.
I found that I get more consistent benchmark performance if I delete as much
of the data as possible, run fstrim -v -a and then load the data. And do
another round of fstrim.
Greetings,
Andres Freund
st interesting thing would be some runs with cloud-ish storage
(relatively high iops, very high latency)...
> The repository also has branches with plots showing results with WIP
> indexscan prefetching. (It's excluded from the PDFs I presented here).
Hm, I looked for those, but I couldn't quickly find any plots that include
them. Would I have to generate the plots from a checkout of the repo?
> The conclusions are similar to what we found here - "worker" is good
> with enough workers, io_uring is good too. Sync has issues for some of
> the data sets, but still helps a lot.
Nice.
Greetings,
Andres Freund
Hi,
On July 14, 2025 10:39:33 AM EDT, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On Mon, Jul 14, 2025 at 10:23:23AM -0400, Andres Freund wrote:
>> > Those steps are separated in time, and I'm currently trying to understand
>> > what are the consequences of
Hi,
On 2025-07-14 16:01:50 +0200, Dmitry Dolgov wrote:
> > On Mon, Jul 14, 2025 at 09:42:46AM -0400, Andres Freund wrote:
> > What on earth would be the point of putting a buffer on the freelist but not
> > make it reachable by the clock sweep? To me that's just nonse
Hi,
On 2025-07-14 15:20:03 +0200, Dmitry Dolgov wrote:
> > On Mon, Jul 14, 2025 at 09:14:26AM -0400, Andres Freund wrote:
> > > > Clock sweep can find any buffer, independent of whether it's on the
> > > > freelist.
> > >
> > > It does t
Hi,
On 2025-07-14 15:08:28 +0200, Dmitry Dolgov wrote:
> > On Mon, Jul 14, 2025 at 08:56:56AM -0400, Andres Freund wrote:
> > > Ah, I see what you mean folks. But I'm talking here only about buffers
> > > which will be allocated after extending shared memory -- th
Hi,
On 2025-07-14 11:32:25 +0200, Dmitry Dolgov wrote:
> > On Mon, Jul 14, 2025 at 10:24:50AM +0100, Thom Brown wrote:
> > On Mon, 14 Jul 2025, 09:54 Dmitry Dolgov, <9erthali...@gmail.com> wrote:
> >
> > > > On Mon, Jul 14, 2025 at 01:55:39PM +0530, Ashutosh Bapat wrote:
> > > > > You're right of
we could
> do that, but after reflection I think the best way is to modify
> JsonbIteratorNext to make that guarantee. I've checked that
> the attached silences the warning on gcc 15.1.1 (current
> Fedora 42).
WFM.
Greetings,
Andres Freund
Hi,
On 2025-07-10 09:52:45 +0900, Michael Paquier wrote:
> On Wed, Jul 09, 2025 at 03:46:43PM -0400, Tom Lane wrote:
> > Andres Freund writes:
> >> Seems like we should just remove TransactionIdIsActive()?
> >
> > +1. I wondered if any extensions might depend on
');") -t1
(with c=1 for the single-threaded case obviously)
The reason for querying pg_shmem_allocations_numa is to ensure that
shared_buffers is actually mapped, as otherwise the bottleneck will be the
kernel zeroing out buffers.
The reason for doing -t1 is that I wanted to compare freelist vs clock sweep,
rather than clock sweep in general.
Note that I patched EvictUnpinnedBufferInternal() to call
StrategyFreeBuffer(), otherwise running this a second time won't actually
measure the freelist. And the first run after postmaster start will
always be more noisy...
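Concretely, the tweak was along these lines (placement approximate, purely for
the benchmark):

	/* in EvictUnpinnedBufferInternal(), once the victim is invalidated */
	if (InvalidateVictimBuffer(desc))
	{
		StrategyFreeBuffer(desc);	/* added so reruns exercise the freelist */
		result = true;
	}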
Greetings,
Andres Freund
Hi,
On 2025-07-10 14:17:21 +, Bertrand Drouvot wrote:
> On Wed, Jul 09, 2025 at 03:42:26PM -0400, Andres Freund wrote:
> > I wonder if we should *increase* the size of shared_buffers whenever huge
> > pages are in use and there's padding space due to the huge page
&g
Hi,
On 2025-07-10 17:31:45 +0200, Tomas Vondra wrote:
> On 7/9/25 19:23, Andres Freund wrote:
> > There's other things around this that could use some attention. It's not
> > hard
> > to see clock sweep be a bottleneck in concurrent workloads - partially due
>
ion as much as possible
>
> Does that sound reasonable to you?
That does seem like the minimum.
Unfortunately I'm rather doubtful this provides enough value to be worth the
cost. But I'm rather willing to be proven wrong.
Greetings,
Andres Freund
m. That means that you can't just
evaluate the whole predicate using ScanKeys.
3) ScanKey evaluation is actually sometimes *more* expensive than expression
evaluation, because the columns are deformed one-by-one.
Greetings,
Andres Freund
Hi,
On 2025-07-11 11:22:36 +0900, Amit Langote wrote:
> On Fri, Jul 11, 2025 at 5:55 AM Andres Freund wrote:
> > On 2025-07-10 17:28:50 +0900, Amit Langote wrote:
> > > Thanks for the patch.
> > >
> > > +/*
> > > + * Use pg_assume() for
me() if the release
build should be influenced.
> Was this strategy considered before introducing pg_assume, or did I miss
> that part of the discussion?
No, it wasn't. It seemed like a rather obviously bad idea to me, and the
primary motivation in this case really was to get rid of warnings like the one
addressed in the subsequent commit.
Greetings,
Andres Freund
Hi,
On 2025-07-10 17:28:50 +0900, Amit Langote wrote:
> On Thu, Jul 10, 2025 at 8:34 AM Andres Freund wrote:
> > On 2025-01-22 10:07:51 +0900, Amit Langote wrote:
> > > On Fri, Jan 17, 2025 at 2:05 PM Amit Langote
> > > wrote:
> > > > Her
Hi,
On 2025-07-10 21:00:21 +0200, Matthias van de Meent wrote:
> On Wed, 9 Jul 2025 at 16:59, Andres Freund wrote:
> > > 3. I noticed that there is AIO code for writev-related operations
> > > (specifically, pgaio_io_start_writev is exposed, as is
> > > PGAIO_OP_WR
a small gain by avoiding that.
Greetings,
Andres Freund
From a443d7dc6419a5648b10bbd900acf2fc745255b4 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 9 Jul 2025 19:27:19 -0400
Subject: [PATCH v1] Optimize seqscan code generation using pg_assume()
Discussion: https://postgr.es/m/CA+
Hi,
On 2025-07-02 22:13:17 +0800, jian he wrote:
> On Thu, Jun 5, 2025 at 3:00 AM Andres Freund wrote:
> > I've been once more annoyed by this warning. Here's a prototype for the
> > approach outlined above.
> >
>
> I can confirm the warning disappears when
Hi,
On 2025-06-05 15:50:48 -0400, Tom Lane wrote:
> Andres Freund writes:
> >> I've been wondering about adding wrapping something like that in a
> >> pg_assume(expr) or such.
>
> > I've been once more annoyed by this warning. Here's a prototype fo
(/Zc:preprocessor, introduced in VS 2019 v16.6).
Which seems likely to describe precisely what we're seeing?
Greetings,
Andres Freund
PS: Wonder if we should make the SDK version visible in meson setup...
en we could replace the modulo with a mask, which is
> + * likely more efficient.
> + */
> +	switch (numa_partition_freelist)
> +	{
> +		case FREELIST_PARTITION_CPU:
> +			freelist_idx = cpu % strategy_ncpus;
As mentioned earlier, modulo is rather expensive for something executed so
frequently...
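If the partition count were rounded up to a power of two at startup, this
could become a mask (strategy_ncpus_mask being hypothetical):

	freelist_idx = cpu & strategy_ncpus_mask;	/* mask == 2^n - 1 */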
> +			break;
> +
> +		case FREELIST_PARTITION_NODE:
> +			freelist_idx = node % strategy_nnodes;
> +			break;
Here we shouldn't need modulo, right?
> +
> +		case FREELIST_PARTITION_PID:
> +			freelist_idx = MyProcPid % strategy_ncpus;
> +			break;
> +
> +		default:
> +			elog(ERROR, "unknown freelist partitioning value");
> +	}
> +
> +	return &StrategyControl->freelists[freelist_idx];
> +}
> 	/* size of lookup hash table ... see comment in StrategyInitialize */
> 	size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS));
>
> 	/* size of the shared replacement strategy control block */
> -	size = add_size(size, MAXALIGN(sizeof(BufferStrategyControl)));
> +	size = add_size(size, MAXALIGN(offsetof(BufferStrategyControl, freelists)));
> +
> +	/*
> +	 * Allocate one freelist per CPU. We might use per-node freelists, but the
> +	 * assumption is the number of CPUs is less than number of NUMA nodes.
> +	 *
> +	 * FIXME This assumes that we have more CPUs than NUMA nodes, which seems
> +	 * like a safe assumption. But maybe we should calculate how many elements
> +	 * we actually need, depending on the GUC? Not a huge amount of memory.
FWIW, I don't think that's a safe assumption anymore. With CXL we can get a)
PCIe-attached memory and b) remote memory as separate NUMA nodes, and that
very well could end up as more NUMA nodes than cores.
Ugh, -ETOOLONG. Gotta schedule some other things...
Greetings,
Andres Freund
37:50 +0300
Fix race condition in preparing a transaction for two-phase commit.
Seems like we should just remove TransactionIdIsActive()?
Greetings,
Andres Freund
argue for adding extensibility, we can do that at
this stage. Trying to design this for extensibility from the get go, where
that extensibility is very unlikely to be used widely, seems rather likely to
just tank this entire project without getting us anything in return.
Greetings,
Andres Freund
Hi,
On 2025-07-09 13:36:26 -0400, Tom Lane wrote:
> It doesn't look like the Meson support needs such explicit tracking of
> required libraries, but perhaps I'm missing something?
It should be fine, -ldl is added to "os_deps" if needed, and os_deps is used
for all code
r testsuite... Having
test code that is not run automatically may be helpful while originally
developing something, but it doesn't do anything to detect portability issues
or regressions.
Greetings,
Andres Freund
Hi,
On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
> On Jul 9 2025, at 12:35 pm, Andres Freund wrote:
>
> > FWIW, I've started to wonder if we shouldn't just get rid of the freelist
> > entirely. While clocksweep is perhaps minutely slower in a single
> > t
Hi,
On 2025-07-09 12:04:00 +0200, Jakub Wartak wrote:
> On Tue, Jul 8, 2025 at 2:56 PM Andres Freund wrote:
> > On 2025-07-08 14:27:12 +0200, Tomas Vondra wrote:
> > > On 7/8/25 05:04, Andres Freund wrote:
> > > > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote:
ink it's worth favoring clock sweep.
Also needing to switch between getting buffers from the freelist and the sweep
makes the code more expensive. I think just having the buffer in the sweep,
with a refcount / usagecount of zero would suffice.
That seems particularly advantageous if w
ce if we could get there, but it'd require annotating *all*
intentionally exported functions in the backend with PGDLLIMPORT (rather than
just doing that for variables). Then we could make some symbols
*intentionally* not exported, which can improve the code generation (allowing
more compiler and linker optimizations).
Greetings,
Andres Freund