Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
> > I was also missing it in my suggested patch draft, but this should > probably include #ifdef __linux__. > > > Re: Tomas Vondra >> +#ifdef USE_VALGRIND >> + >> +static inline void >> +pg_numa_touch_mem_if_required(uint64 tmp, char *ptr) > > St

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
On 6/25/25 09:15, Jakub Wartak wrote: > On Tue, Jun 24, 2025 at 5:30 PM Christoph Berg wrote: >> >> Re: Tomas Vondra >>> If it's a reliable fix, then I guess we can do it like this. But won't >>> that be a performance penalty on everyone? Or does the

Re: Remove unneeded check for XLH_INSERT_ALL_FROZEN in heap_xlog_insert

2025-06-24 Thread Tomas Vondra
atch this, even if it's ultimately harmless, just to keep the code not confusing. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 17:30, Christoph Berg wrote: > Re: Tomas Vondra >> If it's a reliable fix, then I guess we can do it like this. But won't >> that be a performance penalty on everyone? Or does the system split the >> array into 16-element chunks anyway, so this makes no

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
ant to rely too much on my >>> interpretation of it. >> >> I don't have that much experience too but I think the issue is in >> do_pages_stat() >> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) >> instead. > > Me neither, but I'll try submit this fix. > +1 Thanks to both of you for the report and the investigation. regards -- Tomas Vondra
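The stride mismatch suspected above can be illustrated with plain arithmetic. This is only a user-space sketch, not kernel code: `compat_ptr32` is a hypothetical stand-in for `compat_uptr_t`, and the 8-byte stride assumes a 64-bit host where native pointers are twice as wide as the 32-bit entries in the caller's array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's compat_uptr_t (a 32-bit user pointer). */
typedef uint32_t compat_ptr32;

/*
 * Return the index of the array element a cursor points at after it has been
 * advanced past chunk_nr entries using `stride` bytes per entry, when the
 * array's real element size is elem_size bytes.
 */
static size_t
element_after_advance(size_t stride, size_t elem_size, size_t chunk_nr)
{
    assert(elem_size > 0);
    return (stride * chunk_nr) / elem_size;
}
```

With a 16-entry chunk, `element_after_advance(8, sizeof(compat_ptr32), 16)` lands on entry 32 and so skips 16 addresses, while a stride of `sizeof(compat_ptr32)` lands on entry 16 as intended. Whether this is exactly what happens in `do_pages_stat()` is the thread's hypothesis, not something this sketch proves.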

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 13:10, Andres Freund wrote: > Hi, > > On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote: >> FWIW while looking into this, I tried running this under valgrind (on a >> regular 64-bit system, not in the chroot), and I get this report: >> >> ==65065==

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 13:10, Bertrand Drouvot wrote: > Hi, > > On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote: >> On 6/24/25 10:24, Bertrand Drouvot wrote: >>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if >>> pg_numa_query_pages()

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 23:47, Tomas Vondra wrote: > ... > > Or maybe the 32-bit chroot on 64-bit host matters and confuses some > calculation. > I think it's likely something like this. I noticed that if I modify pg_buffercache_numa_pages() to query the addresses one by one, it works.

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394 > > Repeated calls are fine. > Huh. So it's only the first call that does this? Can you maybe print the addresses passed to pg_numa_query_pages? I wonder if there's some bug in how we fill that array. Not sure why it would happen only on 32-bit systems, though. I'll create a 32-bit VM so that I can try reproducing this. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 23:25, Christoph Berg wrote: > Re: Tomas Vondra >> True. If it fails on first call, but succeeds on the other, then the >> problem is likely somewhere else. But also on the second call we won't >> do the memory touching. Can you try setting firstNumaTouch=fa

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 22:31, Christoph Berg wrote: > Re: Tomas Vondra >> Huh. So it's only the first call that does this? > > The first call after a restart. Reconnecting is not enough. > >> Can you maybe print the addresses passed to pg_numa_query_pages? I > > The a

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 22:51, Christoph Berg wrote: > Re: Tomas Vondra >> Didn't you say the first ~35 addresses succeed, right? What about the >> addresses after that? > > That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there > works (does not return -1)

Re: Amcheck verification of GiST and GIN

2025-06-17 Thread Tomas Vondra
On 6/17/25 16:19, Thom Brown wrote: > On Mon, 16 Jun 2025 at 21:00, Tomas Vondra wrote: >> >> On 6/16/25 21:09, Arseniy Mukhin wrote: >>> On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote: >>>> >>>> Thanks. >>>> >>>>

Re: Avoid possible dereference null pointer (src/backend/utils/cache/relcache.c)

2025-06-17 Thread Tomas Vondra
it will probably work fine. The catalog is borked, and who knows in what way. My opinion is that adding an "elog(ERROR)" here would be misleading, as it implies it's something we expect. And mostly pointless. I can imagine adding an Assert, but I don't quite see how that is better than just hitting a segfault a couple lines later. regards -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-16 Thread Tomas Vondra
On 6/16/25 21:09, Arseniy Mukhin wrote: > On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote: >> >> Thanks. >> >> I went through the patches, polished the commit messages and did some >> minor tweaks in patch 0002 (to make the variable names a bit more >> co

Re: No error checking when reading from file using zstd in pg_dump

2025-06-16 Thread Tomas Vondra
ommits this week, but considering I missed the issues before commit ... For a moment I was worried about breaking ABI when fixing this in the backbranches, but I guess that's not an issue for tools like pg_dump. regards -- Tomas Vondra

Re: No error checking when reading from file using zstd in pg_dump

2025-06-16 Thread Tomas Vondra
uced this API, but it's definitely the case it was based on the initial gzip code. Regarding the Z_NULL, I believe it has always been ignored like this, at least since 9.1. The code simply returns what gzgets() returns, and then compares that to NULL, etc. Is there a better way to deal with Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL, although Z_NULL is simply defined as 0. I don't recall if NULL has some additional magic. regards -- Tomas Vondra
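On the Z_NULL question: in C, the integer constant 0 is a null pointer constant, so a function declared to return `char *` that returns 0 hands back a null pointer, and comparing that result to NULL behaves as expected. A minimal sketch of this, deliberately not using zlib: `SKETCH_Z_NULL` and `sketch_gets()` are hypothetical stand-ins for `Z_NULL` and a `gzgets()`-like reader.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for zlib's Z_NULL, which is defined as 0 (per the message above). */
#define SKETCH_Z_NULL 0

/*
 * Hypothetical gzgets()-like reader: returns buf on success and
 * SKETCH_Z_NULL on failure.
 */
static char *
sketch_gets(char *buf, int len, int simulate_error)
{
    assert(buf != NULL && len > 0);
    if (simulate_error)
        return SKETCH_Z_NULL;   /* constant 0 converts to a null pointer */
    buf[0] = '\0';
    return buf;
}

/* Caller-side check that compares against NULL rather than SKETCH_Z_NULL. */
static int
sketch_read_failed(char *buf, int len, int simulate_error)
{
    return sketch_gets(buf, len, simulate_error) == NULL;
}
```

Under these assumptions an explicit Z_NULL-to-NULL translation would change readability only, not behavior.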

Re: Amcheck verification of GiST and GIN

2025-06-16 Thread Tomas Vondra
read through the commit messages, and let me know if I got some of the details wrong (or not clear enough). Otherwise I plan to start pushing this soon (~tomorrow). regards -- Tomas Vondra From cb24bb068582a39df9e9e59c2a9347889e896cf2 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 9 Jun

Re: Improve CRC32C performance on SSE4.2

2025-06-14 Thread Tomas Vondra
On 6/14/25 15:56, Nathan Bossart wrote: > On Sat, Jun 14, 2025 at 03:47:33PM +0200, Tomas Vondra wrote: >> I suggest you try with a newer gcc, perhaps 13.4. There's been a bunch >> of fixes related to AVX512 since 13.0, chances are this was already >> fixed. I don&#x

Re: Handling OID Changes in Regression Tests for C Extensions

2025-06-14 Thread Tomas Vondra
> The OIDs for user-defined objects (e.g. those from extensions) are not stable, and this will not change. The only way is to keep the test output stable, e.g. by not including OIDs in the results, and by eliminating all other types of non-determinism, e.g. by enforcing ordering, etc. regards -- Tomas Vondra

Re: Improve CRC32C performance on SSE4.2

2025-06-14 Thread Tomas Vondra
> the current master, everything is fine. Does anyone know the reason? > > The attached is my config.log. > > > -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-09 Thread Tomas Vondra
On 6/9/25 00:14, Tomas Vondra wrote: > ... > > I propose to split it like this, into three parts, each addressing a > particular type of mistake: > > 1) gin_check_posting_tree_parent_keys_consistency > > 2) gin_check_parent_keys_consistency

Re: amcheck support for BRIN indexes

2025-06-08 Thread Tomas Vondra
hink these tests are > portable. While writing tests some minor issues were found and fixed. > Also ci compiler warnings were fixed. > Thanks. I've added myself as a reviewer, so that I don't forget about this for the next CF. regards -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-08 Thread Tomas Vondra
On 5/29/25 13:53, Arseniy Mukhin wrote: > On Mon, May 26, 2025 at 7:28 PM Arseniy Mukhin > wrote: >> On Mon, May 26, 2025 at 1:27 PM Tomas Vondra wrote: >>> Also, I've noticed that the TAP test passes even with some (most) of the >>> verify_gin.c changes rever

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-06-04 Thread Tomas Vondra
On 6/4/25 19:59, Jim Nasby wrote: > > > On Fri, May 23, 2025 at 4:29 PM Tomas Vondra wrote: > > Also, Alvaro seemed to think TAM is the way to go, and in order to keep > the OLTP performance he suggested to use both heap and VCI

Re: strange perf regression with data checksums

2025-06-04 Thread Tomas Vondra
son, and treated it as "normal". But with the default changes, it'll be easier to spot once they upgrade to PG18. So better to get this in now, otherwise we may have to wait until PG19, because of ABI (the patch adds a field into BTScanPosData, but maybe it'd be possible to add it into padding, not sure). regards -- Tomas Vondra

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-31 Thread Tomas Vondra
ow which ones to set, a lot of the knowledge is somewhat outdated I think. Wouldn't it be better for btrfs to just start returning EOPNOTSUPP (maybe with a mount option), in which case we already do the right thing automatically? Sure, it means the admin needs to be aware of this in both cases. regards -- Tomas Vondra

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-28 Thread Tomas Vondra
efully will > not affect postgres (see CAVEATS in man 3 posix_fallocate). > Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the userspace fallback, we wouldn't notice. But that's up to the btrfs to decide if they want to support fallocate. We still need our fallback anyway, because of other OSes. regards -- Tomas Vondra
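The fallback idea discussed here can be sketched as follows. This is not PostgreSQL's actual code: `sketch_extend_file()` is a hypothetical helper, it assumes the range being extended lies beyond the current end of the file (so writing zeros cannot clobber data), and treating EINVAL like EOPNOTSUPP is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve [offset, offset + len) with posix_fallocate(),
 * and fall back to writing zeros if the filesystem does not support it.
 * Returns 0 on success, otherwise an errno-style error code.
 */
static int
sketch_extend_file(int fd, off_t offset, off_t len)
{
    char    zeros[8192];
    int     rc;

    assert(fd >= 0 && offset >= 0 && len > 0);

    /* posix_fallocate() returns the error number directly, not via errno. */
    rc = posix_fallocate(fd, offset, len);
    if (rc == 0)
        return 0;
    if (rc != EOPNOTSUPP && rc != EINVAL)
        return rc;

    /* Fallback: extend the file by writing zero-filled blocks. */
    memset(zeros, 0, sizeof(zeros));
    while (len > 0)
    {
        size_t  n = (len < (off_t) sizeof(zeros)) ? (size_t) len : sizeof(zeros);
        ssize_t written = pwrite(fd, zeros, n, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;
            return errno;
        }
        if (written == 0)
            return EIO;
        offset += written;
        len -= written;
    }
    return 0;
}
```

As the message notes, if glibc silently switches to its own userspace emulation, a caller like this never sees EOPNOTSUPP and the explicit fallback is not exercised.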

Re: Non-reproducible AIO failure

2025-05-27 Thread Tomas Vondra
u run these tests in parallel. Can you share the patch/script? thanks -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-05-26 Thread Tomas Vondra
e the TAP test to trigger this too? To show the current code (in master) misses this? Grigory, Andrey, Heikki, any opinions on the tweaks? regards -- Tomas Vondra From 973de3eaeeca7ff2946a5b0f92f481d70ba5b78d Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 26 May 2025 12:10:37 +0200 Subje

Re: Hash table scans outside transactions

2025-05-25 Thread Tomas Vondra
that break the seqscan? FWIW I think with the use case from the beginning of this thread: 1. Add/update/remove entries in hash table 2. Scan the existing entries and perform one transaction per entry 3. Close scan Why not simply build a linked list after step (1)? regards -- Tomas Vondra
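The linked-list suggestion can be sketched like this. It is a simplified illustration, not dynahash code: the "hash table" is just a pair of parallel arrays, and `KeyNode`, `snapshot_keys()` and `process_snapshot()` are hypothetical names. The point is that the per-entry work runs over a private copy, so no scan of the shared table stays open across transactions.

```c
#include <assert.h>
#include <stdlib.h>

/* One node of the private snapshot list. */
typedef struct KeyNode
{
    int             key;
    struct KeyNode *next;
} KeyNode;

/*
 * After step (1): copy the keys of all occupied slots into a linked list,
 * preserving slot order. The table itself is not touched again afterwards.
 */
static KeyNode *
snapshot_keys(const int *keys, const int *in_use, size_t nslots)
{
    KeyNode    *head = NULL;
    KeyNode   **tail = &head;

    assert(keys != NULL && in_use != NULL);
    for (size_t i = 0; i < nslots; i++)
    {
        KeyNode *node;

        if (!in_use[i])
            continue;
        node = malloc(sizeof(KeyNode));
        if (node == NULL)
            abort();
        node->key = keys[i];
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }
    return head;
}

/*
 * Steps (2) and (3) replaced: walk the private list, doing the per-entry
 * work (here just summing keys as a stand-in for "one transaction per
 * entry"), and free each node. Returns the number of entries processed.
 */
static size_t
process_snapshot(KeyNode *head, long *key_sum)
{
    size_t n = 0;

    while (head != NULL)
    {
        KeyNode *next = head->next;

        *key_sum += head->key;
        free(head);
        head = next;
        n++;
    }
    return n;
}
```

The trade-off is the extra memory for the copy, and that entries changed after the snapshot are not seen by the loop.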

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-05-23 Thread Tomas Vondra
r.c:115:28: error: assignment to ‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’ [-Wincompatible-pointer-types] 115 | ExecutorStart_hook = vci_executor_start_routine; |^ executor/vci_executor.c: In function ‘vci_executor_start_routine’: executor/vci_executor.c:161:28: error: void value not ignored as it ought to be 161 | plan_valid = executor_start_prev(queryDesc, eflags); |^ executor/vci_executor.c:163:28: error: void value not ignored as it ought to be 163 | plan_valid = standard_ExecutorStart(queryDesc, eflags); |^ make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1 The extension is not added to contrib/Makefile, so "make world" does not trigger this failure. regards -- Tomas Vondra
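The compiler errors above come from a hook function declared to return a boolean while the hook type returns void. A minimal sketch of a hook chain whose signatures agree, using hypothetical names throughout (`SketchQueryDesc`, `sketch_start_hook`, and so on are not PostgreSQL's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QueryDesc and the executor-start hook type. */
typedef struct SketchQueryDesc
{
    int started;
} SketchQueryDesc;

typedef void (*sketch_start_hook_type) (SketchQueryDesc *queryDesc, int eflags);

static sketch_start_hook_type sketch_start_hook = NULL;
static sketch_start_hook_type sketch_prev_hook = NULL;
static int sketch_ext_calls = 0;

/* Stand-in for the default behavior when no other hook is installed. */
static void
sketch_standard_start(SketchQueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    queryDesc->started = 1;
}

/*
 * Extension hook: returns void, matching the hook type, and therefore does
 * not try to capture a return value from the previous hook.
 */
static void
sketch_ext_start(SketchQueryDesc *queryDesc, int eflags)
{
    assert(queryDesc != NULL);
    sketch_ext_calls++;
    if (sketch_prev_hook != NULL)
        sketch_prev_hook(queryDesc, eflags);
    else
        sketch_standard_start(queryDesc, eflags);
}

/* Install the extension hook, remembering whatever was there before. */
static void
sketch_install_hook(void)
{
    sketch_prev_hook = sketch_start_hook;
    sketch_start_hook = sketch_ext_start;
}
```

Both reported errors (the incompatible pointer assignment and "void value not ignored") disappear once the extension's function and its call sites use the void-returning signature.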

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
xisting tooling? I mean, there's pretty much just one thing the user can do to make it work, and that's disabling checksums. Sure, they might also enable checksums on the old cluster, but that makes the upgrade much longer, and presumably they use pg_upgrade to upgrade quickly. That being said, I don't feel very strongly about this, so if the consensus is to just error-out, so be it. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
Isn't the whole point of that change to keep the current workflow working? Also, I'm not sure if "no feedback about this" is reliable. I have no clue if people did any significant testing. Maybe people did a lot of testing and the current state is fine. But it's more likely there was little testing, in which case "no feedback" says nothing. FWIW I would be +0.5 to just let pg_upgrade disable checksums. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-22 Thread Tomas Vondra
OK with that in principle, assuming the benefits outweigh the risk of making backpatching harder. The patches don't seem exceptionally large / invasive, but I don't know how often we modify these parts. regards -- Tomas Vondra

Re: plan shape work

2025-05-20 Thread Tomas Vondra
"why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra

Re: Please update the pgconf.dev Unconference notes

2025-05-20 Thread Tomas Vondra
ved too quickly in different directions for me to catch all the details, so the notes have gaps etc. If others can improve that / clarify, that'd be great. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-20 Thread Tomas Vondra
ts > seem to be reality. The second attached file is a test case that > triggers > > ... FYI I added this as a PG18 open item: https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items regards -- Tomas Vondra

Re: wrong query results on bf leafhopper

2025-05-20 Thread Tomas Vondra
good to kick this one out the pool if there's hardware issues. > There are tools like "stress" and "stressant", etc. Works on my rpi5, but depends on the packager. I'd probably just look at dmesg first. In my experience hardware issues are often pretty visible there - reports of failed I/O requests, thermal issues on the CPU, that kind of stuff. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote: >> Same effect as v1 for IOS, with regular index scans I see this: >> >> 64 clients: 0.7M tps >> 96 clients: 1.5M tps >> >> So very similar improvement as for IO

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 20:44, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote: >> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: >>> The regular index scan however still have this issue, although it's not >>> as visible as for IOS. >

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
mentioned maybe we could add an atomic variable tracking the page LSN, so that we don't have to obtain the header lock. I didn't have time to try yet. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
dr, buf_state); AFAICS the lock is needed simply to read a consistent value from the page header, but maybe we could have an atomic variable with a copy of the LSN in the buffer descriptor? regards -- Tomas Vondra | --91.21%--btgettuple

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-11 Thread Tomas Vondra
On 5/11/25 18:07, Peter Geoghegan wrote: > On Sat, May 10, 2025 at 10:59 AM Tomas Vondra wrote: >> But doesn't it also highlight how fragile this memory allocation is? The >> skip scan patch didn't do anything wrong - it just added a couple >> fields, using a lit

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-10 Thread Tomas Vondra
ibc libraries). Still, it's a long-standing behavior, and I doubt it's likely to change. But considering glibc is what most systems use, maybe we should add some protections? I recall there were proposals to add optional mallopt() call to set the M_TOP_PAD when running on glibc. Maybe we should revive that. I also had a patch to add a "memory pool", which fixed this as a side effect. regards -- Tomas Vondra results.pdf Description: Adobe PDF document
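The optional mallopt() call mentioned above might be sketched as follows. `sketch_set_malloc_top_pad()` is a hypothetical wrapper; the only library call used is glibc's `mallopt()` with `M_TOP_PAD`, and on non-glibc systems the sketch does nothing.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>
#endif

/*
 * Hypothetical wrapper: ask glibc to keep `bytes` of extra padding at the
 * top of the heap, so that small allocations do not repeatedly grow and
 * shrink it. Returns 1 if the setting was applied, 0 otherwise.
 */
static int
sketch_set_malloc_top_pad(int bytes)
{
    assert(bytes >= 0);
#ifdef __GLIBC__
    return mallopt(M_TOP_PAD, bytes);   /* mallopt() returns 1 on success */
#else
    (void) bytes;
    return 0;                           /* no-op on other C libraries */
#endif
}
```

A call such as `sketch_set_malloc_top_pad(4 * 1024 * 1024)` early in process startup would be the programmatic counterpart of tuning this through the environment; the right value is workload-dependent and not something this sketch settles.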

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
end_memory_contexts after preparing and executing the sample > query, or through pg_get_process_memory_contexts() from another > backend? > I haven't noticed any elevated memory usage in top, but the queries are very short, so I'm not sure how reliable that is. But if adding 4MB is enough to make this go away, I doubt I'd notice a difference. regards -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 18:36, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 12:28 PM Tomas Vondra wrote: >> Not sure if it matters, but this uses index-only scans, and the pages >> are all-visible, so maybe it's not much more expensive. > > You're still going to have to s

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
tine to nbtree was. It does not remove skip scan itself (that > should still work with queries that are actually eligible to use skip > scan, albeit slightly less efficiently with some opclasses). > Tried, doesn't seem to affect the results at all. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 17:55, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 10:57 AM Tomas Vondra wrote: >> I see the regression even with variants that actually match some rows. >> For example if I do this: > >> so that the query matches 100 rows, I get the same behavior. > &

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
n with variants that actually match some rows. For example if I do this: update pgbench_accounts set bid = aid; vacuum full; and change the query to search for "bid = 1", I get exactly the same behavior. Even with update pgbench_accounts set bid = aid / 100; vacuum full; so that the query matches 100 rows, I get the same behavior. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 16:17, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 8:58 AM Tomas Vondra wrote: >> I'm also not sure about the root cause, but while investigating it one >> of the experiments I tried was tweaking the glibc malloc by setting >> >> export

Re: Amcheck verification of GiST and GIN

2025-05-09 Thread Tomas Vondra
* There was a discrepancy between parent and child > * tuples. We need to verify it is not a result of > * concurrent call of gistplacetopage(). So, lock parent > * and try to find downlink for current page. It may be > * missing due to concurrent page split, this is OK. > */ > pfree(stack->parenttup); > stack->parenttup = gin_refind_parent(rel, stack->parentblk, > stack->blkno, strategy); > > I think we can remove gin_refind_parent() and do ereport right away here. > The same logic as with 3). AFAIK it's impossible to have a child item > with a key that is higher than the cached parent key. > Parent key bounds what keys we can insert into the child page, so it > seems there is no way how they can appear there. > These look like good points. I've added it to open items so that we don't forget about this, I won't have time to look at this until after pgconf.dev. thanks -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
116037110 7193 - prepared1 33646 3655 4 25379 1137511342 32 37319 1409713911 There's almost no difference between bc35adee8d7 and 92fe23d93aa. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
trick. > Good question. I haven't checked that explicitly, but it's a tiny data set (15MB) and I observed this even on long benchmarks with tens of millions of queries. So the hint bits should have been set. Also, I should have mentioned the query does an index-only scan, and the pin/unpin calls are on index pages, not on the heap. regards -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
C_TOP_PAD_ would not help like this. But I haven't looked at the code, and I wouldn't have guessed the query to have anything to do with skip scan ... regards -- Tomas Vondra

strange perf regression with data checksums

2025-05-09 Thread Tomas Vondra
e expensive under concurrency (the clients simply have to compete when updating the same counter, and with enough clients there'll be more conflicts and retries). Kinda unfortunate, and maybe we should do something about it, not sure. But why would it depend on checksums at all? This read-only test should be entirely in-memory, so how come it's affected? regards -- Tomas Vondra

Re: Improve hash join's handling of tuples with null join keys

2025-05-05 Thread Tomas Vondra
gconf.dev. I'd be surprised if this was a regression, the hash table lookups are not exactly free. And even if it was a minor regression, it'd affect only cases with many NULL keys, but it improves robustness. BTW do you consider this to be a bugfix for PG18? Or would it have to wait for PG19 at this point? regards -- Tomas Vondra

Re: Parallel CREATE INDEX for GIN indexes

2025-05-02 Thread Tomas Vondra
On 4/30/25 14:39, Tomas Vondra wrote: > > On 4/18/25 03:03, Vinod Sridharan wrote: >> ... >> > > The patch seems fine to me - I repeated the tests with mailing list > archives, with MemoryContextStats() in _gin_parallel_merge, and it > reliably minimizes the memory

Re: Parallel CREATE INDEX for GIN indexes

2025-04-30 Thread Tomas Vondra
fine. I was also worried if this might have performance impact, but it actually seems to make it a little bit faster. I'll get this pushed. thanks -- Tomas Vondra

Re: pgsql: Add function to get memory context stats for processes

2025-04-26 Thread Tomas Vondra
ssGetMemoryContextInterrupt() do the same thing? In any case, if DSA happens to not be the right way to transfer this, what should we use instead? The only thing I can think of is some sort of pre-allocated chunk of shared memory. regards -- Tomas Vondra

Re: Get rid of integer divide in FAST_PATH_REL_GROUP() macro

2025-04-26 Thread Tomas Vondra
o be verifying something that the loop > condition was checking already. I thought it was better to check that > we end up with a power-of-two. > > Please see the attached patch. > Thanks. Those changes seem fine to me too. Do you intend to push these, or do you want me to do it? regards -- Tomas Vondra

Re: AIO v2.5

2025-04-22 Thread Tomas Vondra
cause of the RMT, but I'm also willing to do some of the tests, if needed - but it'd be good to get some guidance. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-04-22 Thread Tomas Vondra
ecksums by default, but now I realize the thread talks about "upgrade experience" which seems fairly wide. So, what kind of data we expect to gather in order to evaluate this? Who's expected to collect it and evaluate this? regards -- Tomas Vondra

Re: index prefetching

2025-04-22 Thread Tomas Vondra
On 4/22/25 18:26, Peter Geoghegan wrote: > On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote: >> here's an improved (rebased + updated) version of the patch series, with >> some significant fixes and changes. The patch adds infrastructure and >> modifies btree index

Re: Parallel CREATE INDEX for GIN indexes

2025-04-21 Thread Tomas Vondra
approaches > to > resolve this too). > Thanks for the report. I didn't have time to look at this in detail yet, but the fix looks roughly correct. I've added this to the list of open items for PG18. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-10 Thread Tomas Vondra
bigint, perhaps? Attached is v28, with the commit messages updated, added about allocation of the memory, etc. I'll let the CI run the tests on it, and then will push, unless someone has more comments. regards -- Tomas Vondra From 9a222c77de2ee4a0b32d97c3d8bab2bb33f066de Mon Sep 17 00:0

Re: Add os_page_num to pg_buffercache

2025-04-10 Thread Tomas Vondra
> - It's currently doing the changes in pg_buffercache v1.6 but will need to > create v1.7 for 19 (if the above stands true) > This seems like a good idea in principle, but at this point it has to wait for PG19. Please add it to the July commitfest. regards -- Tomas Vondra

Re: long-standing data loss bug in initial sync of logical replication

2025-04-10 Thread Tomas Vondra
> >> >> Seeing no responses for a long time, I am planning to push the fix >> till 14 tomorrow unless there are some opinions on the fix for 13. We >> can continue to discuss the scope of the fix for 13. >> > > Pushed till 14. > Thanks everyone who persevered and kept working on fixing this! Highly appreciated. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 17:51, Andres Freund wrote: > Hi, > > On 2025-04-09 17:28:31 +0200, Tomas Vondra wrote: >> On 4/9/25 17:14, Andres Freund wrote: >>> I'd mention that the includes of postgres.h/fmgr.h is what caused missing >>> build-time dependencies and via tha

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 17:14, Andres Freund wrote: > Hi, > > On 2025-04-09 16:33:14 +0200, Tomas Vondra wrote: >> From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001 >> From: Tomas Vondra >> Date: Tue, 8 Apr 2025 23:31:29 +0200 >> Subject: [PATCH 1/2] Clea

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 01:29, Andres Freund wrote: > Hi, > > On 2025-04-09 01:10:09 +0200, Tomas Vondra wrote: >> On 4/8/25 15:06, Andres Freund wrote: >>> Hi, >>> >>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >>>> On Mon, 7 Apr 2025 at 23:00, To

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
Updated patches with proper commit messages etc. -- Tomas Vondra From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 8 Apr 2025 23:31:29 +0200 Subject: [PATCH 1/2] Cleanup of pg_numa.c This moves/renames some of the functions defined in

Re: Draft for basic NUMA observability

2025-04-09 Thread Tomas Vondra
On 4/9/25 14:07, Tomas Vondra wrote: > ... > > OK, here are two patches, where 0001 adds the missingdeps check to the > Debian meson build. It just adds that to the build script. > > 0002 leaves the NUMA stuff in src/port (i.e. it's no longer moved to > src/backen

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
On 4/8/25 16:59, Andres Freund wrote: > Hi, > > On 2025-04-08 09:35:37 -0400, Andres Freund wrote: >> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote: >>> On 4/8/25 15:06, Andres Freund wrote: >>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wro

Re: Draft for basic NUMA observability

2025-04-08 Thread Tomas Vondra
> The attached small patch fixes the manual. > Thank you for noticing this and for the fix! Pushed. This also reminded me we agreed to change page_num to bigint, which I forgot to change before commit. So I adjusted that too, separately. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 17:51, Andres Freund wrote: > Hi, > > On 2025-04-06 13:56:54 +0200, Tomas Vondra wrote: >> On 4/6/25 01:00, Andres Freund wrote: >>> On 2025-04-05 18:29:22 -0400, Andres Freund wrote: >>>> I think one thing that the docs should mention is that callin

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 23:50, Jakub Wartak wrote: > On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra wrote: >> >> Hi, >> >> I've pushed all three parts of v29, with some additional corrections >> (picked lower OIDs, bumped catversion, fixed commit messages). > > H

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
Hi, I've pushed all three parts of v29, with some additional corrections (picked lower OIDs, bumped catversion, fixed commit messages). On 4/7/25 23:01, Jakub Wartak wrote: > On Mon, Apr 7, 2025 at 9:51 PM Tomas Vondra wrote: > >>> So it looks like that the new way to it

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
On 4/7/25 20:11, Bertrand Drouvot wrote: > Hi, > > On Mon, Apr 07, 2025 at 12:42:21PM -0400, Andres Freund wrote: >> Hi, >> >> On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote: >> >> I was thinking of checking if the BufferDesc indicates BM_VALID or >&g

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
ent patches are good enough >> for PG18, with the current behavior, and then maybe improve that in >> PG19. > > I think as long as the docs mention this with or it's ok for > now. > OK, I'll add a warning explaining this. regards -- Tomas Vondra

Re: Improve monitoring of shared memory allocations

2025-04-07 Thread Tomas Vondra
ssion tests can't tell us much, considering it didn't fail once with the reverted patch :-( I did check the coverage in: https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html and sure enough, dir_realloc() is not executed once. And there's a couple more p

Re: Draft for basic NUMA observability

2025-04-07 Thread Tomas Vondra
in os_page_status. I intend to push 0001 and 0002 shortly, and 0003 after a bit more review and testing, unless I hear objections. regards -- Tomas Vondra From fcc4fc2ada33cbbc962d561ddeea6966f0d55492 Mon Sep 17 00:00:00 2001 From: Jakub Wartak Date: Wed, 2 Apr 2025 12:29:22 +0200 Subject: [P

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
>> pages. >>> + * It's a bit misleading to call that "aligned", no? */ >>> + >>> + /* Get number of OS aligned pages */ >>> + shm_ent_page_count >>> + = TYPEALIGN(os_page_size, ent->allocated_size) / >>> os_page_size; >>> + >>> + /* >>> + * If we get ever 0xff back from kernel inquiry, then we >>> probably have >>> + * bug in our buffers to OS page mapping code here. >>> + */ >>> + memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count); >> >> There's obviously no guarantee that shm_ent_page_count is a multiple of >> os_page_size. I think it'd be interesting to show in the view when one shmem >> allocation shares a page with the prior allocation - that can contribute a >> bit >> to contention. What about showing a start_os_page_id and end_os_page_id or >> something? That could be a feature for later though. > > I was thinking about it, but it could be done when analyzing this > together with data from pg_shmem_allocations(?) My worry is timing :( > Anyway, we could extend this view in future revisions. > I'd leave this out for now. It's not difficult, but let's focus on the other issues. >>> +SELECT NOT(pg_numa_available()) AS skip_test \gset >>> +\if :skip_test >>> +\quit >>> +\endif >>> +-- switch to superuser >>> +\c - >>> +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa; >>> + ok >>> + >>> + t >>> +(1 row) >> >> Could it be worthwhile to run the test if !pg_numa_available(), to test that >> we do the right thing in that case? We need an alternative output anyway, so >> that might be fine? > > Added. the meson test passes, but I'm sending it as fast as possible > to avoid a clash with Tomas. > Please keep working on this. I may have a bit of time in the evening, but in the worst case I'll merge it into your patch. regards -- Tomas Vondra
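The page-count computation quoted above, and the start/end page idea raised in the review, can be sketched as plain arithmetic. `SKETCH_ALIGN_UP` is a local round-up macro written for this example (assumed to behave like the `TYPEALIGN` used in the quoted patch), and the `sketch_*` helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to the next multiple of alignval (a power of two). */
#define SKETCH_ALIGN_UP(alignval, len) \
    (((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

/* Number of OS pages needed to hold allocated_size bytes, by size alone. */
static size_t
sketch_page_count(size_t os_page_size, size_t allocated_size)
{
    assert(os_page_size > 0 && (os_page_size & (os_page_size - 1)) == 0);
    return SKETCH_ALIGN_UP(os_page_size, allocated_size) / os_page_size;
}

/* Index of the first OS page touched by an allocation at byte `offset`. */
static size_t
sketch_first_page(size_t os_page_size, size_t offset)
{
    assert(os_page_size > 0);
    return offset / os_page_size;
}

/* Index of the last OS page touched by `size` bytes starting at `offset`. */
static size_t
sketch_last_page(size_t os_page_size, size_t offset, size_t size)
{
    assert(os_page_size > 0 && size > 0);
    return (offset + size - 1) / os_page_size;
}
```

With 4096-byte pages, a 200-byte allocation starting at offset 4000 yields a size-based count of 1 but touches pages 0 and 1. That gap between the two numbers is the page-sharing case the reviewer suggests exposing through start and end page columns.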

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
he current backend, so I'd bet people would not be happy with NULL, and would proceed to force the allocation in some other way (say, a large query of some sort). Which obviously causes a lot of other problems. I can imagine having a flag that makes the allocation optional, but there's no convenient way to pass that to a view, and I think most people want the allocation anyway. Especially for monitoring purposes, which usually happens in a new connection, so the backend has little opportunity to allocate the pages "naturally." regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-06 Thread Tomas Vondra
at right now, but at the very least we ought to > document it. > +1 to documenting this > > On 2025-04-05 16:33:28 +0200, Tomas Vondra wrote: >> The libnuma library is not available on 32-bit builds (there's no shared >> object for i386), so we disable it in that

Re: Improve monitoring of shared memory allocations

2025-04-05 Thread Tomas Vondra
fields. Seems a bit weird, but we always did that - the patch does not really change that. I'll now mark this as committed. I haven't done anything about the alignment. My conclusion from the discussion was we don't quite need to do that, but if we do I think it's a matter for a separate patch - perhaps something like the 0003. Thanks for the patch, reviews, etc. -- Tomas Vondra

Re: Snapshot related assert failure on skink

2025-04-05 Thread Tomas Vondra
On 3/24/25 16:25, Heikki Linnakangas wrote: > On 24/03/2025 16:56, Tomas Vondra wrote: >> >> >> On 3/23/25 17:43, Heikki Linnakangas wrote: >>> On 21/03/2025 17:16, Andres Freund wrote: >>>> Am I right in understanding that the only scenario (w

Re: Draft for basic NUMA observability

2025-04-05 Thread Tomas Vondra
On 4/5/25 15:23, Tomas Vondra wrote: > On 4/5/25 11:37, Bertrand Drouvot wrote: >> Hi, >> >> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote: >>> OK, >>> >>> here's v25 after going through the patches once more, fixing the issues &

Re: Draft for basic NUMA observability

2025-04-05 Thread Tomas Vondra
On 4/5/25 11:37, Bertrand Drouvot wrote: > Hi, > > On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote: >> OK, >> >> here's v25 after going through the patches once more, fixing the issues >> mentioned by Bertrand, etc. > > Thanks! > &

Re: Proposal: Adding compression of temporary files

2025-04-04 Thread Tomas Vondra
code gets multiple loops in while (wpos < file->nbytes) { ... } because bytestowrite will be the value from the last loop? I haven't tried, but I guess writing wide tuples (more than 8k) might fail. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
in the function comment, but I'm also not quite sure I understand what "output shared memory" is ... regards -- Tomas Vondra From 381c5077592e38dbcbbf6acc4f1e86a767a92957 Mon Sep 17 00:00:00 2001 From: Jakub Wartak Date: Wed, 2 Apr 2025 12:29:22 +0200 Subject: [PATCH v25 1/5]

Re: index prefetching

2025-04-04 Thread Tomas Vondra
Yes, I agree. regards -- Tomas Vondra

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
On 4/4/25 08:50, Bertrand Drouvot wrote: > Hi, > > On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote: >> On 4/3/25 15:12, Jakub Wartak wrote: >>> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote: >>> >>>> ... >>>> >>&

Re: Draft for basic NUMA observability

2025-04-04 Thread Tomas Vondra
On 4/4/25 09:35, Jakub Wartak wrote: > On Fri, Apr 4, 2025 at 8:50 AM Bertrand Drouvot > wrote: >> >> Hi, >> >> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote: >>> On 4/3/25 15:12, Jakub Wartak wrote: >>>>

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 15:12, Jakub Wartak wrote: > On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote: > >> ... >> >> So unless someone can demonstrate a use case where this would matter, >> I'd not worry about it too much. > > OK, fine for me - just 3 cols for p

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 10:23, Bertrand Drouvot wrote: > Hi, > > On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote: >> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote: >> >> Hi Tomas, >> >>> OK, so you agree the commit messages are complete / correct

Re: Draft for basic NUMA observability

2025-04-03 Thread Tomas Vondra
On 4/3/25 09:01, Jakub Wartak wrote: > On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote: > > Hi Tomas, > >> OK, so you agree the commit messages are complete / correct? > > Yes. > >> OK. FWIW if you disagree with some of my proposed changes, feel free to &

Re: BTScanOpaqueData size slows down tests

2025-04-02 Thread Tomas Vondra
On 4/2/25 17:45, Peter Geoghegan wrote: > On Wed, Apr 2, 2025 at 11:36 AM Tom Lane wrote: >> Ouch! I had no idea it had gotten that big. Yeah, we ought to >> do something about that. > > Tomas Vondra talked about this recently, in the context of his work on > prefe
