>
> I was also missing it in my suggested patch draft, but this should
> probably include #ifdef __linux__.
>
>
> Re: Tomas Vondra
>> +#ifdef USE_VALGRIND
>> +
>> +static inline void
>> +pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
>
> St
On 6/25/25 09:15, Jakub Wartak wrote:
> On Tue, Jun 24, 2025 at 5:30 PM Christoph Berg wrote:
>>
>> Re: Tomas Vondra
>>> If it's a reliable fix, then I guess we can do it like this. But won't
>>> that be a performance penalty on everyone? Or does the
atch this, even
if it's ultimately harmless, just to keep the code not confusing.
regards
--
Tomas Vondra
On 6/24/25 17:30, Christoph Berg wrote:
> Re: Tomas Vondra
>> If it's a reliable fix, then I guess we can do it like this. But won't
>> that be a performance penalty on everyone? Or does the system split the
>> array into 16-element chunks anyway, so this makes no
ant to rely too much on my
>>> interpretation of it.
>>
>> I don't have that much experience too but I think the issue is in
>> do_pages_stat()
>> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t)
>> instead.
>
> Me neither, but I'll try submit this fix.
>
+1
Thanks to both of you for the report and the investigation.
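To make the suspected stride mistake concrete, here is a simplified
user-space sketch. It is not the kernel's do_pages_stat() code, and the
names compat_ptr_t, advance_native() and advance_compat() are made up for
this illustration. The assumption is that a 32-bit process passes an array
of 4-byte pointers, so stepping through it with an 8-byte stride skips
twice as many bytes per chunk as it should.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 32-bit user-space pointer (hypothetical name). */
typedef uint32_t compat_ptr_t;

/*
 * Byte offset of the next chunk if every entry is assumed to be a native
 * 64-bit pointer. Wrong when the array really holds 4-byte entries.
 */
size_t
advance_native(size_t byte_off, size_t chunk_nr)
{
    return byte_off + chunk_nr * sizeof(uint64_t);
}

/* Byte offset of the next chunk using the 32-bit entry size. */
size_t
advance_compat(size_t byte_off, size_t chunk_nr)
{
    return byte_off + chunk_nr * sizeof(compat_ptr_t);
}
```

With 16-entry chunks the native stride moves 128 bytes while the data only
occupies 64, so the second chunk would start reading past the entries that
actually belong to it.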
regards
--
Tomas Vondra
On 6/24/25 13:10, Andres Freund wrote:
> Hi,
>
> On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote:
>> FWIW while looking into this, I tried running this under valgrind (on a
>> regular 64-bit system, not in the chroot), and I get this report:
>>
>> ==65065==
On 6/24/25 13:10, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
>> On 6/24/25 10:24, Bertrand Drouvot wrote:
>>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
>>> pg_numa_query_pages()
On 6/23/25 23:47, Tomas Vondra wrote:
> ...
>
> Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> calculation.
>
I think it's likely something like this. I noticed that if I modify
pg_buffercache_numa_pages() to query the addresses one by one, it works.
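A minimal sketch of what "query the addresses one by one" could look like,
generalized to an arbitrary batch size. The callback type query_fn and the
helpers query_in_chunks() and stub_query() are hypothetical names for this
example; the callback merely stands in for the real page-status query.

```c
#include <stddef.h>

/* Callback standing in for the real page-status query (hypothetical). */
typedef int (*query_fn) (size_t count, void **pages, int *status);

/*
 * Query 'count' addresses in batches of at most 'chunk' entries;
 * chunk = 1 queries the addresses one by one. Returns 0 on success,
 * or the first non-zero result reported by the callback.
 */
int
query_in_chunks(query_fn fn, size_t count, void **pages, int *status,
                size_t chunk)
{
    size_t      done = 0;

    if (chunk == 0)
        return -1;

    while (done < count)
    {
        size_t      n = count - done;
        int         rc;

        if (n > chunk)
            n = chunk;

        rc = fn(n, pages + done, status + done);
        if (rc != 0)
            return rc;

        done += n;
    }

    return 0;
}

/* Example callback: reports status 0 for every page and counts calls. */
int         stub_calls = 0;

int
stub_query(size_t count, void **pages, int *status)
{
    (void) pages;
    for (size_t i = 0; i < count; i++)
        status[i] = 0;
    stub_calls++;
    return 0;
}
```

Calling query_in_chunks(stub_query, 40, pages, status, 16) issues three
calls (16 + 16 + 8 entries), which makes it easy to compare batched and
one-by-one behavior.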
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394
>
> Repeated calls are fine.
>
Huh. So it's only the first call that does this?
Can you maybe print the addresses passed to pg_numa_query_pages? I
wonder if there's some bug in how we fill that array. Not sure why it
would happen only on 32-bit systems, though.
I'll create a 32-bit VM so that I can try reproducing this.
regards
--
Tomas Vondra
On 6/23/25 23:25, Christoph Berg wrote:
> Re: Tomas Vondra
>> True. If it fails on first call, but succeeds on the other, then the
>> problem is likely somewhere else. But also on the second call we won't
>> do the memory touching. Can you try setting firstNumaTouch=fa
On 6/23/25 22:31, Christoph Berg wrote:
> Re: Tomas Vondra
>> Huh. So it's only the first call that does this?
>
> The first call after a restart. Reconnecting is not enough.
>
>> Can you maybe print the addresses passed to pg_numa_query_pages? I
>
> The a
On 6/23/25 22:51, Christoph Berg wrote:
> Re: Tomas Vondra
>> Didn't you say the first ~35 addresses succeed, right? What about the
>> addresses after that?
>
> That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there
> works (does not return -1)
On 6/17/25 16:19, Thom Brown wrote:
> On Mon, 16 Jun 2025 at 21:00, Tomas Vondra wrote:
>>
>> On 6/16/25 21:09, Arseniy Mukhin wrote:
>>> On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote:
>>>>
>>>> Thanks.
>>>>
>>>>
it will probably work fine. The catalog is borked, and who knows
in what way.
My opinion is that adding an "elog(ERROR)" here would be misleading, as
it implies it's something we expect. And mostly pointless. I can imagine
adding an Assert, but I don't quite see how that is better than just
hitting a segfault a couple lines later.
regards
--
Tomas Vondra
On 6/16/25 21:09, Arseniy Mukhin wrote:
> On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote:
>>
>> Thanks.
>>
>> I went through the patches, polished the commit messages and did some
>> minor tweaks in patch 0002 (to make the variable names a bit more
>> co
ommits this week, but
considering I missed the issues before commit ...
For a moment I was worried about breaking ABI when fixing this in the
backbranches, but I guess that's not an issue for tools like pg_dump.
regards
--
Tomas Vondra
uced this API, but it's definitely
the case it was based on the initial gzip code.
Regarding the Z_NULL, I believe it has always been ignored like this, at
least since 9.1. The code simply returns what gzgets() returns, and then
compares that to NULL, etc. Is there a better way to deal with
Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL,
although Z_NULL is simply defined as 0. I don't recall if NULL has some
additional magic.
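For what it's worth, a small sketch of why the comparison works either way.
It avoids depending on zlib by mirroring the Z_NULL definition under the
made-up name SKETCH_Z_NULL, and is_null_result() is a hypothetical helper.
In C a constant 0 is a null pointer constant, so comparing a pointer
against 0 and against NULL tests the same thing.

```c
#include <stddef.h>

/* zlib.h defines Z_NULL as 0; mirrored here so the sketch needs no zlib. */
#define SKETCH_Z_NULL 0

/*
 * Returns 1 if a gzgets()-style result pointer signals "nothing read",
 * 0 otherwise. Both comparisons always agree, which is the point.
 */
int
is_null_result(const char *result)
{
    return (result == SKETCH_Z_NULL) && (result == NULL);
}
```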
regards
--
Tomas Vondra
read through the commit messages, and let me know if I got some
of the details wrong (or not clear enough). Otherwise I plan to start
pushing this soon (~tomorrow).
regards
--
Tomas Vondra
From cb24bb068582a39df9e9e59c2a9347889e896cf2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Mon, 9 Jun
On 6/14/25 15:56, Nathan Bossart wrote:
> On Sat, Jun 14, 2025 at 03:47:33PM +0200, Tomas Vondra wrote:
>> I suggest you try with a newer gcc, perhaps 13.4. There's been a bunch
>> of fixes related to AVX512 since 13.0, chances are this was already
>> fixed. I don
>
The OIDs for user-defined objects (e.g. those from extensions) are not
stable, and this will not change. The only way is to prevent the test
output, e.g. by not including OIDs in the results, and eliminating all
other types of non-determinism - eg. by enforcing ordering, etc.
regards
--
Tomas Vondra
> the current master, everything is fine. Does anyone know the reason?
>
> The attached is my config.log.
>
>
>
--
Tomas Vondra
On 6/9/25 00:14, Tomas Vondra wrote:
> ...
>
> I propose to split it like this, into three parts, each addressing a
> particular type of mistake:
>
> 1) gin_check_posting_tree_parent_keys_consistency
>
> 2) gin_check_parent_keys_consistency
hink these tests are
> portable. While writing tests some minor issues were found and fixed.
> Also ci compiler warnings were fixed.
>
Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.
regards
--
Tomas Vondra
On 5/29/25 13:53, Arseniy Mukhin wrote:
> On Mon, May 26, 2025 at 7:28 PM Arseniy Mukhin
> wrote:
>> On Mon, May 26, 2025 at 1:27 PM Tomas Vondra wrote:
>>> Also, I've noticed that the TAP test passes even with some (most) of the
>>> verify_gin.c changes rever
On 6/4/25 19:59, Jim Nasby wrote:
>
>
> On Fri, May 23, 2025 at 4:29 PM Tomas Vondra wrote:
>
> Also, Alvaro seemed to think TAM is the way to go, and in order to keep
> the OLTP performance he suggested to use both heap and VCI
son, and
treated it as "normal". But with the default changes, it'll be easier to
spot once they upgrade to PG18.
So better to get this in now, otherwise we may have to wait until PG19,
because of ABI (the patch adds a field into BTScanPosData, but maybe
it'd be possible to add it into padding, not sure).
regards
--
Tomas Vondra
ow which ones to set, a lot
of the knowledge is somewhat outdated I think.
Wouldn't it be better for btrfs to just start returning EOPNOTSUPP
(maybe with a mount option), in which case we already do the right thing
automatically? Sure, it means the admin needs to be aware of
this in both cases.
regards
--
Tomas Vondra
efully will
> not affect postgres (see CAVEATS in man 3 posix_fallocate).
>
Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the
userspace fallback, we wouldn't notice. But that's up to the btrfs to
decide if they want to support fallocate. We still need our fallback
anyway, because of other OSes.
regards
--
Tomas Vondra
u run these tests in parallel. Can you share the
patch/script?
thanks
--
Tomas Vondra
e the TAP test to trigger this
too? To show the current code (in master) misses this?
Grigory, Andrey, Heikki, any opinions on the tweaks?
regards
--
Tomas Vondra
From 973de3eaeeca7ff2946a5b0f92f481d70ba5b78d Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Mon, 26 May 2025 12:10:37 +0200
Subje
that break the seqscan?
FWIW I think with the use case from the beginning of this thread:
1. Add/update/remove entries in hash table
2. Scan the existing entries and perform one transaction per entry
3. Close scan
Why not simply build a linked list after step (1)?
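Roughly what I have in mind, as a standalone sketch rather than actual
dynahash code. The EntryNode type and the build_entry_list() and
free_entry_list() helpers are hypothetical, and a plain array of keys
stands in for the hash table contents after step (1).

```c
#include <stdlib.h>

/* One list element per hash table entry (hypothetical type). */
typedef struct EntryNode
{
    int         key;
    struct EntryNode *next;
} EntryNode;

/* Release every node of the list. */
void
free_entry_list(EntryNode *head)
{
    while (head != NULL)
    {
        EntryNode  *next = head->next;

        free(head);
        head = next;
    }
}

/*
 * Copy the keys into a singly linked list, preserving their order.
 * Returns the list head, or NULL if there are no keys or malloc fails.
 */
EntryNode *
build_entry_list(const int *keys, size_t nkeys)
{
    EntryNode  *head = NULL;
    EntryNode **tail = &head;

    for (size_t i = 0; i < nkeys; i++)
    {
        EntryNode  *node = malloc(sizeof(EntryNode));

        if (node == NULL)
        {
            free_entry_list(head);
            return NULL;
        }

        node->key = keys[i];
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }

    return head;
}
```

Step (2) then walks the list and runs one transaction per node, without
keeping a scan open on the hash table while those transactions run.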
regards
--
Tomas Vondra
r.c:115:28: error: assignment to
‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from
incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’
[-Wincompatible-pointer-types]
115 | ExecutorStart_hook = vci_executor_start_routine;
|^
executor/vci_executor.c: In function ‘vci_executor_start_routine’:
executor/vci_executor.c:161:28: error: void value not ignored as it
ought to be
161 | plan_valid = executor_start_prev(queryDesc, eflags);
|^
executor/vci_executor.c:163:28: error: void value not ignored as it
ought to be
163 | plan_valid = standard_ExecutorStart(queryDesc,
eflags);
|^
make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1
The extension is not added to contrib/Makefile, so "make world" does not
trigger this failure.
regards
--
Tomas Vondra
xisting tooling? I mean,
there's pretty much just one thing the user can do to make it work, and
that's disabling checksums. Sure, they might also enable checksums on
the old cluster, but that makes the upgrade much longer, and presumably
they use pg_upgrade to upgrade quickly.
That being said, I don't feel very strongly about this, so if the
consensus is to just error-out, so be it.
regards
--
Tomas Vondra
Isn't the whole point of that
change to keep the current workflow working?
Also, I'm not sure if "no feedback about this" is reliable. I have no
clue if people did any significant testing. Maybe people did a lot of
testing and the current state is fine. But it's more likely there was
little testing, in which case "no feedback" says nothing.
FWIW I would be +0.5 to just let pg_upgrade disable checksums.
regards
--
Tomas Vondra
OK with that in principle, assuming the benefits outweigh the risk
of making backpatching harder. The patches don't seem exceptionally
large / invasive, but I don't know how often we modify these parts.
regards
--
Tomas Vondra
"why was the index not used", and the
possible answers include "dominated by cost by another path" or "does
not match the index keys" etc.
I wonder if this work might be useful for something like that.
regards
--
Tomas Vondra
ved too quickly in different
directions for me to catch all the details, so the notes have gaps etc.
If others can improve that / clarify, that'd be great.
regards
--
Tomas Vondra
ts
> seem to be reality. The second attached file is a test case that
> triggers
>
> ...
FYI I added this as a PG18 open item:
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
regards
--
Tomas Vondra
good to kick this one out the pool if there's hardware issues.
>
There are tools like "stress" and "stressant", etc. Works on my rpi5,
but depends on the packager.
I'd probably just look at dmesg first. In my experience hardware issues
are often pretty visible there - reports of failed I/O requests, thermal
issues on the CPU, that kind of stuff.
regards
--
Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote:
> On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote:
>> Same effect as v1 for IOS, with regular index scans I see this:
>>
>> 64 clients: 0.7M tps
>> 96 clients: 1.5M tps
>>
>> So very similar improvement as for IO
On 5/19/25 20:44, Peter Geoghegan wrote:
> On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote:
>> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote:
>>> The regular index scan however still have this issue, although it's not
>>> as visible as for IOS.
>
mentioned maybe we could add an atomic variable
tracking the page LSN, so that we don't have to obtain the header lock.
I didn't have time to try yet.
regards
--
Tomas Vondra
dr, buf_state);
AFAICS the lock is needed simply to read a consistent value from the
page header, but maybe we could have an atomic variable with a copy of
the LSN in the buffer descriptor?
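Something along these lines, as a rough C11 sketch only. SketchBufferDesc,
sketch_set_lsn() and sketch_get_lsn() are hypothetical names, not the real
BufferDesc or the pg_atomic API, and the sketch ignores how the copy would
be kept in sync with the page header.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor carrying an atomic copy of the page LSN. */
typedef struct SketchBufferDesc
{
    _Atomic uint64_t lsn_copy;
} SketchBufferDesc;

/* Writer side: publish the new LSN whenever the page LSN changes. */
void
sketch_set_lsn(SketchBufferDesc *buf, uint64_t lsn)
{
    atomic_store(&buf->lsn_copy, lsn);
}

/* Reader side: read a consistent 64-bit value without any lock. */
uint64_t
sketch_get_lsn(SketchBufferDesc *buf)
{
    return atomic_load(&buf->lsn_copy);
}
```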
regards
--
Tomas Vondra
|
--91.21%--btgettuple
On 5/11/25 18:07, Peter Geoghegan wrote:
> On Sat, May 10, 2025 at 10:59 AM Tomas Vondra wrote:
>> But doesn't it also highlight how fragile this memory allocation is? The
>> skip scan patch didn't do anything wrong - it just added a couple
>> fields, using a lit
ibc libraries). Still,
it's a long-standing behavior, and I doubt it's likely to change. But
considering glibc is what most systems use, maybe we should add some
protections?
I recall there were proposals to add optional mallopt() call to set the
M_TOP_PAD when running on glibc. Maybe we should revive that. I also had
a patch to add a "memory pool", which fixed this as a side effect.
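For illustration, the mallopt() part could look roughly like this. It is
only a sketch: set_malloc_top_pad() is a made-up helper name, the call is
glibc-specific (hence the guard), and any value passed in is up to the
caller rather than a recommendation.

```c
#include <stdlib.h>             /* on glibc this also defines __GLIBC__ */

#ifdef __GLIBC__
#include <malloc.h>             /* mallopt(), M_TOP_PAD */
#endif

/*
 * Ask glibc malloc to keep 'bytes' of extra padding at the top of the
 * heap. Returns 1 on success and 0 on failure, as mallopt() does; on
 * other C libraries this is a no-op that returns 1.
 */
int
set_malloc_top_pad(int bytes)
{
#ifdef __GLIBC__
    return mallopt(M_TOP_PAD, bytes);
#else
    (void) bytes;
    return 1;
#endif
}
```

A backend would call something like set_malloc_top_pad(4 * 1024 * 1024)
once at startup, before it starts allocating in earnest.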
regards
--
Tomas Vondra
end_memory_contexts after preparing and executing the sample
> query, or through pg_get_process_memory_contexts() from another
> backend?
>
I haven't noticed any elevated memory usage in top, but the queries are
very short, so I'm not sure how reliable that is. But if adding 4MB is
enough to make this go away, I doubt I'd notice a difference.
regards
--
Tomas Vondra
On 5/9/25 18:36, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 12:28 PM Tomas Vondra wrote:
>> Not sure if it matters, but this uses index-only scans, and the pages
>> are all-visible, so maybe it's not much more expensive.
>
> You're still going to have to s
tine to nbtree was. It does not remove skip scan itself (that
> should still work with queries that are actually eligible to use skip
> scan, albeit slightly less efficiently with some opclasses).
>
Tried, doesn't seem to affect the results at all.
--
Tomas Vondra
On 5/9/25 17:55, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 10:57 AM Tomas Vondra wrote:
>> I see the regression even with variants that actually match some rows.
>> For example if I do this:
>
>> so that the query matches 100 rows, I get the same behavior.
>
>
n with variants that actually match some rows.
For example if I do this:
update pgbench_accounts set bid = aid;
vacuum full;
and change the query to search for "bid = 1", I get exactly the same
behavior. Even with
update pgbench_accounts set bid = aid / 100;
vacuum full;
so that the query matches 100 rows, I get the same behavior.
--
Tomas Vondra
On 5/9/25 16:17, Peter Geoghegan wrote:
> On Fri, May 9, 2025 at 8:58 AM Tomas Vondra wrote:
>> I'm also not sure about the root cause, but while investigating it one
>> of the experiments I tried was tweaking the glibc malloc by setting
>>
>> export
* There was a discrepancy between parent and child
> * tuples. We need to verify it is not a result of
> * concurrent call of gistplacetopage(). So, lock parent
> * and try to find downlink for current page. It may be
> * missing due to concurrent page split, this is OK.
> */
> pfree(stack->parenttup);
> stack->parenttup = gin_refind_parent(rel, stack->parentblk,
> stack->blkno, strategy);
>
> I think we can remove gin_refind_parent() and do ereport right away here.
> The same logic as with 3). AFAIK it's impossible to have a child item
> with a key that is higher than the cached parent key.
> Parent key bounds what keys we can insert into the child page, so it
> seems there is no way how they can appear there.
>
These look like good points. I've added it to open items so that we
don't forget about this, I won't have time to look at this until after
pgconf.dev.
thanks
--
Tomas Vondra
116037110 7193
-
prepared1 33646 3655
4 25379 1137511342
32 37319 1409713911
There's almost no difference between bc35adee8d7 and 92fe23d93aa.
regards
--
Tomas Vondra
trick.
>
Good question. I haven't checked that explicitly, but it's a tiny data
set (15MB) and I observed this even on long benchmarks with tens of
millions of queries. So the hint bits should have been set.
Also, I should have mentioned the query does an index-only scan, and the
pin/unpin calls are on index pages, not on the heap.
regards
--
Tomas Vondra
C_TOP_PAD_ would not help like this. But I haven't
looked at the code, and I wouldn't have guessed the query to have
anything to do with skip scan ...
regards
--
Tomas Vondra
e expensive under concurrency (the clients
simply have to compete when updating the same counter, and with enough
clients there'll be more conflicts and retries). Kinda unfortunate, and
maybe we should do something about it, not sure.
But why would it depend on checksums at all? This read-only test should
be entirely in-memory, so how come it's affected?
regards
--
Tomas Vondra
gconf.dev.
I'd be surprised if this was a regression, the hash table lookups are
not exactly free. And even if it was a minor regression, it'd affect
only cases with many NULL keys, but it improves robustness.
BTW do you consider this to be a bugfix for PG18? Or would it have to
wait for PG19 at this point?
regards
--
Tomas Vondra
On 4/30/25 14:39, Tomas Vondra wrote:
>
> On 4/18/25 03:03, Vinod Sridharan wrote:
>> ...
>>
>
> The patch seems fine to me - I repeated the tests with mailing list
> archives, with MemoryContextStats() in _gin_parallel_merge, and it
> reliably minimizes the memory
fine.
I was also worried if this might have performance impact, but it
actually seems to make it a little bit faster.
I'll get this pushed.
thanks
--
Tomas Vondra
ssGetMemoryContextInterrupt() do the
same thing?
In any case, if DSA happens to not be the right way to transfer this,
what should we use instead? The only thing I can think of is some sort
of pre-allocated chunk of shared memory.
regards
--
Tomas Vondra
o be verifying something that the loop
> condition was checking already. I thought it was better to check that
> we end up with a power-of-two.
>
> Please see the attached patch.
>
Thanks. Those changes seem fine to me too.
Do you intend to push these, or do you want me to do it?
regards
--
Tomas Vondra
cause of the RMT, but I'm also willing to do some of
the tests, if needed - but it'd be good to get some guidance.
regards
--
Tomas Vondra
ecksums by default, but now I realize the thread talks about "upgrade
experience" which seems fairly wide.
So, what kind of data do we expect to gather in order to evaluate this?
Who's expected to collect it and evaluate this?
regards
--
Tomas Vondra
On 4/22/25 18:26, Peter Geoghegan wrote:
> On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote:
>> here's an improved (rebased + updated) version of the patch series, with
>> some significant fixes and changes. The patch adds infrastructure and
>> modifies btree index
approaches
> to
> resolve this too).
>
Thanks for the report. I didn't have time to look at this in detail yet,
but the fix looks roughly correct. I've added this to the list of open
items for PG18.
regards
--
Tomas Vondra
bigint, perhaps?
Attached is v28, with the commit messages updated, added about
allocation of the memory, etc. I'll let the CI run the tests on it, and
then will push, unless someone has more comments.
regards
--
Tomas Vondra
From 9a222c77de2ee4a0b32d97c3d8bab2bb33f066de Mon Sep 17 00:0
> - It's currently doing the changes in pg_buffercache v1.6 but will need to
> create v1.7 for 19 (if the above stands true)
>
This seems like a good idea in principle, but at this point it has to
wait for PG19. Please add it to the July commitfest.
regards
--
Tomas Vondra
>
>>
>> Seeing no responses for a long time, I am planning to push the fix
>> till 14 tomorrow unless there are some opinions on the fix for 13. We
>> can continue to discuss the scope of the fix for 13.
>>
>
> Pushed till 14.
>
Thanks everyone who persevered and kept working on fixing this! Highly
appreciated.
regards
--
Tomas Vondra
On 4/9/25 17:51, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 17:28:31 +0200, Tomas Vondra wrote:
>> On 4/9/25 17:14, Andres Freund wrote:
>>> I'd mention that the includes of postgres.h/fmgr.h is what caused missing
>>> build-time dependencies and via tha
On 4/9/25 17:14, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 16:33:14 +0200, Tomas Vondra wrote:
>> From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001
>> From: Tomas Vondra
>> Date: Tue, 8 Apr 2025 23:31:29 +0200
>> Subject: [PATCH 1/2] Clea
On 4/9/25 01:29, Andres Freund wrote:
> Hi,
>
> On 2025-04-09 01:10:09 +0200, Tomas Vondra wrote:
>> On 4/8/25 15:06, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>>>> On Mon, 7 Apr 2025 at 23:00, To
Updated patches with proper commit messages etc.
--
Tomas Vondra
From e1f093d091610d70fba72b2848f25ff44899ea8e Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Tue, 8 Apr 2025 23:31:29 +0200
Subject: [PATCH 1/2] Cleanup of pg_numa.c
This moves/renames some of the functions defined in
On 4/9/25 14:07, Tomas Vondra wrote:
> ...
>
> OK, here are two patches, where 0001 adds the missingdeps check to the
> Debian meson build. It just adds that to the build script.
>
> 0002 leaves the NUMA stuff in src/port (i.e. it's no longer moved to
> src/backen
On 4/8/25 15:06, Andres Freund wrote:
> Hi,
>
> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote:
>>> I'll let the CI run the tests on it, and
>>> then will push, unless someone has more comments.
>
On 4/8/25 16:59, Andres Freund wrote:
> Hi,
>
> On 2025-04-08 09:35:37 -0400, Andres Freund wrote:
>> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote:
>>> On 4/8/25 15:06, Andres Freund wrote:
>>>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wro
> The attached small patch fixes the manual.
>
Thank you for noticing this and for the fix! Pushed.
This also reminded me we agreed to change page_num to bigint, which I
forgot to change before commit. So I adjusted that too, separately.
regards
--
Tomas Vondra
On 4/7/25 17:51, Andres Freund wrote:
> Hi,
>
> On 2025-04-06 13:56:54 +0200, Tomas Vondra wrote:
>> On 4/6/25 01:00, Andres Freund wrote:
>>> On 2025-04-05 18:29:22 -0400, Andres Freund wrote:
>>>> I think one thing that the docs should mention is that callin
On 4/7/25 23:50, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra wrote:
>>
>> Hi,
>>
>> I've pushed all three parts of v29, with some additional corrections
>> (picked lower OIDs, bumped catversion, fixed commit messages).
>
> H
Hi,
I've pushed all three parts of v29, with some additional corrections
(picked lower OIDs, bumped catversion, fixed commit messages).
On 4/7/25 23:01, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 9:51 PM Tomas Vondra wrote:
>
>>> So it looks like that the new way to it
On 4/7/25 20:11, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Apr 07, 2025 at 12:42:21PM -0400, Andres Freund wrote:
>> Hi,
>>
>> On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
>>
>> I was thinking of checking if the BufferDesc indicates BM_VALID or
>>
ent patches are good enough
>> for PG18, with the current behavior, and then maybe improve that in
>> PG19.
>
> I think as long as the docs mention this with or it's ok for
> now.
>
OK, I'll add a warning explaining this.
regards
--
Tomas Vondra
ssion tests can't tell us much,
considering it didn't fail once with the reverted patch :-(
I did check the coverage in:
https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html
and sure enough, dir_realloc() is not executed once. And there's a
couple more p
in os_page_status.
I intend to push 0001 and 0002 shortly, and 0003 after a bit more review
and testing, unless I hear objections.
regards
--
Tomas Vondra
From fcc4fc2ada33cbbc962d561ddeea6966f0d55492 Mon Sep 17 00:00:00 2001
From: Jakub Wartak
Date: Wed, 2 Apr 2025 12:29:22 +0200
Subject: [P
>> pages.
>>> + * It's a bit misleading to call that "aligned", no? */
>>> +
>>> + /* Get number of OS aligned pages */
>>> + shm_ent_page_count
>>> + = TYPEALIGN(os_page_size, ent->allocated_size) /
>>> os_page_size;
>>> +
>>> + /*
>>> + * If we get ever 0xff back from kernel inquiry, then we
>>> probably have
>>> + * bug in our buffers to OS page mapping code here.
>>> + */
>>> + memset(pages_status, 0xff, sizeof(int) * shm_ent_page_count);
>>
>> There's obviously no guarantee that shm_ent_page_count is a multiple of
>> os_page_size. I think it'd be interesting to show in the view when one shmem
>> allocation shares a page with the prior allocation - that can contribute a
>> bit
>> to contention. What about showing a start_os_page_id and end_os_page_id or
>> something? That could be a feature for later though.
>
> I was thinking about it, but it could be done when analyzing this
> together with data from pg_shmem_allocations(?) My worry is timing :(
> Anyway, we could extend this view in future revisions.
>
I'd leave this out for now. It's not difficult, but let's focus on the
other issues.
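For reference, computing such a page range is simple. The sketch below is
standalone and hypothetical: os_page_range() is a made-up helper, and the
offset is assumed to be relative to a page-aligned start of the shared
memory segment. Note that an allocation that does not start on a page
boundary can span one more OS page than its size alone suggests.

```c
#include <stdint.h>

/*
 * Compute the first and last OS page touched by an allocation of 'size'
 * bytes starting at byte 'offset'. page_size must be non-zero. For a
 * zero-sized allocation both results are the page containing 'offset'.
 */
void
os_page_range(uint64_t offset, uint64_t size, uint64_t page_size,
              uint64_t *first, uint64_t *last)
{
    *first = offset / page_size;
    *last = (size == 0) ? *first : (offset + size - 1) / page_size;
}
```

With 4096-byte pages, a 200-byte allocation at offset 4000 covers pages 0
and 1, so it shares both pages with its neighbors.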
>>> +SELECT NOT(pg_numa_available()) AS skip_test \gset
>>> +\if :skip_test
>>> +\quit
>>> +\endif
>>> +-- switch to superuser
>>> +\c -
>>> +SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
>>> + ok
>>> +
>>> + t
>>> +(1 row)
>>
>> Could it be worthwhile to run the test if !pg_numa_available(), to test that
>> we do the right thing in that case? We need an alternative output anyway, so
>> that might be fine?
>
> Added. the meson test passes, but I'm sending it as fast as possible
> to avoid a clash with Tomas.
>
Please keep working on this. I may have a bit of time in the evening,
but in the worst case I'll merge it into your patch.
regards
--
Tomas Vondra
he current backend, so I'd bet people
would not be happy with NULL, and would proceed to force the allocation
in some other way (say, a large query of some sort). Which obviously
causes a lot of other problems.
I can imagine having a flag that makes the allocation optional, but
there's no convenient way to pass that to a view, and I think most
people want the allocation anyway.
Especially for monitoring purposes, which usually happens in a new
connection, so the backend has little opportunity to allocate the pages
"naturally."
regards
--
Tomas Vondra
at right now, but at the very least we ought to
> document it.
>
+1 to documenting this
>
> On 2025-04-05 16:33:28 +0200, Tomas Vondra wrote:
>> The libnuma library is not available on 32-bit builds (there's no shared
>> object for i386), so we disable it in that
fields. Seems a bit weird, but we always did that - the patch
does not really change that.
I'll now mark this as committed. I haven't done anything about the alignment. My
conclusion from the discussion was we don't quite need to do that, but
if we do I think it's a matter for a separate patch - perhaps something
like the 0003.
Thanks for the patch, reviews, etc.
--
Tomas Vondra
On 3/24/25 16:25, Heikki Linnakangas wrote:
> On 24/03/2025 16:56, Tomas Vondra wrote:
>>
>>
>> On 3/23/25 17:43, Heikki Linnakangas wrote:
>>> On 21/03/2025 17:16, Andres Freund wrote:
>>>> Am I right in understanding that the only scenario (w
On 4/5/25 15:23, Tomas Vondra wrote:
> On 4/5/25 11:37, Bertrand Drouvot wrote:
>> Hi,
>>
>> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote:
>>> OK,
>>>
>>> here's v25 after going through the patches once more, fixing the issues
>
On 4/5/25 11:37, Bertrand Drouvot wrote:
> Hi,
>
> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote:
>> OK,
>>
>> here's v25 after going through the patches once more, fixing the issues
>> mentioned by Bertrand, etc.
>
> Thanks!
>
>
code gets multiple loops in
while (wpos < file->nbytes)
{
...
}
because bytestowrite will be the value from the last loop? I haven't
tried, but I guess writing wide tuples (more than 8k) might fail.
regards
--
Tomas Vondra
in the function comment, but I'm
also not quite sure I understand what "output shared memory" is ...
regards
--
Tomas Vondra
From 381c5077592e38dbcbbf6acc4f1e86a767a92957 Mon Sep 17 00:00:00 2001
From: Jakub Wartak
Date: Wed, 2 Apr 2025 12:29:22 +0200
Subject: [PATCH v25 1/5]
Yes, I agree.
regards
--
Tomas Vondra
On 4/4/25 08:50, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote:
>> On 4/3/25 15:12, Jakub Wartak wrote:
>>> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote:
>>>
>>>> ...
>>>>
>>>
On 4/4/25 09:35, Jakub Wartak wrote:
> On Fri, Apr 4, 2025 at 8:50 AM Bertrand Drouvot
> wrote:
>>
>> Hi,
>>
>> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote:
>>> On 4/3/25 15:12, Jakub Wartak wrote:
>>>>
On 4/3/25 15:12, Jakub Wartak wrote:
> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra wrote:
>
>> ...
>>
>> So unless someone can demonstrate a use case where this would matter,
>> I'd not worry about it too much.
>
> OK, fine for me - just 3 cols for p
On 4/3/25 10:23, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote:
>> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote:
>>
>> Hi Tomas,
>>
>>> OK, so you agree the commit messages are complete / correct
On 4/3/25 09:01, Jakub Wartak wrote:
> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra wrote:
>
> Hi Tomas,
>
>> OK, so you agree the commit messages are complete / correct?
>
> Yes.
>
>> OK. FWIW if you disagree with some of my proposed changes, feel free to
>
On 4/2/25 17:45, Peter Geoghegan wrote:
> On Wed, Apr 2, 2025 at 11:36 AM Tom Lane wrote:
>> Ouch! I had no idea it had gotten that big. Yeah, we ought to
>> do something about that.
>
> Tomas Vondra talked about this recently, in the context of his work on
> prefe