Re: index prefetching

2025-08-12 Thread Peter Geoghegan
its" issue seems like an issue with the instrumentation itself. Possibly one that is totally unrelated to everything else we're discussing. -- Peter Geoghegan

Re: index prefetching

2025-08-12 Thread Peter Geoghegan
#x27;s split IO handling") fixed the issue, without anyone >> realizing that the bug in question could manifest like this. > > I can't explain that. If you can consistently reproduce the change at > the two base commits, maybe bisect? Commit b4212231 was a wild guess on my part. Probably should have refrained from that. -- Peter Geoghegan

Re: index prefetching

2025-08-12 Thread Peter Geoghegan
don't need to depend on heuristic-driven OS readahead. Maybe that was wrong. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan
Thomas ("Fix rare bug in read_stream.c's split IO handling") fixed the issue, without anyone realizing that the bug in question could manifest like this. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan
guess we could re-read the internal page only when prefetching later leaf pages starts to look like a good idea, but that's another complicated code path to maintain. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan
y similarly, but somehow have very different performance). I can't really justify that, but my gut feeling is that that's the best place to focus our efforts for the time being. -- Peter Geoghegan

Re: index prefetching

2025-08-06 Thread Peter Geoghegan
t might still be interesting to > think about opportunities to do fuzzier speculative lookahead. I'll > start a new thread. That sounds interesting. I worry that we won't ever be able to get away without some fallback that behaves roughly like OS readahead. -- Peter Geoghegan

Re: index prefetching

2025-08-06 Thread Peter Geoghegan
pin-like mechanism to avoid unsafe concurrent TID recycling hazards). If, in the end, the only solution that really works for GiST is a more aggressive/invasive one than we'd prefer, then making those changes must have been inevitable all along -- even with the old amgettuple interface. That's why I'm not too worried about GiST ordered scans; we're not making that problem any harder to solve. It's even possible that it'll be a bit *easier* to fix the problem with the new batch interface, since it somewhat normalizes the idea of hanging on to buffer pins for longer. -- Peter Geoghegan

Re: index prefetching

2025-08-05 Thread Peter Geoghegan
ranch is subtly broken, but we can't in good conscience ignore those problems while making these kinds of changes. > It doesn't need to be committable, just good enough to be reasonably > certain it's possible. That's what I have in mind, too. If we have support for a second index AM, then we're much less likely to over-optimize for nbtree in a way that doesn't really make sense. > Understood, and I agree in principle. It's just that given the fuzziness > I find it hard how it should look like. I suspect that index AMs are much more similar for the purposes of prefetching than they are in other ways. -- Peter Geoghegan

Re: index prefetching

2025-08-05 Thread Peter Geoghegan
that's what's needed to make optimal choices. > 4) More testing to minimize the risk of regressions. > > 5) Figuring out how to make this work for IOS (the simple patch has some > special logic in the callback, which may not be great, not sure what's > the right solution in the complex patch). I agree that all these items are probably the biggest risks to the project. I'm not sure that I can attribute this to the use of the "complex" approach over the "simple" approach. > 6) I guess that this means "unknown unknowns", which are another significant risk. -- Peter Geoghegan

Re: Changes in inline behavior on O2 optimization level for GCC 10+

2025-07-28 Thread Peter Geoghegan
bute_always_inline" can be used to force the compiler to inline a function. I'd be very surprised if GCC 10 failed to honor the underlying "__attribute__((always_inline))" function attribute. -- Peter Geoghegan

Re: index prefetching

2025-07-24 Thread Peter Geoghegan
ded batches. In > principle you get that from v3 by filtering, but it might be slow on > large indexes. I'll try doing that in v3. Cool. -- Peter Geoghegan

Re: index prefetching

2025-07-24 Thread Peter Geoghegan
ve to the right without doing anything. Note that the rightmost page cannot be P_IGNORE(). This scheme will always succeed, no matter the nblocks argument, provided the initial leaf page is a valid leaf page (and provided the nblocks arg is >= 1). I get that this is just a prototype that might not go anywhere, but the scheme I've described requires few changes. -- Peter Geoghegan

Re: index prefetching

2025-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2025 at 9:59 PM Peter Geoghegan wrote: > Tomas' index-prefetch-simple-master branch: > │ I/O Timings: shared read=1490.918 > │ Execution Time: 2015.731 ms > Complex patch (same prewarming/eviction are omitted this time): > │ I/O Timings:

Re: index prefetching

2025-07-23 Thread Peter Geoghegan
On Wed, Jul 23, 2025 at 12:36 PM Peter Geoghegan wrote: > * The TPC-C order line table primary key. I tested this for myself. Tomas' index-prefetch-simple-master branch: set max_parallel_workers_per_gather =0; SELECT pg_buffercache_evict_relation('order_line');

Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-07-23 Thread Peter Geoghegan
y memory bloat problems in the common case where the scan isn't running in an EXPLAIN ANALYZE. -- Peter Geoghegan

Re: index prefetching

2025-07-23 Thread Peter Geoghegan
On Tue, Jul 22, 2025 at 9:31 PM Peter Geoghegan wrote: > I'll get back to you on this soon. There are plenty of indexes that > are not perfectly correlated (like pgbench_accounts_pkey is) that'll > nevertheless benefit significantly from the approach taken by the > complex p

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
of data sets from "perfectly clean" to "random" and see how the > patch(es) behave on all of them. Obviously none of your test cases are invalid -- they're all basically reasonable, when considered in isolation. But the "linear_1" test is *far* closer to the &

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
7;t know about Tomas, but I've given almost no thought to INDEX_SCAN_MAX_BATCHES specifically just yet. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
On Tue, Jul 22, 2025 at 5:11 PM Andres Freund wrote: > On 2025-07-18 23:25:38 -0400, Peter Geoghegan wrote: > > > To some degree the table AM will need to care about the index level > > > batching - > > > we have to be careful about how many pages we keep p

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
which will store about 370 when the index is in a pristine state. It does matter, but in the grand scheme of things it's unlikely to be decisive. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
302 │ (356,16) │ │ 303 │ (235,10) │ │304 │ (812,18) │ │305 │ (675,1) │ │306 │ (258,13) │ │307 │ (1187,9) │ │308 │ (185,2) │ │309 │ (179,2) │ │310 │ (951,2) │ └┴───┘ (310 rows) There's actually 55,556 heap blocks in total in the underlying table. So clearly there is some correlation here. Just not enough to ever matter very much to prefetching. Again, the sole test case that has that quality to it is the "linear" test case. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
On Tue, Jul 22, 2025 at 1:35 PM Peter Geoghegan wrote: > What is the difference between cases like "linear / eic=16 / sync" and > "linear_1 / eic=16 / sync"? I figured this out for myself. > One would imagine that these tests are very similar, based on the fact

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
On Tue, Jul 22, 2025 at 1:35 PM Peter Geoghegan wrote: > I attach pgbench_accounts_pkey_nhblks.txt, which shows a query that > (among other things) ouputs "nhblks" for each leaf page from a given > index (while showing the details of each leaf page in index key space > or

Re: index prefetching

2025-07-22 Thread Peter Geoghegan
t that they have very similar names. But we see very different results for each: with the former ("linear") test results, the "complex" patch is 2x-4x faster than the "simple" patch. But, with the latter test results ("linear_1", and other similar pairs of &qu

Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple

2025-07-21 Thread Peter Geoghegan
into a new layer that will decide when to release each index page buffer pin (and call _hash_kill_items-like routines) based on its own criteria. -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
hat the table AM has to call some indexam function to release > index-batches, whenever it doesn't need the reference anymore? And the > index-batch release can then unpin? It does. But that can be fairly generic -- btfreebatch will probably end up looking very similar to (say) hashfreebatch and gistfreebatch. Again, the indexam.c layer actually gets to decide when it happens -- that's what I meant about it being under its control (I didn't mean that it literally did everything without involving the index AM). -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
AM the freedom to do its own kind of batch access at the level of heap pages. We don't necessarily have to figure all that out in the first committed version, though. -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
"complex" approach, but as I > said, I'm sure I can't pull that off on my own. With your help, I think > the chance of success would be considerably higher. I can commit to making this project my #1 focus for Postgres 19 (#1 focus by far), provided the "complex" approach is used - just say the word. I cannot promise that we will be successful. But I can say for sure that I'll have skin in the game. If the project fails, then I'll have failed too. > Does this clarify how I think about the complex patch? Yes, it does. BTW, I don't think that there's all that much left to be said about nbtree in particular here. I don't think that there's very much work left there. -- Peter Geoghegan

Re: Returning nbtree posting list TIDs in DESC order during backwards scans

2025-07-17 Thread Peter Geoghegan
On Thu, Jul 17, 2025 at 2:26 PM Peter Geoghegan wrote: > The loop has an early check for this (for non-itemDead entries) here: > > /* Quickly skip over items never ItemDead-set by btgettuple */ > if (!kitem->itemDead) > continue; > > I really do

Re: Returning nbtree posting list TIDs in DESC order during backwards scans

2025-07-17 Thread Peter Geoghegan
ch this patch removes) can be in ascending leaf-page-wise order, descending leaf-page-wise order, or (with a scrollable cursor) some random mix of the two -- even when there's no posting list tuples involved. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
x" approach. We chatted briefly on IM, and he seems more optimistic about it than I thought (in my on-list remarks from earlier). It is definitely his patch, and I don't want to speak for him. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
hat's what heapam expects, and what amgettuple (which I'd like to replace with amgetbatch) does. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
that won't break either patch/approach. The index AM is obligated to pass back heap TIDs, without any external code needing to understand these sorts of implementation details. The on-disk representation of TIDs remains an implementation detail known only to index AMs. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
My guess is that it's due to the less efficient memory allocation with batching. Obviously this isn't acceptable, but I'm not particularly concerned about it right now. I was actually pleased to see that there wasn't a much larger regression there. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
keeps btgettuple in largely its current form. I agree that having such a GUC is important during development, and will try to add it back soon. It'll have to work in some completely different way, but that still shouldn't be difficult. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
that (say) Thomas will be able to add pause-and-resume to the read stream interface some time soon, at which point the regressions we see with the "simple" patch (but not the "complex" patch) go away? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
petergeoghegan/postgres/tree/index-prefetch-2025-pg-revisions-v0.11 I think that the version that Tomas must have used is a few days old, and might be a tiny bit different. But I don't think that that's likely to matter, especially not if you just want to get the general idea. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: > On 7/16/25 16:45, Peter Geoghegan wrote: > > I get that index characteristics could be the limiting factor, > > especially in a world where we're not yet eagerly reading leaf pages. > > But that in no way justi

Re: Saving stack space in nbtree's _bt_first function

2025-07-16 Thread Peter Geoghegan
magine how it could possibly be relevant. Thanks for the review -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: > For example, with "linear_10 / eic=16 / sync", it looks like "complex" > has about half the latency of "simple" in tests where selectivity is > 10. The advantage for "complex" is even gre

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
t;, it looks like "complex" has about half the latency of "simple" in tests where selectivity is 10. The advantage for "complex" is even greater at higher "selectivity" values. All of the other "linear" test results look about the same. Have I missed something? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
> I think it'll need to do something like that in some cases, when we need > to limit the number of leaf pages kept in memory to something sane. That's the only reason? The memory usage for batches? That doesn't seem like a big deal. It's something to keep an eye on, but I see no reason why it'd be particularly difficult. Doesn't this argue for the "complex" patch's approach? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
uot; the stream, > without resetting the distance etc. But we don't have that, and the > reset thing was suggested to me as a workaround. Does the "complex" patch require a similar workaround? Why or why not? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
t BM_VALID set, > of course). Thus we won't start another IO for that block. Even if it was worth optimizing for, it'd probably still be too far down the list of problems to be worth discussing right now. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
sue with no starting the I/O earlier. The fadvise is just > easier to trace/inspect. It's not at all surprising that you're seeing duplicate prefetch requests. I have no reason to believe that it's important to suppress those ourselves, rather than leaving it up to the OS (though I also have no reason to believe that the opposite is true). -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 9:36 AM Peter Geoghegan wrote: > Another issue with the "simple" patch: it adds 2 bool fields to > "BTScanPosItem". That increases its size considerably. We're very > sensitive to the size of this struct (I think that you know about this

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
about this already). Bloating it like this will blow up our memory usage, since right now we allocate MaxTIDsPerBTreePage/1358 such structs for so->currPos (and so->markPos). Wasting all that memory on alignment padding is probably going to have consequences beyond memory bloat. -- Peter Geoghegan

Re: patch: Use pg_assume in jsonb_util.c to fix GCC 15 warnings

2025-07-15 Thread Peter Geoghegan
seems worth considering. -- Peter Geoghegan

Re: patch: Use pg_assume in jsonb_util.c to fix GCC 15 warnings

2025-07-15 Thread Peter Geoghegan
mpiler. > I'd be a bit worried about > creating a back-patching mine-field. But maybe these are all > in spots we're unlikely to touch? That seems like much less of a problem for a purely subtractive change such as this. -- Peter Geoghegan

Re: patch: Use pg_assume in jsonb_util.c to fix GCC 15 warnings

2025-07-15 Thread Peter Geoghegan
compiler quiet" variable initializations that follow an elog/ereport with elevel >= ERROR. My guess is that we have several hundred. -- Peter Geoghegan

Re: patch: Use pg_assume in jsonb_util.c to fix GCC 15 warnings

2025-07-15 Thread Peter Geoghegan
elog(ERROR) related warnings/to suppose that we no longer have to add "keep compiler quiet" variable initializations after an elog(ERROR). If it works for MSVC + pg_assume, then it ought to also work for MSVC + pg_unreachable. Right? -- Peter Geoghegan

Re: Saving stack space in nbtree's _bt_first function

2025-07-15 Thread Peter Geoghegan
essage. v2 also slightly simplifies the logic that the notnullkey code block uses to select the key's strategy (which is also the strategy that'll be used as _bt_first's strat_total later on), since that seems like it'll make my explanation slightly clearer to anybody that reads the code

Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple

2025-07-15 Thread Peter Geoghegan
On Tue, Jul 15, 2025 at 2:19 PM Peter Geoghegan wrote: > * gistkillitems() correctly checks if the page's LSN has changed in > the period between when we initially read the leaf page and the period > when/after we accessed the heap. But (unlike nbtree), it fails to > acc

Re: index prefetching

2025-07-15 Thread Peter Geoghegan
On Sat, Jul 12, 2025 at 7:50 PM Peter Geoghegan wrote: > Why wouldn't we want to relieve all AMs of that responsibility? > Leaving it up to index AMs has resulted in subtle bugs [2][3], and > AFAICT has no redeeming quality. If affected index AMs were *forced* > to do *exactly*

Issues with hash and GiST LP_DEAD setting for kill_prior_tuple

2025-07-15 Thread Peter Geoghegan
al index AMs, and into indexam.c [1]. My hope is that centralizing this kind of logic in indexam.c will make these kinds of problems impossible. [1] https://commitfest.postgresql.org/patch/5721/ [2] https://commitfest.postgresql.org/patch/5542/ [3] https://postgr.es/m/cah2-wzmvvu2kmkyq8sueyxrc_roa5lfogde1-vdtorpk_m3...@mail.gmail.com -- Peter Geoghegan

Re: pg_overexplain extension name

2025-07-14 Thread Peter Geoghegan
x27;t find that inappropriate. > Do we want that as the name of this new extension? Personally, I think it's fine. We ought to discourage the idea that this is just "explain, but better". -- Peter Geoghegan

Re: index prefetching

2025-07-13 Thread Peter Geoghegan
ng on _bt_advance_array_keys to advance the array keys properly later on. If you're interested in why we need the remaining hard reset of the arrays within amrestrpos/btrestrpos, let me know and I'll explain.) -- Peter Geoghegan

Re: index prefetching

2025-07-12 Thread Peter Geoghegan
were *forced* to do *exactly* the same thing as each other (not just *oblidged* to do *almost* the same thing), it would make life easier for everybody. [1] https://www.postgresql.org/docs/current/index-locking.html [2] https://commitfest.postgresql.org/patch/5721/ [3] https://commitfest.postgresql.org/patch/5542/ -- Peter Geoghegan

Re: Can can I make an injection point wait occur no more than once?

2025-07-10 Thread Peter Geoghegan
on this thread. I also plan on using injection points to write a simple/serial regression test exercising the nbtree code that completes an incomplete split (following a hard crash/error). Thanks again -- Peter Geoghegan

Re: Can can I make an injection point wait occur no more than once?

2025-07-08 Thread Peter Geoghegan
_lock_and_validate_left as expected. I don't see any second WARNING indicating that we've taken _bt_lock_and_validate_left's unhappy path, though (and the test still fails). This doesn't look like an issue with the planner. I attach the relevant regression test output, that sho

Re: Can can I make an injection point wait occur no more than once?

2025-07-08 Thread Peter Geoghegan
n "meson test --suite setup --suite nbtree -q --print-errorlogs" in a loop 500 times on my Debian workstation without seeing any failures. Seems stable there. Whereas the FreeBSD target hasn't even passed once out of more than a dozen attempts. Seems to be reliably broken on FreeBSD.

Re: Can can I make an injection point wait occur no more than once?

2025-07-07 Thread Peter Geoghegan
; explain what the detach and wait functions actually do, and how and > > why one might want to call them together. > > If you see ways to improve the existing template, please feel free to > propose something, sure. I'll need to figure this out for myself first. -- Peter Geoghegan

Re: Can can I make an injection point wait occur no more than once?

2025-07-07 Thread Peter Geoghegan
7;t seem to explain what the detach and wait functions actually do, and how and why one might want to call them together. -- Peter Geoghegan

Can can I make an injection point wait occur no more than once?

2025-07-07 Thread Peter Geoghegan
all with similar code. That seems borderline unacceptable. It would be far preferable if I could just use some built-in way of waiting exactly once, that can be used directly from SQL, through the injection_points extension. That would allow me to write the isolation test without having to add an

Saving stack space in nbtree's _bt_first function

2025-07-06 Thread Peter Geoghegan
hich seems worthwhile given that the patch is so simple. I'll submit this patch to the next open commitfest for Postgres 19. -- Peter Geoghegan v1-0001-nbtree-Use-only-one-notnullkey-ScanKeyData.patch Description: Binary data

Adding support for Row Compares to nbtree startikey optimization

2025-07-05 Thread Peter Geoghegan
wer-order row compare member keys, at least. I'll submit this to the next open commitfest for Postgres 19. -- Peter Geoghegan v1-0001-Teach-nbtree-to-set-pstate.ikey-beyond-a-row-comp.patch Description: Binary data

Re: Making Row Comparison NULL row member handling more robust during skip scans

2025-07-01 Thread Peter Geoghegan
> PS. Not a new thing, but the "Add coverage..." comments in the tests > seem redundant. All tests add coverage for something. I'll adjust them along those lines. Unless there are any objections, and assuming you're done giving feedback, I'll commit both patches tomorrow. Thanks! -- Peter Geoghegan

Re: Making Row Comparison NULL row member handling more robust during skip scans

2025-07-01 Thread Peter Geoghegan
On Fri, Jun 27, 2025 at 5:35 PM Peter Geoghegan wrote: > Attached is v4, which is largely just a polished version of v3. It has > improved comments and more worked out commit messages. Plus the second > patch (the row compare patch) now teaches _bt_first to fully rely on > scan key

Re: Making Row Comparison NULL row member handling more robust during skip scans

2025-06-27 Thread Peter Geoghegan
On Wed, Jun 25, 2025 at 5:44 PM Peter Geoghegan wrote: > Correction: there *is* a live bug like this on Postgres 18/HEAD, which > involves simple scalar inequalities with an incomplete opfamily. > Attached v3 fixes the bug in its new 0001-* patch. Attached is v4, which is largely just a

Re: Inefficient nbtree behavior with row-comparison quals

2025-06-25 Thread Peter Geoghegan
exactly one best way to order and required-mark the keys output by preprocessing -- even with an incomplete opfamily. (And even with row compares that happen to be redundant or contradictory, which are similar.) -- Peter Geoghegan

Re: Making Row Comparison NULL row member handling more robust during skip scans

2025-06-25 Thread Peter Geoghegan
On Wed, Jun 18, 2025 at 8:41 PM Peter Geoghegan wrote: > In general, when we end a primitive index scan, the code that sets > continuescan=false (any such code, not just _bt_check_rowcompare NULL > row member code) has to make sure that starting the next primitive > index scan will ac

Re: Making Row Comparison NULL row member handling more robust during skip scans

2025-06-19 Thread Peter Geoghegan
On Wed, Jun 18, 2025 at 8:41 PM Peter Geoghegan wrote: > Attached patch shows how this could work. It refines the rules around > NULL row comparison members in _bt_check_rowcompare, and in _bt_first. > The fundamental idea here is to make _bt_check_rowcompare *consistent* > with _bt

Making Row Comparison NULL row member handling more robust during skip scans

2025-06-18 Thread Peter Geoghegan
heck_rowcompare behaviors from a great distance, which is pretty grotty. Seems worth fixing sooner rather than later. -- Peter Geoghegan v1-0001-Improve-_bt_first-row-compare-NULL-member-handlin.patch Description: Binary data

Re: Returning nbtree posting list TIDs in DESC order during backwards scans

2025-06-16 Thread Peter Geoghegan
On Mon, Jun 16, 2025 at 12:46 PM Peter Geoghegan wrote: > Making this change in _bt_readpage creates a problem in _bt_killitems: > it currently expects posting list TIDs in ascending order (the loop > invariant relies on that), which is why the deduplication patch didn't > ju

Returning nbtree posting list TIDs in DESC order during backwards scans

2025-06-16 Thread Peter Geoghegan
te that the qsort is completely useless during standard forwards scans. It's hard to tell those apart from scrollable cursor scans that briefly changed directions "within a page", though, so I'm inclined to just always do the qsort, rather than trying to cleverly avoid it. That'

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-06-11 Thread Peter Geoghegan
On Fri, May 2, 2025 at 3:04 PM Peter Geoghegan wrote: > The second patch is more complicated, and seems like something that > I'll need to spend more time thinking about before proceeding with > commit. It has subtle behavioral implications, in that it causes the > pstate

Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support

2025-06-11 Thread Peter Geoghegan
ight now is the complex rules around buffer pins, particularly with mark/restore, which is generally a very hard code path to test. [1] https://postgr.es/m/CAH2-Wzm3a8i31aBXi6oQt9uG7m1-xpX-MXjMMYoDJH=sbj1...@mail.gmail.com -- Peter Geoghegan v1-0001-Stop-conditioning-nbtree-actions-on-pin-being-hel.patch Description: Binary data

Re: strange perf regression with data checksums

2025-06-11 Thread Peter Geoghegan
On Tue, Jun 10, 2025 at 3:00 PM Peter Geoghegan wrote: > I have confirmed that this flavor of the problem has existed for a > long time. We'll need to backpatch the fix to all supported branches. > Current plan: Commit 0001 on all supported branches in the next couple > of da

Re: strange perf regression with data checksums

2025-06-10 Thread Peter Geoghegan
On Mon, Jun 9, 2025 at 10:47 AM Peter Geoghegan wrote: > As I said, I think that this is actually an old bug. After all, > _bt_killitems is required to test so->currPos.lsn against the same > page's current LSN if the page pin was dropped since _bt_readpage was > first called

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
On Mon, Jun 9, 2025 at 10:47 AM Peter Geoghegan wrote: > I've always thought that the whole idiom of testing > BTScanPosIsPinned() in a bunch of places was very brittle. I wonder if > it makes sense to totally replace those calls/tests with similar tests > of the new so->

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
This is also why we do things like allocate so->currTuples within btrescan. We don't yet know if the scan will be an index-only scan when btbeginscan is called. -- Peter Geoghegan

Re: strange perf regression with data checksums

2025-06-09 Thread Peter Geoghegan
've always thought that the whole idiom of testing BTScanPosIsPinned() in a bunch of places was very brittle. I wonder if it makes sense to totally replace those calls/tests with similar tests of the new so->dropPin field. -- Peter Geoghegan v1-0001-Fix-issue-with-mark-restore-nbtree-pins.patch Description: Binary data

Re: Batch TIDs lookup in ambulkdelete

2025-06-06 Thread Peter Geoghegan
On Fri, Jun 6, 2025 at 6:58 PM Masahiko Sawada wrote: > Agreed. Given the above test results, it's unlikely always sorting the > array helps speedups. Did you try specializing the sort? In my experience, it makes a big difference. -- Peter Geoghegan

Re: strange perf regression with data checksums

2025-06-06 Thread Peter Geoghegan
On Wed, Jun 4, 2025 at 1:39 PM Peter Geoghegan wrote: > My current plan is to commit this in the next couple of days. Pushed. It would be great if we could also teach BufferGetLSNAtomic() to just call PageGetLSN() (at least on PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY platforms), but my sense is t

Re: strange perf regression with data checksums

2025-06-04 Thread Peter Geoghegan
On Wed, Jun 4, 2025 at 10:21 AM Peter Geoghegan wrote: > On Wed, Jun 4, 2025 at 7:33 AM Tomas Vondra wrote: > > So better to get this in now, otherwise we may have to wait until PG19, > > because of ABI (the patch adds a field into BTScanPosData, but maybe > > it'd

Re: strange perf regression with data checksums

2025-06-04 Thread Peter Geoghegan
. There's a kind of symmetry to how things work with the patch in place, which IMV makes things simpler. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-06-03 Thread Peter Geoghegan
On Tue, Jun 3, 2025 at 1:59 PM Peter Geoghegan wrote: > We do currently end up using OldestXmin-1 as our > snapshotConflictHorizon when this happens, but as I said in the other > email, I don't think that that's really required. Actually, that's not really true, either.

Re: Correcting freeze conflict horizon calculation

2025-06-03 Thread Peter Geoghegan
our snapshotConflictHorizon when this happens, but as I said in the other email, I don't think that that's really required. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-06-02 Thread Peter Geoghegan
lly planning (originally) to write a patch to try > and change the horizon to make it older in more cases when it's > correct. I'm trying to figure out the most straightforward code to > calculate the combined snapshot conflict horizon for a prune/freeze/vm > update record. I figured that that was what this was really about. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-06-02 Thread Peter Geoghegan
the oldest safe value in all cases. How much that actually buys you seems much less clear. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-06-02 Thread Peter Geoghegan
he same place, it might make sense to do it just to make things more consistent. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-06-02 Thread Peter Geoghegan
in cases where we still give up and just use OldestXmin (cases where we just use "OldestXmin - 1", I should say). I'm not sure how much this matters in practice; I imagine that using prunestate->visibility_cutoff_xid is very much the common case with page-level freezing. -- Peter Geoghegan

Re: RelationGetNumberOfBlocks called before vacuum_get_cutoffs

2025-06-02 Thread Peter Geoghegan
On Mon, Jun 2, 2025 at 10:27 AM Melanie Plageman wrote: > Attached what I plan to push shortly. Looks good to me. -- Peter Geoghegan

Re: RelationGetNumberOfBlocks called before vacuum_get_cutoffs

2025-06-01 Thread Peter Geoghegan
o that it took place next to the initialization of other, similar BlockNumber fields from LVRelState. IIRC I wrote a comment about this issue at least in part because I understood the temptation to do that. > I'll push the fix tomorrow. Cool. -- Peter Geoghegan

RelationGetNumberOfBlocks called before vacuum_get_cutoffs

2025-06-01 Thread Peter Geoghegan
ning/unfrozen XID value from one of these unscanned heap pages. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-05-30 Thread Peter Geoghegan
;break" here, it doesn't matter what visibility_cutoff_xid has been set to. It cannot be used for any purpose. -- Peter Geoghegan

Re: Correcting freeze conflict horizon calculation

2025-05-30 Thread Peter Geoghegan
On Fri, May 30, 2025 at 5:16 PM Melanie Plageman wrote: > On Thu, May 29, 2025 at 11:37 AM Peter Geoghegan wrote: > > How can taking the "Avoids false conflicts when hot_standby_feedback > > in use" path more often result in fewer unnecessary conflicts on > > s

Re: Correcting freeze conflict horizon calculation

2025-05-29 Thread Peter Geoghegan
idTransactionId; } else { /* Avoids false conflicts when hot_standby_feedback in use */ snapshotConflictHorizon = vacrel->cutoffs.OldestXmin; TransactionIdRetreat(snapshotConflictHorizon); } How can taking the "Avoids false conflicts when hot_standby_feedback in use" path m

Re: Why our Valgrind reports suck

2025-05-23 Thread Peter Geoghegan
On Fri, May 23, 2025 at 11:42 AM Andres Freund wrote: > > I'm envisioning this patch series as v19 work, were you > > thinking we should be more aggressive? > > Mostly agreed - but I am wondering if the AV fix should be backpatched? I think that it probably should be. -- Peter Geoghegan

  1   2   3   4   5   6   7   8   9   10   >