from:"Peter Geoghegan"

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-13 Thread Peter Geoghegan

On Wed, Sep 10, 2025 at 3:41 PM Natalya Aksman wrote: > Fantastic, the patch is working, it fixes our issue! I pushed this patch just now. Thanks -- Peter Geoghegan

Re: Orphan page in _bt_split

2025-09-13 Thread Peter Geoghegan

g 2 local PGAlignedBlock variables, removing its use of PageGetTempPage. I don't think that it is necessary to consider other PageGetTempPage callers. -- Peter Geoghegan

Re: PostgreSQL 18 GA press release draft

2025-09-11 Thread Peter Geoghegan

echanism of the transformation is less important, but the > outcome is that people can benefit from the previous optimization > without having to rewrite their queries. Sounds good. Thanks -- Peter Geoghegan

Re: PostgreSQL 18 GA press release draft

2025-09-10 Thread Peter Geoghegan

l together, I suggest the following alternative: "It can also automatically transform queries with `OR` constructs in their `WHERE` clause into a logically equivalent IN() representation that can be pushed down to index scan nodes, leading to significantly faster execution". -- Peter Geoghegan
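
The rewrite described in that suggested wording can be illustrated with a toy sketch. This is not how the PostgreSQL planner implements the transformation; the function name `or_to_in` and the (column, constant) input format are hypothetical, chosen only to show why the two forms are logically equivalent.

```python
from collections import OrderedDict


def or_to_in(disjuncts):
    """Collapse OR'ed equality tests on the same column into IN lists.

    `disjuncts` is a list of (column, constant) pairs, each standing for a
    `column = constant` term; all terms are assumed to be joined by OR.
    """
    grouped = OrderedDict()
    for column, constant in disjuncts:
        values = grouped.setdefault(column, [])
        if constant not in values:  # a repeated constant adds nothing to an OR
            values.append(constant)

    parts = []
    for column, values in grouped.items():
        if len(values) == 1:
            parts.append(f"{column} = {values[0]}")
        else:
            parts.append(f"{column} IN ({', '.join(map(str, values))})")
    return " OR ".join(parts)


# a = 1 OR a = 2 OR b = 7 OR a = 3
rewritten = or_to_in([("a", 1), ("a", 2), ("b", 7), ("a", 3)])
```

Reordering the terms is safe here because OR is commutative; the single IN list per column is the form an index scan can consume as one array of search keys.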

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-10 Thread Peter Geoghegan

t us over). I also couldn't see anything like the 50% regression that Tomas reported. And I couldn't recreate any problem unless partitioning was used. -- Peter Geoghegan

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-10 Thread Peter Geoghegan

're seeing? TimescaleDB isn't following the letter of the law here. But I do still see the argument for consistently setting so->skipScan during preprocessing. That at least makes sense on general robustness grounds. -- Peter Geoghegan

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-10 Thread Peter Geoghegan

On Wed, Sep 10, 2025 at 12:45 PM Peter Geoghegan wrote: > I don't understand why it is that our not resetting the so->Skipscan > flag within btrescan has any particular significance to Timescaledb, > relative to all of the other fields that are supposed to be set by > _bt

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-09-10 Thread Peter Geoghegan

t seems at odds with the index AM API). [1] https://www.postgresql.org/docs/current/index-functions.html -- Peter Geoghegan

Re: index prefetching

2025-09-07 Thread Peter Geoghegan

> heap_hot_search_buffer()... Maybe, but I don't think that we're all that likely to get that done for 19. -- Peter Geoghegan

Re: Orphan page in _bt_split

2025-09-03 Thread Peter Geoghegan

um for this relation will be performed any more. What error? You showed an assertion failure, but that won't be hit in release builds. -- Peter Geoghegan

Re: index prefetching

2025-09-03 Thread Peter Geoghegan

> And if we want to have more complicated merging, that also seems > like something much easier to develop with some testing infra. Great. -- Peter Geoghegan

Re: index prefetching

2025-09-03 Thread Peter Geoghegan

ery is run with direct I/O, but that's far slower with or without the use of explicit prefetching, so that likely doesn't tell us much.) -- Peter Geoghegan

Re: Orphan page in _bt_split

2025-09-01 Thread Peter Geoghegan

remember that when I worked on what became commit 9b42e71376 back in 2019 (which fixed a similar problem caused by the INCLUDE index patch), Tom suggested that we do things this way defensively (without being specifically aware of the _bt_getbuf hazard). That does seem like the best approach. I'm a lit

Re: Orphan page in _bt_split

2025-09-01 Thread Peter Geoghegan

ng how many times successive inserters will try and inevitably fail to split the same page, creating a new junk page each time. -- Peter Geoghegan

Re: Orphan page in _bt_split

2025-09-01 Thread Peter Geoghegan

On Mon, Sep 1, 2025 at 3:04 PM Peter Geoghegan wrote: > There's just no reason to think that we'd ever be able to tie back one > of these LOG messages from VACUUM to the problem within _bt_split. > There's too many other forms of corruption that might result in VACUUM &

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-08-29 Thread Peter Geoghegan

otally unused options support function, which is probably fine. But since I don't really know why Alexander ever added the "options" support function in the first place (I don't even see a theoretical benefit), I'm not quite prepared to say that I know that it's okay to remove it now. -- Peter Geoghegan

Re: index prefetching

2025-08-28 Thread Peter Geoghegan

nable_indexscan_prefetch=off". So it's hard to believe that the true underlying problem is low queue depth. Though I certainly don't doubt that higher queue depths will help *when io_method=worker*. -- Peter Geoghegan

Re: index prefetching

2025-08-28 Thread Peter Geoghegan

merge conflicts to work around. I'm not sure that Thomas'/your patch to ameliorate the problem on the read stream side is essential here. Perhaps Andres can just take a look at the test case + feature branch, without the extra patches. That way he'll be able to see whatever the immediate problem is, which might be all we need. -- Peter Geoghegan

Re: index prefetching

2025-08-28 Thread Peter Geoghegan

theory of mine was correct, it would reconcile the big differences we see between "worker vs io_uring" with your patch + test case. -- Peter Geoghegan

Re: [WiP] B-tree page merge during vacuum to reduce index bloat

2025-08-27 Thread Peter Geoghegan

ed by other major RDBMSs, despite being a textbook technique that was known about and described early in the history of B-Trees. -- Peter Geoghegan

Re: index prefetching

2025-08-25 Thread Peter Geoghegan

:05:28.915 EDT postmaster[1236725] LOG: all server processes terminated; reinitializing -- Peter Geoghegan

Re: index prefetching

2025-08-25 Thread Peter Geoghegan

uch more sane, at least to me). For context, without your patch from today (but with the base index prefetching patch still applied), the same query takes 3162.195 ms. In spite of "shared read=" time being higher than any other case, and in spite of the fact that distance gets stuck at ~2/just looks wrong. (Like I said, the patch seems to actually make the problem worse on my system.) -- Peter Geoghegan

Re: index prefetching

2025-08-25 Thread Peter Geoghegan

x prefetching), it > looks like this: > So it's more a case of "mitigating a regression" (finding regressions > like this is the purpose of my script). Still, I believe the questions > about the distance heuristics are valid. > > (Another interesting detail is that the regression happens only with > io_method=worker, not with io_uring. I'm not sure why.) I find that the regression happens with io_uring. I also find that your patch doesn't fix it. I have no idea why. -- Peter Geoghegan

Re: index prefetching

2025-08-19 Thread Peter Geoghegan

On Tue, Aug 19, 2025 at 2:22 PM Peter Geoghegan wrote: > That definitely seems like a problem. I think that you're saying that > this problem happens because we have extra buffer hits earlier on, > which is enough to completely change the ramp-up behavior. This seems > to b

Re: index prefetching

2025-08-19 Thread Peter Geoghegan

downside is that that'd require adding logic/more branches to heapam_index_fetch_tuple to detect when to do this. I think that that approach is workable, if we really need it to work -- it's definitely an option. For now I would like to focus on debugging your problematic query (which doesn't sound like the kind of query that could benefit from initializing the read_stream when we're still only half-way through a batch). Does that make sense, do you think? -- Peter Geoghegan

Re: index prefetching

2025-08-17 Thread Peter Geoghegan

On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: > As far as I know, we only have the following unambiguous performance > regressions (that clearly need to be fixed): > > 1. This issue. > > 2. There's about a 3% loss of throughput on pgbench SELECT. Updat

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

an issue with a recent change of mine. -- Peter Geoghegan

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

efetching patch and your patch together, right? Not just your own patch? My shared_buffers is 16GB, with pgbench scale 300. -- Peter Geoghegan

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

e patch should be extremely cheap in > that case. I'm pretty confident. > What precisely were you testing? I'm just running my usual generic pgbench SELECT script, with my usual settings (so no direct I/O, but with iouring). -- Peter Geoghegan

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

On Thu, Aug 14, 2025 at 10:12 PM Peter Geoghegan wrote: > As far as I know, we only have the following unambiguous performance > regressions (that clearly need to be fixed): > > 1. This issue. > > 2. There's about a 3% loss of throughput on pgbench SELECT. I did a quick p

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

ooked at). Since, as you say, the backend didn't have to wait for I/O to complete either way. -- Peter Geoghegan

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

On Fri, Aug 15, 2025 at 1:09 PM Andres Freund wrote: > On 2025-08-15 12:29:25 -0400, Peter Geoghegan wrote: > > FWIW, this development probably completely changes the results of many > > (all?) of your benchmark queries. My guess is that with Andres' patch, > > things

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

On Fri, Aug 15, 2025 at 12:24 PM Peter Geoghegan wrote: > Good news here: with Andres' bufmgr patch applied, the similar forwards scan > query does indeed get more than 2x faster. And I don't mean that it gets > faster on the randomized table -- it actually gets 2x faster w

Re: index prefetching

2025-08-15 Thread Peter Geoghegan

│ Planning Time: 0.767 ms │ │ Execution Time: 279.643 ms │ └─┘ (10 rows) I _think_ that Andres' patch also fixes the EXPLAIN ANALYZE accounting, so that "I/O Timings" is actually correct. That's why EXPLAIN ANALYZE with the bufmgr patch has much higher "shared read" time, despite overall execution time being cut in half. -- Peter Geoghegan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

ting that process iteratively. It's quite likely that there are more performance issues/bugs that we don't yet know about. IMV it doesn't make sense to closely track individual queries that have only been moderately regressed. -- Peter Geoghegan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

'll see improved performance for many different types of queries. Not as big of a benefit as the one that the broken query will get, but still enough to matter. -- Peter Geoghegan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > If this same mechanism remembered (say) the last 2 heap blocks it > requested, that might be enough to totally fix this particular > problem. This isn't a serious proposal, but it'll be simple enough to > implement. Ho

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

lock being > read in by a read stream multiple times in close proximity sufficiently often > to make that worth it. We definitely need to be prepared for duplicate prefetch requests in the context of index scans. I'm far from sure how sophisticated that actually needs to be. Obviously the design choices in this area are far from settled right now. [1] dc1g2pkuo9ci.3mk1l3ybz2...@bowt.ie -- Peter Geoghegan
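
The idea floated earlier in this thread, remembering the last couple of heap blocks requested, can be sketched as a small recency filter. This is only an illustration under assumed names (`suppress_recent_duplicates`, `remember`); it is not the read stream code, and the window size of 2 is just the figure mentioned in the thread.

```python
from collections import deque


def suppress_recent_duplicates(block_numbers, remember=2):
    """Return the block requests that would actually be issued when any
    block seen among the last `remember` issued requests is skipped."""
    recent = deque(maxlen=remember)  # oldest entry falls out automatically
    issued = []
    for block in block_numbers:
        if block in recent:
            continue  # duplicate of a very recent request: do not re-issue
        issued.append(block)
        recent.append(block)
    return issued


# Heap blocks in the order an index scan might ask for them.
requests = [10, 10, 11, 10, 11, 12, 10]
issued = suppress_recent_duplicates(requests)
```

With a window of 2, the back-and-forth between blocks 10 and 11 is suppressed, but the final request for block 10 is issued again because it has already dropped out of the window. How sophisticated the real mechanism needs to be is exactly the open question in the message above.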

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

ait dareq-sz f/s f_await aqu-sz %util nvme0n1 50401.00393.76 0.00 0.000.20 8.000.00 0.00 0.00 0.000.00 0.000.00 0.00 0.00 0.000.00 0.000.000.00 10.06 41.60 -- Peter Geoghegan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

On Thu, Aug 14, 2025 at 3:15 PM Peter Geoghegan wrote: > Then why does the exact same pair of runs show "I/O Timings: shared > read=194.629" for the sequential table backwards scan (with total > execution time 1132.360 ms), versus "I/O Timings: shared read=352.88"

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

On Thu Aug 14, 2025 at 3:15 PM EDT, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 2:53 PM Andres Freund wrote: >> I think this is just an indicator of being IO bound. > > Then why does the exact same pair of runs show "I/O Timings: shared > read=194.629" for the seq

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

smaller size. I reduced max_sectors_kb from 128 to 8. That had no significant effect. > Could you show iostat for both cases? iostat has lots of options. Can you be more specific? -- Peter Geoghegan

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote: > The only interesting thing about the flame graph is just how little > difference there seems to be (at least for this particular perf event > type). I captured method_io_uring.c DEBUG output from running each query in the serve

Re: index prefetching

2025-08-14 Thread Peter Geoghegan

On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote: > On 8/14/25 01:50, Peter Geoghegan wrote: >> I first made the order of the table random, except among groups of index >> tuples >> that have exactly the same value. Those will still point to the same 1 or 2 >>

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

On Wed Aug 13, 2025 at 7:50 PM EDT, Peter Geoghegan wrote: > pg@regression:5432 [2476413]=# EXPLAIN (ANALYZE ,costs off, timing off) > SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

ersion of the queries that I described to the list this evening -- and yet those are still very fast in terms of overall execution time (somehow, they are about as fast as the original variant, that will manage to combine I/Os, in spite of the obvious disadvantage of requiring random I/O for the heap accesses). -- Peter Geoghegan

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

On Wed, Aug 13, 2025 at 7:51 PM Peter Geoghegan wrote: > Apparently random I/O is twice as fast as sequential I/O in descending order! > In > fact, this test case creates the appearance of random I/O being at least > slightly faster than sequential I/O for pages read in _asce

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

2 │ │ (260994,13) │ 20,002 │ │ (260994,14) │ 20,002 │ │ (260994,15) │ 20,002 │ │ (260994,16) │ 20,002 │ │ (260994,17) │ 20,002 │ │ (260994,18) │ 20,002 │ │ (260994,19) │ 20,002 │ │ (260994,20) │ 20,002 │ │ (260994,21) │ 20,002 │ │ (260995,1) │ 20,002 │ └─┴┘ (96 rows) -- Peter Geoghegan

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

uch faster, no matter the scan order. Should I expect this step to make the effect with duplicates being produced by read_stream_look_ahead to just go away, regardless of the scan direction in use? -- Peter Geoghegan

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

ie-breaker heap column (it *remains* in ASC heap TID order in "t2"). In general, when doing this sort of analysis, I find it useful to manually verify that the data that I generated matches my expectations. Usually a quick check with pageinspect is enough. I'll just randomly select 2 - 3 leaf pages, and make sure that they all more or less match my expectations. -- Peter Geoghegan

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

ore). As I touched on already, this effect can be seen even with perfectly correlated inserts. The effect is caused by the FSM having a tiny bit of space left on one heap page -- not enough space to fit an incoming heap tuple, but still enough to fit a slightly smaller heap tuple that is inserted shortly thereafter. You end up with exactly one index tuple whose heap TID is slightly out-of-order, though only every once in a long while. -- Peter Geoghegan
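
A toy model makes the effect easier to see. The sketch below assumes a fixed hypothetical page capacity and a simple "first page with enough free space" placement rule; the real heap and FSM logic is considerably more involved, so treat this purely as an illustration.

```python
PAGE_CAPACITY = 100  # hypothetical usable bytes per heap page


def insert_all(tuple_sizes):
    """Place tuples in arrival order and return their (page, offset) TIDs."""
    free = []    # free bytes remaining on each page
    counts = []  # number of tuples already on each page
    tids = []
    for size in tuple_sizes:
        for page, space in enumerate(free):
            if space >= size:
                break  # reuse the first page that still has room
        else:
            # No existing page fits: extend the relation by one page.
            free.append(PAGE_CAPACITY)
            counts.append(0)
            page = len(free) - 1
        free[page] -= size
        counts[page] += 1
        tids.append((page, counts[page]))
    return tids


# Tuples arrive in index key order; the sixth one is slightly smaller.
sizes = [30, 30, 30, 30, 30, 10, 30]
tids = insert_all(sizes)
out_of_order = sum(1 for prev, cur in zip(tids, tids[1:]) if cur < prev)
```

Page 0 is left with 10 free bytes after three tuples, which is too little for the next 30-byte tuple but enough for the later 10-byte one. That single small tuple lands back on page 0, giving exactly one heap TID that is out of order relative to insertion order.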

Re: index prefetching

2025-08-13 Thread Peter Geoghegan

position is undefined, just use the read position". That's just a guess, though. This issue is tricky to debug. I'm not yet used to debugging problems such as these (though I'll probably become an expert on it in the months ahead). -- Peter Geoghegan

Re: index prefetching

2025-08-12 Thread Peter Geoghegan

its" issue seems like an issue with the instrumentation itself. Possibly one that is totally unrelated to everything else we're discussing. -- Peter Geoghegan

Re: index prefetching

2025-08-12 Thread Peter Geoghegan

's split IO handling") fixed the issue, without anyone >> realizing that the bug in question could manifest like this. > > I can't explain that. If you can consistently reproduce the change at > the two base commits, maybe bisect? Commit b4212231 was a wild guess on my part. Probably should have refrained from that. -- Peter Geoghegan

Re: index prefetching

2025-08-12 Thread Peter Geoghegan

don't need to depend on heuristic-driven OS readahead. Maybe that was wrong. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan

Thomas ("Fix rare bug in read_stream.c's split IO handling") fixed the issue, without anyone realizing that the bug in question could manifest like this. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan

guess we could re-read the internal page only when prefetching later leaf pages starts to look like a good idea, but that's another complicated code path to maintain. -- Peter Geoghegan

Re: index prefetching

2025-08-11 Thread Peter Geoghegan

y similarly, but somehow have very different performance). I can't really justify that, but my gut feeling is that that's the best place to focus our efforts for the time being. -- Peter Geoghegan

Re: index prefetching

2025-08-06 Thread Peter Geoghegan

t might still be interesting to > think about opportunities to do fuzzier speculative lookahead. I'll > start a new thread. That sounds interesting. I worry that we won't ever be able to get away without some fallback that behaves roughly like OS readahead. -- Peter Geoghegan

Re: index prefetching

2025-08-06 Thread Peter Geoghegan

pin-like mechanism to avoid unsafe concurrent TID recycling hazards). If, in the end, the only solution that really works for GiST is a more aggressive/invasive one than we'd prefer, then making those changes must have been inevitable all along -- even with the old amgettuple interface. That's why I'm not too worried about GiST ordered scans; we're not making that problem any harder to solve. It's even possible that it'll be a bit *easier* to fix the problem with the new batch interface, since it somewhat normalizes the idea of hanging on to buffer pins for longer. -- Peter Geoghegan

Re: index prefetching

2025-08-05 Thread Peter Geoghegan

ranch is subtly broken, but we can't in good conscience ignore those problems while making these kinds of changes. > It doesn't need to be committable, just good enough to be reasonably > certain it's possible. That's what I have in mind, too. If we have support for a second index AM, then we're much less likely to over-optimize for nbtree in a way that doesn't really make sense. > Understood, and I agree in principle. It's just that given the fuzziness > I find it hard how it should look like. I suspect that index AMs are much more similar for the purposes of prefetching than they are in other ways. -- Peter Geoghegan

Re: index prefetching

2025-08-05 Thread Peter Geoghegan

that's what's needed to make optimal choices. > 4) More testing to minimize the risk of regressions. > > 5) Figuring out how to make this work for IOS (the simple patch has some > special logic in the callback, which may not be great, not sure what's > the right solution in the complex patch). I agree that all these items are probably the biggest risks to the project. I'm not sure that I can attribute this to the use of the "complex" approach over the "simple" approach. > 6) I guess that this means "unknown unknowns", which are another significant risk. -- Peter Geoghegan

Re: Changes in inline behavior on O2 optimization level for GCC 10+

2025-07-28 Thread Peter Geoghegan

bute_always_inline" can be used to force the compiler to inline a function. I'd be very surprised if GCC 10 failed to honor the underlying "__attribute__((always_inline))" function attribute. -- Peter Geoghegan

Re: index prefetching

2025-07-24 Thread Peter Geoghegan

ded batches. In > principle you get that from v3 by filtering, but it might be slow on > large indexes. I'll try doing that in v3. Cool. -- Peter Geoghegan

Re: index prefetching

2025-07-24 Thread Peter Geoghegan

ve to the right without doing anything. Note that the rightmost page cannot be P_IGNORE(). This scheme will always succeed, no matter the nblocks argument, provided the initial leaf page is a valid leaf page (and provided the nblocks arg is >= 1). I get that this is just a prototype that might not go anywhere, but the scheme I've described requires few changes. -- Peter Geoghegan

Re: index prefetching

2025-07-23 Thread Peter Geoghegan

On Wed, Jul 23, 2025 at 9:59 PM Peter Geoghegan wrote: > Tomas' index-prefetch-simple-master branch: > │ I/O Timings: shared read=1490.918 > │ Execution Time: 2015.731 ms > Complex patch (same prewarming/eviction are omitted this time): > │ I/O Timings:

Re: index prefetching

2025-07-23 Thread Peter Geoghegan

On Wed, Jul 23, 2025 at 12:36 PM Peter Geoghegan wrote: > * The TPC-C order line table primary key. I tested this for myself. Tomas' index-prefetch-simple-master branch: set max_parallel_workers_per_gather =0; SELECT pg_buffercache_evict_relation('order_line');

Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-07-23 Thread Peter Geoghegan

y memory bloat problems in the common case where the scan isn't running in an EXPLAIN ANALYZE. -- Peter Geoghegan

Re: index prefetching

2025-07-23 Thread Peter Geoghegan

On Tue, Jul 22, 2025 at 9:31 PM Peter Geoghegan wrote: > I'll get back to you on this soon. There are plenty of indexes that > are not perfectly correlated (like pgbench_accounts_pkey is) that'll > nevertheless benefit significantly from the approach taken by the > complex p

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

of data sets from "perfectly clean" to "random" and see how the > patch(es) behave on all of them. Obviously none of your test cases are invalid -- they're all basically reasonable, when considered in isolation. But the "linear_1" test is *far* closer to the &

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

't know about Tomas, but I've given almost no thought to INDEX_SCAN_MAX_BATCHES specifically just yet. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

On Tue, Jul 22, 2025 at 5:11 PM Andres Freund wrote: > On 2025-07-18 23:25:38 -0400, Peter Geoghegan wrote: > > > To some degree the table AM will need to care about the index level > > > batching - > > > we have to be careful about how many pages we keep p

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

which will store about 370 when the index is in a pristine state. It does matter, but in the grand scheme of things it's unlikely to be decisive. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

302 │ (356,16) │ │ 303 │ (235,10) │ │304 │ (812,18) │ │305 │ (675,1) │ │306 │ (258,13) │ │307 │ (1187,9) │ │308 │ (185,2) │ │309 │ (179,2) │ │310 │ (951,2) │ └┴───┘ (310 rows) There's actually 55,556 heap blocks in total in the underlying table. So clearly there is some correlation here. Just not enough to ever matter very much to prefetching. Again, the sole test case that has that quality to it is the "linear" test case. -- Peter Geoghegan

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

On Tue, Jul 22, 2025 at 1:35 PM Peter Geoghegan wrote: > What is the difference between cases like "linear / eic=16 / sync" and > "linear_1 / eic=16 / sync"? I figured this out for myself. > One would imagine that these tests are very similar, based on the fact

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

On Tue, Jul 22, 2025 at 1:35 PM Peter Geoghegan wrote: > I attach pgbench_accounts_pkey_nhblks.txt, which shows a query that > (among other things) outputs "nhblks" for each leaf page from a given > index (while showing the details of each leaf page in index key space > or

Re: index prefetching

2025-07-22 Thread Peter Geoghegan

t that they have very similar names. But we see very different results for each: with the former ("linear") test results, the "complex" patch is 2x-4x faster than the "simple" patch. But, with the latter test results ("linear_1", and other similar pairs of "

Re: Issues with hash and GiST LP_DEAD setting for kill_prior_tuple

2025-07-21 Thread Peter Geoghegan

into a new layer that will decide when to release each index page buffer pin (and call _hash_kill_items-like routines) based on its own criteria. -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan

hat the table AM has to call some indexam function to release > index-batches, whenever it doesn't need the reference anymore? And the > index-batch release can then unpin? It does. But that can be fairly generic -- btfreebatch will probably end up looking very similar to (say) hashfreebatch and gistfreebatch. Again, the indexam.c layer actually gets to decide when it happens -- that's what I meant about it being under its control (I didn't mean that it literally did everything without involving the index AM). -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan

AM the freedom to do its own kind of batch access at the level of heap pages. We don't necessarily have to figure all that out in the first committed version, though. -- Peter Geoghegan

Re: index prefetching

2025-07-18 Thread Peter Geoghegan

"complex" approach, but as I > said, I'm sure I can't pull that off on my own. With your help, I think > the chance of success would be considerably higher. I can commit to making this project my #1 focus for Postgres 19 (#1 focus by far), provided the "complex" approach is used - just say the word. I cannot promise that we will be successful. But I can say for sure that I'll have skin in the game. If the project fails, then I'll have failed too. > Does this clarify how I think about the complex patch? Yes, it does. BTW, I don't think that there's all that much left to be said about nbtree in particular here. I don't think that there's very much work left there. -- Peter Geoghegan

Re: Returning nbtree posting list TIDs in DESC order during backwards scans

2025-07-17 Thread Peter Geoghegan

On Thu, Jul 17, 2025 at 2:26 PM Peter Geoghegan wrote: > The loop has an early check for this (for non-itemDead entries) here: > > /* Quickly skip over items never ItemDead-set by btgettuple */ > if (!kitem->itemDead) > continue; > > I really do

Re: Returning nbtree posting list TIDs in DESC order during backwards scans

2025-07-17 Thread Peter Geoghegan

ch this patch removes) can be in ascending leaf-page-wise order, descending leaf-page-wise order, or (with a scrollable cursor) some random mix of the two -- even when there's no posting list tuples involved. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

x" approach. We chatted briefly on IM, and he seems more optimistic about it than I thought (in my on-list remarks from earlier). It is definitely his patch, and I don't want to speak for him. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

hat's what heapam expects, and what amgettuple (which I'd like to replace with amgetbatch) does. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

that won't break either patch/approach. The index AM is obligated to pass back heap TIDs, without any external code needing to understand these sorts of implementation details. The on-disk representation of TIDs remains an implementation detail known only to index AMs. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

My guess is that it's due to the less efficient memory allocation with batching. Obviously this isn't acceptable, but I'm not particularly concerned about it right now. I was actually pleased to see that there wasn't a much larger regression there. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

keeps btgettuple in largely its current form. I agree that having such a GUC is important during development, and will try to add it back soon. It'll have to work in some completely different way, but that still shouldn't be difficult. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

that (say) Thomas will be able to add pause-and-resume to the read stream interface some time soon, at which point the regressions we see with the "simple" patch (but not the "complex" patch) go away? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

petergeoghegan/postgres/tree/index-prefetch-2025-pg-revisions-v0.11 I think that the version that Tomas must have used is a few days old, and might be a tiny bit different. But I don't think that that's likely to matter, especially not if you just want to get the general idea. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: > On 7/16/25 16:45, Peter Geoghegan wrote: > > I get that index characteristics could be the limiting factor, > > especially in a world where we're not yet eagerly reading leaf pages. > > But that in no way justi

Re: Saving stack space in nbtree's _bt_first function

2025-07-16 Thread Peter Geoghegan

magine how it could possibly be relevant. Thanks for the review -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: > For example, with "linear_10 / eic=16 / sync", it looks like "complex" > has about half the latency of "simple" in tests where selectivity is > 10. The advantage for "complex" is even gre

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

", it looks like "complex" has about half the latency of "simple" in tests where selectivity is 10. The advantage for "complex" is even greater at higher "selectivity" values. All of the other "linear" test results look about the same. Have I missed something? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

> I think it'll need to do something like that in some cases, when we need > to limit the number of leaf pages kept in memory to something sane. That's the only reason? The memory usage for batches? That doesn't seem like a big deal. It's something to keep an eye on, but I see no reason why it'd be particularly difficult. Doesn't this argue for the "complex" patch's approach? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

" the stream, > without resetting the distance etc. But we don't have that, and the > reset thing was suggested to me as a workaround. Does the "complex" patch require a similar workaround? Why or why not? -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

t BM_VALID set, > of course). Thus we won't start another IO for that block. Even if it was worth optimizing for, it'd probably still be too far down the list of problems to be worth discussing right now. -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

sue with no starting the I/O earlier. The fadvise is just > easier to trace/inspect. It's not at all surprising that you're seeing duplicate prefetch requests. I have no reason to believe that it's important to suppress those ourselves, rather than leaving it up to the OS (though I also have no reason to believe that the opposite is true). -- Peter Geoghegan

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

On Wed, Jul 16, 2025 at 9:36 AM Peter Geoghegan wrote: > Another issue with the "simple" patch: it adds 2 bool fields to > "BTScanPosItem". That increases its size considerably. We're very > sensitive to the size of this struct (I think that you know about this

Re: index prefetching

2025-07-16 Thread Peter Geoghegan

about this already). Bloating it like this will blow up our memory usage, since right now we allocate MaxTIDsPerBTreePage/1358 such structs for so->currPos (and so->markPos). Wasting all that memory on alignment padding is probably going to have consequences beyond memory bloat. -- Peter Geoghegan
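
The cost of widening a small per-item struct can be estimated with `ctypes`. The field layout below is an assumption (five 2-byte fields standing in for a heap TID plus two offsets), not a copy of the real `BTScanPosItem` definition or of the patch under discussion; only the 1358 figure comes from the message above.

```python
import ctypes

MAX_TIDS_PER_BTREE_PAGE = 1358  # figure quoted in the message above

# Assumed base layout: five 2-byte fields, 10 bytes in total.
BASE_FIELDS = [
    ("heap_block_hi", ctypes.c_uint16),
    ("heap_block_lo", ctypes.c_uint16),
    ("heap_offset", ctypes.c_uint16),
    ("index_offset", ctypes.c_uint16),
    ("tuple_offset", ctypes.c_uint16),
]


class Item(ctypes.Structure):
    _fields_ = list(BASE_FIELDS)


class ItemOneFlag(ctypes.Structure):
    # One extra bool: 11 bytes of data, rounded up to the 2-byte alignment.
    _fields_ = BASE_FIELDS + [("flag_a", ctypes.c_bool)]


class ItemTwoFlags(ctypes.Structure):
    _fields_ = BASE_FIELDS + [
        ("flag_a", ctypes.c_bool),
        ("flag_b", ctypes.c_bool),
    ]


def array_bytes(struct_type):
    """Bytes needed for one array sized for a full leaf page of items."""
    return ctypes.sizeof(struct_type) * MAX_TIDS_PER_BTREE_PAGE


sizes = {t.__name__: ctypes.sizeof(t) for t in (Item, ItemOneFlag, ItemTwoFlags)}
growth = array_bytes(ItemTwoFlags) - array_bytes(Item)
```

Under this assumed layout a single bool already costs as much as two, because the struct is padded back up to its alignment, and each array of 1358 items grows by 2716 bytes. The scan keeps one such array for so->currPos and another for so->markPos, so the increase applies twice per scan.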
