On Mon, Apr 22, 2024 at 11:13 AM Peter Geoghegan wrote:
> I'm pretty sure that I could fix this by simply removing the
> assertion. But I need to think about it a bit more before I push a
> fix.
>
> The test case you've provided proves that _bt_preprocess_keys's
> new no-op pa
keys call
per btrescan (independent of the presence of array keys) still seems
sound.
--
Peter Geoghegan
is wrong. It is testing behavior that's much older than
commit 5bf748b86b, though. We can just get rid of it, since all of the
information that we'll actually apply when preprocessing scan keys
comes from the operator class.
Pushed a fix removing the assertion just now. Thanks for the report.
--
Peter Geoghegan
The problem is that _bt_fix_scankey_strategy shouldn't have been doing
anything with already-eliminated array scan keys in the first place
(whether or not they're on a DESC index column). I just pushed a fix
along those lines.
Thanks for the report!
--
Peter Geoghegan
n failure within _bt_first -- the
_bt_verify_arrays_bt_first assertion would catch the violation of the
invariant (the_bt_advance_array_keys postcondition invariant/_bt_first
precondition invariant). So we kinda do have some test coverage for
this function already.
--
Peter Geoghegan
On Sun, Apr 14, 2024 at 9:01 PM David Rowley wrote:
> On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan wrote:
> > Technically we don't promise that WAL records won't change in minor
> > versions. In fact, the docs specifically state that the format of any
> > WAL record might c
l that uses its queryid values to accumulate query costs.
While external tools can't understand the provenance of old queryid
values, pg_stat_statements can't either.
--
Peter Geoghegan
version
upgrade), then pg_stat_statements throws away the accumulated query
stats without being asked to.
--
Peter Geoghegan
e
consideration here. Say a week or two, to work through some of the
more complicated issues -- and to take a breather. I just don't see
any upside to rushing through this process, given where we are now.
--
Peter Geoghegan
urse it also
wouldn't be a bad idea to have a BF animal for that, especially
because we already have BF animals that test things far more niche
than this.
--
Peter Geoghegan
that the planner should always prefer index quals over expression
evaluation, on general principle, even when there's no reason to think
it'll work out. At worst the executor has essentially the same
physical access patterns as the expression evaluation case. On the
other hand, providing nbtree with that context might end up being a
great deal faster.
--
Peter Geoghegan
ortance of these sorts of cases. Most of these
cases will only have 2 or 3 constants, just because that's what's most
common in general.
--
Peter Geoghegan
On Sun, Apr 7, 2024 at 9:57 PM Peter Geoghegan wrote:
> On Sun, Apr 7, 2024 at 9:50 PM Tom Lane wrote:
> > those first two Asserts are redundant with the "if" as well.
>
> I'll get rid of those other two assertions as well, then.
Done that way.
--
Peter Geoghegan
>scan_key);
> Assert(!(cur->sk_flags & SK_SEARCHARRAY));
> }
>
> those first two Asserts are redundant with the "if" as well.
I'll get rid of those other two assertions as well, then.
--
Peter Geoghegan
ss_array_keys
* failed to eliminate redundant arrays through array merging.
* _bt_compare_scankey_args just returns false when it sees
* this; it won't even try to examine either array.
*/
Do you think it needs more work?
--
Peter Geoghegan
icious. It's possible that both of those scan keys actually did
have arrays, but _bt_compare_scankey_args just treats that as a case
of being unable to prove which scan key was redundant/contradictory
due to a lack of suitable cross-type support -- so the assertion won't
be reached.
Would Coverity stop complaining if I just removed the assertion? I
could just do that, I suppose, but that seems backwards to me.
--
Peter Geoghegan
rays left at this point, so "so->numArrayKeys == 0".
FWIW I missed this because the tests only cover cases with one SOAP
inequality, which will always return early from _bt_preprocess_keys
(by taking its generic single scan key fast path). Your test case has
2 scan keys, avoiding the fast path, allowing us to reach
_bt_preprocess_array_keys_final().
--
Peter Geoghegan
On Wed, Apr 3, 2024 at 1:04 PM Melanie Plageman
wrote:
> Thanks! And thanks for updating the commitfest entry!
Nice work!
--
Peter Geoghegan
s, but even there it provides
structure that makes things much easier to describe unambiguously.)
--
Peter Geoghegan
ath -- "disobeying" pagefrz.freeze_required creates the
risk that relfrozenxid/relminmxid will be advanced to unsafe values at
the end of the VACUUM. IMV you should stick with that approach now,
even if it is currently safe to do it the other way around.
--
Peter Geoghegan
optimization (at least an implementation along the lines of your
pseudo code). If that was what you intended, then it's not obvious to
me why it is relevant. What, if anything, does it have to do with
making the new checkunique visibility checks happen lazily?
--
Peter Geoghegan
e amount
of pruning performed opportunistically is sometimes much higher than
generally assumed, so having a way of measuring that seems like it
might lead to valuable insights.
--
Peter Geoghegan
pensive other array key infrastructure, and
> only if we're outside the minimum or maximum bounds of the
> non-required scankeys should we trigger advance_array_keys (unless
> scan direction changed).
I've thought about optimizing non-required arrays along those lines.
But it doesn't really seem to be necessary at all.
If we were going to do it, then it ought to be done in a way that
works independent of the trigger condition for array advancement (that
is, it'd work for non-required arrays that have a required scan key
trigger, too).
--
Peter Geoghegan
ent the last proposals could be a real game
> changer for 2 of our biggest databases. We hope that Postgres 17 will contain
> those improvements.
Current plan is to commit this patch in the next couple of weeks,
ahead of Postgres 17 feature freeze.
--
Peter Geoghegan
t failed" errors -- but that's just a symptom. It
just so happens that the hardening added to places like
_bt_swap_posting() and _bt_binsrch_insert() is much more likely to
visibly break than anything else, at least in practice.
--
Peter Geoghegan
flag to indicate whether or not a cleanup lock is needed.
Thanks for confirming.
I realize that this is fairly obvious, but thought it better to ask.
--
Peter Geoghegan
ACUUM record type? Will you preserve
that aspect of the existing design?
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 2:14 PM Peter Geoghegan wrote:
> What benchmarking have you done here?
I think that the memcmp() test is subtly wrong:
> + if (PointerIsValid(rightsep))
> + {
> + /*
> +* Note: we're not in the rightmost page (see bran
rch() (obviously not doable for
non-_bt_first callers, which need to call _bt_binsrch_insert instead).
This whole approach will have been made easier by the refactoring I
did late last year, in commit c9c0589fda.
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 11:00 AM Peter Geoghegan wrote:
> Seems like it might be possible to simplify/consolidate the VM-setting
> code that's now located at the end of lazy_scan_prune. Perhaps the two
> distinct blocks that call visibilitymap_set() could be combined into
> one.
ated at the end of lazy_scan_prune. Perhaps the two
distinct blocks that call visibilitymap_set() could be combined into
one.
--
Peter Geoghegan
t have the
VM's all-frozen bit set for a given block (not the all-visible bit
set) -- which is always wrong. There was good reason to be concerned
about that possibility when 980ae17310 went in.
--
Peter Geoghegan
eprocess_array_keys, but it's kinda the other way around now.
We could probably even get rid of this remaining limited form of
arrayKeyData, but that doesn't seem like it would add much.
--
Peter Geoghegan
On Wed, Mar 6, 2024 at 4:46 PM Matthias van de Meent
wrote:
> On Wed, 6 Mar 2024 at 01:50, Peter Geoghegan wrote:
> > I think that there is supposed to be a closing parenthesis here? So
> > "... (such as those described in ") might
> > perform...".
>
&
lly want to do here is to balance costs and benefits.
That's just what's required. The fact that those costs and benefits
span multiple levels of abstractions makes it a bit awkward, but
doesn't (and can't) change the basic shape of the problem.
--
Peter Geoghegan
more or
less worked as a reference of agreed upon best practices. Can we do
that part first, rather than starting out with a blanket assumption
that everything that happened before now must have been perfect?
--
Peter Geoghegan
ng makes are
false positives. At least in some deeper sense.
--
Peter Geoghegan
ifdef USE_ASSERT_CHECKING
case). It's also just really hard to understand what's going on here.
If I was going to do this kind of thing, I'd use two completely
separate loops, that were obviously completely separate (maybe even
two functions). I'd then memcmp() each array at the end.
--
Peter Geoghegan
move on to the next leaf page
It's possible that we'll need a variety of different strategies.
nbtree already has two such strategies in _bt_killitems(), in a way.
Though its "Modified while not pinned means hinting is not safe" path
(LSN doesn't match canary value path) seems pretty naive. The
prefetching stuff might present us with a good opportunity to replace
that with something fundamentally better.
--
Peter Geoghegan
ork
> for all indexes automatically - but given the current discussion about
> kill_prior_tuple, locking etc. I'm not sure that's really feasible.
>
> The index AM clearly needs to have more control over this.
Cool. I think that that makes the layering question a lot clearer, then.
--
Peter Geoghegan
On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra
wrote:
> On 2/15/24 00:06, Peter Geoghegan wrote:
> > I suppose that it might be much more important than I imagine it is
> > right now, but it'd be nice to have something a bit more concrete to
> > go on.
> >
>
> This
On Wed, Feb 14, 2024 at 7:28 PM Andres Freund wrote:
> On 2024-02-13 14:54:14 -0500, Peter Geoghegan wrote:
> > This property of index scans is fundamental to how index scans work.
> > Pinning an index page as an interlock against concurrently TID
> > recycling by VACUUM
led analysis. Plus you have problems with things like unlogged
indexes not having an LSN to use as a canary condition, which makes it
a bit messy (it's already kind of weird that we treat unlogged indexes
differently here IMV).
--
Peter Geoghegan
different layers of abstraction, because that's just fundamentally
what's required by the constraints we're operating under. Can't really
prove it, though.
--
Peter Geoghegan
LSNs (like using or not using an
unlogged index) could cause bugs in this area to "accidentally fail to
fail". Since the nbtree index AM has its own optimizations here, which
probably has a tendency to mask problems/bugs. (I sometimes use
unlogged indexes for some of my nbtree related test cases, just to
reduce certain kinds of variability, including variability in this
area.)
[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/README;h=52e646c7f759a5d9cfdc32b86f6aff8460891e12;hb=3e8235ba4f9cc3375b061fb5d3f3575434539b5f#l443
--
Peter Geoghegan
otally
unsurprising that breaking kill_prior_tuple in some way could be
missed. Andres wrote the MVCC test in question precisely because
certain aspects of kill_prior_tuple were broken for months without
anybody noticing.
[1] https://www.postgresql.org/docs/devel/index-locking.html
--
Peter Geoghegan
On Fri, Jan 26, 2024 at 11:44 AM Robert Haas wrote:
> Thanks to you for the patches, and to Peter for participating in the
> discussion which, IMHO, was very helpful in clarifying things.
Glad I could help.
--
Peter Geoghegan
el(). So from that point on we should record the
free space in every scanned heap page in the "first heap pass" --
including pages that have LP_DEAD stubs that aren't going to be made
LP_UNUSED in the ongoing VACUUM.
--
Peter Geoghegan
On Mon, Jan 22, 2024 at 4:13 PM Matthias van de Meent
wrote:
> On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan wrote:
> And this is where I disagree with your (percieved) implicit argument
> that this should be and always stay this way.
I never said that, and didn't intend to imply
are (which is
what we call _bt_checkkeys on HEAD, basically).
Note that the _bt_merge_arrays code that you've highlighted isn't
iterating through so->keyData[] -- it is iterating through the
function caller's elements array, which actually come from
so->arrayKeys[].
Like every other Postgres contributor, I do my best to follow the
conventions established by existing code. Sometimes that leads to
pretty awkward results, where CamelCase and underscore styles are
closely mixed together, because it works out to be the most consistent
way of doing it overall.
--
Peter Geoghegan
On Thu, Jan 18, 2024 at 11:46 AM Robert Haas wrote:
> On Thu, Jan 18, 2024 at 11:17 AM Peter Geoghegan wrote:
> > True. But the way that PageGetHeapFreeSpace() returns 0 for a page
> > with 291 LP_DEAD stubs is a much older behavior. When that happens it
> > is literally t
conceding that having a high degree of precision about available free
space isn't actually useful (or wouldn't be useful if it was actually
possible at all). Which is something that I generally agree with. I'd
just like it to be clear that you/Melanie are in fact taking one small
step in that direction. We don't need to discuss possible later steps
beyond that first step. Not right now.
--
Peter Geoghegan
rent design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.
--
Peter Geoghegan
On Wed, Jan 17, 2024 at 5:47 PM Melanie Plageman
wrote:
>
> On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan wrote:
> >
> > On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan wrote:
> > > I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
> > >
On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan wrote:
> I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
> wrong idea. If it's such a good idea then why not apply it all the
> time? That is, why not apply it independently of whether nindexes==0
> in the current VACU
good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)
--
Peter Geoghegan
n
> > +-- should terminate, when in reality it should jump backwards to the leaf
> > page
> > +-- that we last visited.
>
> I notice this adds a complex test case that outputs many rows. Can we
> do with less rows if we build the index after data insertion, and with
> a lower (non-default) fillfactor?
Probably not. It was actually very hard to come up with these test
cases, which tickle the implementation in just the right way to
demonstrate that the code in places like _bt_steppage() is actually
required. It took me a rather long time to just prove that much. Not
sure that we really need this. But thought I'd include it for the time
being, just so that reviewers could understand those changes.
--
Peter Geoghegan
On Fri, Jan 12, 2024 at 1:52 PM Melanie Plageman
wrote:
> On Fri, Jan 12, 2024 at 1:07 PM Peter Geoghegan wrote:
> > What is "space_freed"? Isn't that something from your uncommitted patch?
>
> Yes, I was mixing the two together.
An understandable mistake.
> I ju
;. But lazy_scan_noprune doesn't get passed a
pointer to prunestate, so clearly you'll need to detect the same
condition some other way.
--
Peter Geoghegan
the
FSM is up-to-date.
In short, we do these things with the free space map because it is a
map of free space (which isn't crash safe) -- nothing more. I happen
to agree that that general design has a lot of problems, but those
seem out of scope here.
--
Peter Geoghegan
we know that there won't be any second heap pass, and so in both cases
we always call PageGetHeapFreeSpace() in the first heap pass. It's
just that it's a bit harder to see that in the lazy_scan_prune case.
No?
--
Peter Geoghegan
me that some PRUNE record must have
taken care of all that earlier on.
--
Peter Geoghegan
o be very effective in
practice. We only need to have the right *general* idea about which
heap pages to visit -- which heap pages will yield some number of
deletable index tuples.
--
Peter Geoghegan
ts
means.
We could even fully reverse heap page line pointer bloat under this
"transaction rollback" scheme -- I bet that aborted xacts are a
disproportionate source of line pointer bloat. Barring a hard crash,
or a very large transaction, we could "undo" the physical changes to
relations before permitting the backend to retry the transaction from
scratch. This would just work as an optimization.
--
Peter Geoghegan
Will you be in Prague this week? If not this might have to wait.
On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan wrote:
> I think that we should do something like the attached, to completely
> avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
> similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
> XLOG_HEAP2
place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.
--
Peter Geoghegan
tproof, of course (it's not like older
Postgres versions actually provided _bt_checkkeys with opportunities
to "change its mind" in this sense), but it's a useful starting point
IME. It helps to build intuition.
--
Peter Geoghegan
h weird incomplete opfamilies to actually break the optimization --
that was required to tickle the special case in just the right/wrong
way.
--
Peter Geoghegan
On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan wrote:
> On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov
> wrote:
> > Thank you for raising this issue. Preprocessing of btree scan keys is
> > normally removing the redundant scan keys. However, redundant scan
> >
t
involves both > and < strategy inequalities, since that makes the
symmetry between the two optimizations clearer.
--
Peter Geoghegan
what you're doing here isn't all that different to
the way that we rely on required equality strategy scan keys being
used to build our initial insertion scan key, that determines where
the scan is initially positioned to within _bt_first. Inequalities
aren't all that different to equalities
On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan wrote:
> "In general, when inequality keys are present, the initial-positioning
> code only promises to position before the first possible match, not
> exactly at the first match, for a forward scan; or after the last
> match fo
during the _bt_readpage precheck).
That being said, I wouldn't rule out problems for the precheck
optimization in the presence of opfamilies like the one from my test
case. I didn't get as far as exploring that side of things, at least.
--
Peter Geoghegan
test case highlights. (There were
also issues with NULLs, but AFAICT Alexander dealt with that aspect of
the problem already.)
[1]
https://postgr.es/m/CAH2-Wz=BuxYEHxpNH0tPvo=+g1wte1pamrovu1devow1vy9...@mail.gmail.com
--
Peter Geoghegan
require_opposite_dir_repro.sql
Description: Binary data
On Mon, Dec 4, 2023 at 7:25 PM Peter Geoghegan wrote:
> 2. The optimization that has _bt_checkkeys skip non-required scan keys
> that are *only* required in the direction *opposite* the current scan
> direction -- this can work even without any precheck from
> _bt_readpa
can't mix the array stuff with the
optimization stuff. We won't actually call _bt_checkkeys() in an
assertion when it can cause side-effects.
Assuming that we ultimately conclude that the optimizations *aren't*
worth preserving in any form, it might still be worth making it
obvious that the assertions have no side-effects. But that question is
unsettled right now.
Thanks for the review!
I'll try to get the next revision out soon. It'll also have bug fixes
for mark + restore and for a similar issue seen when the scan changes
direction in just the wrong way. (In short, the array key state
machine can be confused about scan direction in certain corner cases.)
--
Peter Geoghegan
to something on the
stack that will go out of scope, in a way that leaves open the
possibility of reading random stuff from the stack once inside
XLogInsert(). AddressSanitizer's use-after-scope can detect that sort
of thing, though not reliably.
Not at all sure about how feasible any of this is...
--
Peter Geoghegan
On Tue, Nov 28, 2023 at 3:52 PM Peter Geoghegan wrote:
> While I still think that we need heuristics that apply speculative
> criteria to decide whether or not going to the next page directly
> (when we have a choice at all), that doesn't mean that the v7
> heuristics can't
On Tue, Nov 28, 2023 at 9:19 AM Peter Geoghegan wrote:
> > I'm not convinced this is a problem we have to solve. It's possible it
> > only affects cases that are implausible in practice (the script forces a
> > particular scan type, and maybe it would not be picked in practi
se cases. But it's something that
deserves further consideration.
--
Peter Geoghegan
On Mon, Nov 27, 2023 at 5:07 PM Peter Geoghegan wrote:
> One of the reasons why we shouldn't do this during parse analysis is
> because query rewriting might matter. But that doesn't mean that the
> transformation/normalization process must fundamentally be the
> responsibility of t
ing if they apply at all).
>
> It's just a matter of figuring out where we can put the logic and have
> the result make sense. We'd like to put it someplace where it's not
> too expensive and gets the right answer.
Agreed.
--
Peter Geoghegan
r already knew about
directly, they still wouldn't really need to be costed. They're pretty
much strictly better at runtime (at most you only have to worry about
the fixed cost of determining if they apply at all).
--
Peter Geoghegan
On Fri, Nov 24, 2023 at 10:58 AM Tom Lane wrote:
> Peter Geoghegan writes:
> > I suppose that amcanorder=true cannot mean that, since we have the
> > SAOP path key thing (at least for now).
>
> As things stand, amcanorder definitely means that the index always
> ret
quot;
can play an important role in how it is interpreted. The fact that
some things remain ambiguous isn't necessarily a problem that needs to
be solved. If there is a real problem, then IMV it should always start
with a concrete complaint about how the API doesn't support certain
requirements. In other words, I'm of the opinion that such a complaint
should end with the API, as opposed to starting with it.
--
Peter Geoghegan
n equality constraint".
This is reliably faster, while also preserving index sort order,
almost as a side-effect.
--
Peter Geoghegan
ther
> in-tree AMs that have facilities where both `amcanorderbyop` and
> `amcanorder` are set.
The general notion of a data type's sort order comes from its default
btree operator class, so the whole idea of a generic sort order is
deeply tied to the nbtree AM. That's why we sometimes have btree
On Tue, Sep 5, 2023 at 2:59 AM Alvaro Herrera wrote:
> Much appreciated! I can put this to good use.
I was just reminded of how our existing backtrace support is lacklustre.
Are you planning on submitting a patch for this?
--
Peter Geoghegan
s not
> working as expected. But in fact, something
> more interesting is had happened.
Was pg_upgrade even run against this database? My guess is that the
underlying problem was caused by the bug fixed by commit 74cf7d46.
--
Peter Geoghegan
t now, and it involves 32-bit XIDs/MXIDs, and these two
functions. And, if we were to change something in this area, we'd
definitely need to provide for the needs of those monitoring queries I
mentioned.
--
Peter Geoghegan
like it might be more thorough.
Just a thought.
--
Peter Geoghegan
On Tue, Nov 7, 2023 at 5:53 PM Peter Geoghegan wrote:
> If you end up finding a bug in this v6, it'll most likely be a case
> where nbtree fails to live up to that. This project is as much about
> robust/predictable performance as anything else -- nbtree needs to be
>
, we already
support tools like pg_filedump, so this isn't a new principle.)
--
Peter Geoghegan
ys.
That code is still the most complicated part of the patch, but it's
much less of a bag of tricks. Another reason for you to hold off for a
few more days.
--
Peter Geoghegan
the really big wins here will come from simple
transformations. I see no reason why we can't take an incremental
approach. In fact I think we almost have to do so, since as I
understand it the transformations are just infeasible in certain
extreme cases.
--
Peter Geoghegan
in
heap page accesses is quite possible. This is particularly likely to
come up if you assume that the nbtree patch that I'm currently working
on is also available. In general, I think that we totally over-rely on
bitmap index scans, especially BitmapOrs.
--
Peter Geoghegan
that wasn't ever in question.)
I withdraw my suggestion about the wording from your patch. It seems
committable.
Thanks
--
Peter Geoghegan
understood to be settled.
>
> Understood. I'll see if I can improve the wording to something that is
> more clear about what the optimization entails.
Cool.
--
Peter Geoghegan
might make it even harder to understand the design of the
bt_child_highkey_check() check. On the other hand...I'm not sure that
I understand every nuance of it myself.
> > Suggest adding a CHECK_FOR_INTERRUPTS() call to the loop, too, just
> > for good luck.
>
> Added. That gave me the idea to check for circular links, like other parts of
> amcheck do.
Good idea.
--
Peter Geoghegan
1 - 100 of 3050 matches
Mail list logo