and get a change in behavior,
then surely any related indexes must have been rebuilt too. The
interesting part may be what that upgrade looks like in detail.
--
Peter Geoghegan
atabase, and involves exactly 2 ICU versions. You
should probably be able to back out of it once it begins, but mostly
it's an inflexible process that just does what we need it to do.
Does something like that seem sensible to you?
--
Peter Geoghegan
have old physical
collations. Defining the problem as a problem with old
indexes/constraints only seems like it might make things a lot easier.
--
Peter Geoghegan
t's more realistically and robustly
> and simply implementable. Hmm.
That may be a decisive reason to go with your proposal. I really don't know.
--
Peter Geoghegan
on for their logical collation). So
directly tackling that seems natural to me.
--
Peter Geoghegan
t it would be to go as far as we can in the direction
of decoupling the concerns that we have as database people from the
concerns of natural language experts. Let's not step on their toes,
and let's avoid having our toes trampled on.
--
Peter Geoghegan
ady what we advise for users that use advanced
tailorings of custom ICU collations, such as a custom collation for
"natural sorting", often used for things like alphanumeric invoice
numbers. That might break if you downgrade ICU version, and maybe even
if you upgrade ICU version.
--
Peter Geoghegan
ld
largely be an implementation detail, perhaps only used to
unambiguously identify which specific ICU version and locale string
relate to which on-disk relfilenode structure currently.
--
Peter Geoghegan
On Thu, Jun 9, 2022 at 2:33 PM Peter Geoghegan wrote:
> My preference is for an approach that builds on that, or at least
> doesn't significantly complicate it. So a cryptographic hash or nonce
> can go in the special area proper (structs like BTPageOpaqueData don't
> need
ial area proper (structs like BTPageOpaqueData don't
need any changes), but at a page offset before the special area proper
-- not after.
What disadvantages does that approach have, if any, from your point of view?
--
Peter Geoghegan
at experience has shown are relatively common.
That's what the search_path case seems like to me.
If somebody else wants to write another patch that adds on that,
great. If not, then having this much still seems useful.
--
Peter Geoghegan
o store the index metapage. (Actually unlogged
indexes that run on a standby don't, but that's accounted for
directly.)
--
Peter Geoghegan
what not to do.
--
Peter Geoghegan
f ICU
will eventually become a "compelling feature" in its own right.
I believe that EDB adopted ICU many years ago, and stuck with one
vendored version for quite a few years. And eventually being on a very
old version of ICU became a real problem.
--
Peter Geoghegan
On Thu, Jun 9, 2022 at 6:40 AM Robert Haas wrote:
> Are you going to code up a patch?
I can, but feel free to fix it yourself if you prefer. Your analysis
seems sound.
--
Peter Geoghegan
On Wed, Jun 8, 2022 at 10:39 PM Peter Geoghegan wrote:
> They simply REINDEX, without changing anything. The details are still
> fuzzy, but at least that's what I was thinking of.
As I said before, BCP47 format tags are incredibly forgiving by
design. So it should be reasonable to
original/old environment (which is likely), you can avoid reindexing,
and so reserve the option of backing out of a complex upgrade until
very late in the process. You're going to have to do it eventually,
but it can probably just be an afterthought.
--
Peter Geoghegan
ly, and necessitates thinking about
multiple evaluation hazards, which is enough to discourage good
defensive coding practices.
--
Peter Geoghegan
ItemSize()
looked like prior to Postgres 12. So the limit on internal pages never
changed, even in Postgres 12. There was no separate leaf page limit
prior to 12. Only the rules on the leaf level ever really changed.
Note also that amcheck has tests for this stuff. Though that probably
doesn't matter at all.
--
Peter Geoghegan
. It took as long as 30 minutes or more to run the
test.
I think that we should fix this on HEAD, on general principle. There
is no reason to believe that this is a live bug, so a backpatch seems
unnecessary.
--
Peter Geoghegan
ficient
approach to implementing strxfrm() is another example of the same
thing. (The Apple strxfrm() produces huge low entropy binary strings,
unlike the glibc version, which is pretty well optimized.)
--
Peter Geoghegan
rhead with mixed
reads and writes, so it's a performance all-rounder that can still be
beaten by specialized techniques that come with their own downsides.
--
Peter Geoghegan
s once we
gain the ability to use multiple versions of ICU at the same time? For
example, do we want to generalize the definition of a collation, so
that it's associated with one particular ICU version and collation for
the purposes of on-disk compatibility, but isn't necessarily tied to
the same ICU version in other contexts, such as on a dump and restore?
--
Peter Geoghegan
le different ICU versions doesn't really seem like
overkill to me. Or if it is then I can easily think of far better
examples of software bloat. Defining "stable behavior for collations"
as "uses exactly the same software artifact over time" is defensive
(compared to always linking to one ICU version that does it all), but
we have plenty that we need to defend against here.
--
Peter Geoghegan
"best effort" approach, because throwing a "locale not
found" error message usually isn't helpful from the point of view of
the end user. Note that this is a broader standard than ICU or CLDR or
even Unicode.
[1] https://www.ietf.org/rfc/rfc6067.txt
--
Peter Geoghegan
ant. Even if glibc theoretically does a
perfect job of versioning, I still think that their priorities are
very much unlike our priorities, and that that should be a relevant
consideration for us.
--
Peter Geoghegan
is scheme wouldn't technically be under our direct control, but
would still be something that we could influence. We could have a back
and forth conversation about what's not working in the field.
--
Peter Geoghegan
port by the distro (while
actively discouraging its use in new databases). This isn't the same
thing as forking ICU. It's a compromise between that extreme, and
the current situation.
--
Peter Geoghegan
hat there are many
near-misses that we never get to hear about already. That's rather
beside the point. The index must be assumed to be corrupt.
--
Peter Geoghegan
space efficiency matters, especially with B-Tree
index-only scans that scan a significant fraction of the entire index,
or even the entire index.
--
Peter Geoghegan
more
attributes of scalar types.
The abbreviated keys optimization is very much something that comes
from the world of databases, not the world of sorting. It's pretty much a
domain-specific technique. That seems relevant to me.
--
Peter Geoghegan
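A minimal sketch of the abbreviated-key idea mentioned above, for readers who have not seen it. It is not PostgreSQL's implementation: the 8-byte prefix, the byte-wise ordering, and every name here (`abbreviate`, `sort_with_abbreviated_keys`, the `stats` counter) are assumptions made for illustration. Each string gets a fixed-size integer key that is cheap to compare, and the full comparison only runs when two abbreviated keys tie.

```python
from functools import cmp_to_key

ABBREV_BYTES = 8  # hypothetical key width, chosen for illustration


def abbreviate(s):
    """Pack the first ABBREV_BYTES bytes of s into an integer (zero padded)."""
    raw = s.encode("utf-8")[:ABBREV_BYTES].ljust(ABBREV_BYTES, b"\x00")
    return int.from_bytes(raw, "big")


def make_comparator(stats):
    """Build a comparator that counts how often the full comparison runs."""
    def compare(a, b):
        (abbrev_a, full_a), (abbrev_b, full_b) = a, b
        # Cheap path: the integer keys already decide the order.
        if abbrev_a != abbrev_b:
            return -1 if abbrev_a < abbrev_b else 1
        # Tie on the abbreviated key: fall back to the authoritative comparison.
        stats["full_comparisons"] += 1
        bytes_a, bytes_b = full_a.encode("utf-8"), full_b.encode("utf-8")
        return (bytes_a > bytes_b) - (bytes_a < bytes_b)
    return compare


def sort_with_abbreviated_keys(strings):
    """Sort strings in byte order, returning the result and comparison stats."""
    stats = {"full_comparisons": 0}
    decorated = [(abbreviate(s), s) for s in strings]
    decorated.sort(key=cmp_to_key(make_comparator(stats)))
    return [s for _, s in decorated], stats
```

The fallback keeps the result correct even when many values share a prefix. In that case the abbreviated keys simply stop helping, which is the kind of domain-specific trade-off the message alludes to.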
otQuicksort.pdf
At one point quite a few years back I planned on investigating it
myself, but never followed through.
--
Peter Geoghegan
ic
mean? That's pretty standard practice when summarizing a set of
benchmark results that are expressed as ratios to some baseline.
If I tweak your spreadsheet to use the geometric mean, the patch looks
slightly better -- 89%.
--
Peter Geoghegan
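The point about the geometric mean can be sketched in a few lines. The ratios below are made-up illustrative values, not numbers from the spreadsheet discussed in the thread, and the function names are assumptions. The sketch shows why the arithmetic mean can mislead when results are ratios to a baseline: a 2x speedup and a 2x slowdown should cancel out, and only the geometric mean reports that.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed through logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def arithmetic_mean(ratios):
    """Plain average, shown only for comparison."""
    return sum(ratios) / len(ratios)


# Hypothetical results, each expressed as patched_time / baseline_time:
# one test ran in half the time, another took twice as long.
example_ratios = [0.5, 2.0]

summary = {
    "arithmetic": arithmetic_mean(example_ratios),  # biased toward the slowdown
    "geometric": geometric_mean(example_ratios),    # treats 0.5 and 2.0 symmetrically
}
```

With these two hypothetical ratios the arithmetic mean suggests a net regression, while the geometric mean reports no net change.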
On Fri, May 27, 2022 at 11:59 AM Andres Freund wrote:
> On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> > I find it hard to believe that there wasn't even a cursory effort at
> > performance validation before this was committed, but that's what it
> > looks
speed up anything useful! There's not a
> single benchmark for the patch.
I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.
--
Peter Geoghegan
On Thu, May 19, 2022 at 1:12 PM Justin Pryzby wrote:
> Should these debug lines be removed ?
>
> elog(DEBUG1, "qsort_tuple");
I agree -- DEBUG1 seems too chatty for something like this. DEBUG2
would be more appropriate IMV. Though I don't feel very strongly about
it.
--
Peter Geoghegan
that's just obviously false.
+1
--
Peter Geoghegan
ble for a
variety of reasons. All of which boil down to "the current FSM design
cannot be totally trusted, so we verify redundantly".
--
Peter Geoghegan
-- I'm really
looking for bottlenecks, where Postgres does entirely the wrong thing.
It's especially interesting to me as somebody that focuses on B-Tree
indexing.
--
Peter Geoghegan
mpossible to overlook), then I
would object -- why even take a small chance? Fortunately I don't
believe that we're even taking a small chance here, all things
considered. And so I agree; this issue isn't a concern.
--
Peter Geoghegan
On Thu, Apr 21, 2022 at 4:28 PM Peter Geoghegan wrote:
> I don't think that there is any risk of one user of either variable
> "clobbering" some other user -- the current values of the variables
> are not actually meaningful at all. They're only useful as a way that
On Wed, Apr 20, 2022 at 8:00 PM Peter Geoghegan wrote:
> I knew about pgBufferUsage, and I knew about
> VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make
> the very obvious connection between the two until today. I am probably
> not the only
n that path and added the others too?
I knew about pgBufferUsage, and I knew about
VacuumPage{Hit,Miss,Dirty} for a long time. But somehow I didn't make
the very obvious connection between the two until today. I am probably
not the only one.
--
Peter Geoghegan
On Tue, Apr 12, 2022 at 11:01 AM Peter Geoghegan wrote:
> Attached patch fixes the issue, and includes the test case that you posted.
Pushed a similar patch just now. Backpatched to all supported branches.
--
Peter Geoghegan
ack_io_timing is off, so are fields like
pgBufferUsage.shared_blks_hit (i.e. those that don't have a
time/duration component) officially okay to rely on across the board?
It looks like they are okay to rely on (even when track_io_timing is
off), but it would be nice to put that on a formal footing, if it
isn't already.
--
Peter Geoghegan
ith using DBT5 on a modern
Linux distribution. Perhaps I gave up too easily at the time, but I'm
definitely still interested. Has there been work on that since?
Thanks
--
Peter Geoghegan
s correct according to the
spec.
--
Peter Geoghegan
On Mon, Apr 18, 2022 at 1:12 PM Peter Geoghegan wrote:
> I would argue that it would be correct for the first time -- at least
> if we take the behavior within heapam_index_build_range_scan (and
> everywhere else) as authoritative. That's a feature, not a bug.
Attached draft patc
rgue that it would be correct for the first time -- at least
if we take the behavior within heapam_index_build_range_scan (and
everywhere else) as authoritative. That's a feature, not a bug.
--
Peter Geoghegan
uumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.
--
Peter Geoghegan
.num_index_tuples, which is related. Granted,
that won't be used to update pg_class for the index in the case where
it's just an estimate anyway.
--
Peter Geoghegan
. I believe that the "pg_class.reltuples is -1 even after a
VACUUM" case is completely impossible following the Postgres 15 work
on VACUUM, but we should still clamp for safety in
update_relstats_all_indexes (though not in the amvacuumcleanup path).
--
Peter Geoghegan
> And FreezeLimit doesn't affect "dead but not yet removable".
But OldestXmin affects FreezeLimit.
Anyway, I'm not opposed to showing the age at the start as well. But
from the point of view of issues like this tenk1 issue, it would be
more useful to just report on new_rel_allvisible. It would also be
more useful to users.
--
Peter Geoghegan
eneral approach to calculating
FreezeLimit makes little sense.
--
Peter Geoghegan
ally in the kinds of extreme cases I'm thinking about.
--
Peter Geoghegan
what's currently running.
As well as the age of OldestXmin at the start of VACUUM.
--
Peter Geoghegan
ind, though. Your new wording is fine.
I'll update the log output some time today.
--
Peter Geoghegan
mples of both. This could easily be changed to "XIDs".
--
Peter Geoghegan
fsync = off'. And did so in the
script as well.
That seems like it definitely could matter.
--
Peter Geoghegan
t theory (just putting
dinner on here). Just a wild guess at this point.
--
Peter Geoghegan
but I thought that the synchronous_commit thing was new
information that made that worth revisiting.
--
Peter Geoghegan
could plausibly have had that effect, whose
commit fits with our timeline for the problems seen on wrasse?
--
Peter Geoghegan
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan wrote:
> A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and
> "2022-04-14 22:49:19.088" all have the same "removable cutoff".
Are you aware of Andres' commit 02fea8fd? That work preven
o cannot go up as we're doing it (or it'll be less of an
issue, at least).
It would also help if VACUUM didn't scan pages that it already knows
don't have any dead tuples. The current SKIP_PAGES_THRESHOLD rule
could easily be improved. That's almost the same problem.
--
Peter Geoghegan
ee seconds
(likely more) where something held back OldestXmin generally.
That does seem a bit fishy to me, even though it happened about a
minute after the failure itself took place.
--
Peter Geoghegan
o XIDs to work
off of in the log_line_prefix that's in use on wrasse.
The CITester log_line_prefix is pretty useful -- I wonder if we can
standardize on that within the buildfarm, too.
--
Peter Geoghegan
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan wrote:
> It looks like you're changing the elevel convention for these "extra"
> messages with this patch. That might be fine, but don't forget about
> similar ereports() in vacuumparallel.c. I think that the elevel sho
, but don't forget about
similar ereports() in vacuumparallel.c. I think that the elevel should
probably remain uniform across all of these messages. Though I don't
particularly care if it's DEBUG2 or DEBUG5.
--
Peter Geoghegan
e'd know what the xid horizon is, whether pages were
> skipped, etc.
I like the idea of making VACUUM log the VERBOSE output as a
configurable user-visible feature. We'll then be able to log all
VACUUM statements (not just autovacuum worker VACUUMs).
--
Peter Geoghegan
to the
> horizon potentially going backwards (in otherwise harmless ways)?
I agree, since vacuumlazy.c would need to either be given its own
OldestXmin, or knowledge of a wait-up-to XID. Either way we have to
make non-trivial changes to vacuumlazy.c.
--
Peter Geoghegan
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan wrote:
> I think that it's more likely that FREEZE will correct problems, out of the
> two:
>
> * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
> cutoff value as possible (FreezeLimit will be equal to Olde
an SQL function for other
reasons, though. Users already think that there are several different
flavors of VACUUM, which isn't really true.
--
Peter Geoghegan
tgr.es/m/cah2-wzkib-qcsbmwrpzp0nxvrqexouts1d7tyshg_drkohe...@mail.gmail.com
--
Peter Geoghegan
problem, really. I wonder if it's worth inventing
a comprehensive solution. Some kind of infrastructure that makes
VACUUM establish a next XID up-front (by calling
ReadNextTransactionId()), and then find a way to run with an
OldestXmin that's >= the earlier "next" XID value. If necessary by
waiting.
--
Peter Geoghegan
rel->NewRelfrozenXid == OldestXmin", and run the
regression tests, the remaining assertion will fail quite easily.
Though perhaps not with a serial "make check".
--
Peter Geoghegan
ct on test stability). I would expect that to be
the case, at least, since VACUUM now does almost all of the same work
for any individual page that it cannot get a cleanup lock on. There is
surprisingly little difference between a page that gets processed by
lazy_scan_prune and a page that gets processed by lazy_scan_noprune.
--
Peter Geoghegan
kier to make sure that wrasse reliably reported on
all relevant VACUUMs, since that would have to include manual VACUUMs
(which would really have to use VACUUM VERBOSE), as well as
autovacuums.
--
Peter Geoghegan
On Wed, Apr 13, 2022 at 1:25 PM Robert Haas wrote:
> On Wed, Apr 13, 2022 at 12:34 PM Peter Geoghegan wrote:
> > What do you think of the idea of relating freezing to removing tuples
> > by VACUUM at this point? This would be a basis for explaining how
> > freezing
nvalid relfrozenxid values was
flagrantly just a bug (adding a WARNING for this recently, in commit
e83ebfe6). So while I accept that the distinction you're making here
is valid, maybe we can fix the single user mode doc bug too, removing
the need to discuss "true wraparound" as a general phenomenon. You
shouldn't ever see it in practice anymore. If you do then either
you've done something that "invalidated the warranty", or you've run
into a legitimate bug.
--
Peter Geoghegan
is the simple reality.
We should say so.
> I am wondering, for the more technical details, is there an existing place to
> send xrefs, do you plan to create one, or is it likely unnecessary?
I might end up doing that, but just want to get a general sense of how
other hackers feel about it for now.
--
Peter Geoghegan
raparound autovacuums really
aren't all that special. Which makes them seem scarier than they
should be.
[1]
https://postgr.es/m/CAH2-Wzk_FxfJvs4TnUtj=dcsokbik0cxfjz9jjrfsx8stwk...@mail.gmail.com
--
Peter Geoghegan
es the issue, and includes the test case that you posted.
There is only a one line change to tuplesort.c. This is arguably the
same bug -- abbreviation is just another "haveDatum1 optimization"
that needs to be accounted for.
--
Peter Geoghegan
v1-0001-Fix-CLUSTER-sort-on-ab
that you try to "work backwards". If the patch was already
committed today, but had subtle bugs, then how would we be able to
identify the bugs relatively easily? What would our strategy be then?
--
Peter Geoghegan
ething that we
treat in a rather naive way currently.
Can you demonstrate that with a custom test case? (The result I cited
before was from a '(varlen,varlen,varlen)' index, which is important,
but less relevant.)
[1]
https://www.postgresql.org/message-id/flat/CAEze2Whwvr8aYcBf0BeBuPy8mJGtwxGvQYA9OGR5eLFh6Q_ZvA%40mail.gmail.com
--
Peter Geoghegan
On Sun, Apr 10, 2022 at 2:44 PM Peter Geoghegan wrote:
> Can you post a version of this that compiles?
I forgot to add: the patch also bitrot due to recent commit dbafe127.
I didn't get stuck at this point (this is minor bitrot), but no reason
not to rebase.
--
Peter Geoghegan
r comparator.
That's not good.
The B&M quicksort implementation that we adopted is generally
extremely fast for that case, since it uses 3 way partitioning (based
on the Dutch National Flag algorithm). This essentially makes sorting
large groups of duplicates take only linear time (not linearith
by the patch series, with an explanation of where the benefit
comes from. You had some on the original thread, but that included
dynamic prefix truncation stuff as well.
Ideally you would also describe where the advertised improvements come
from for each test case -- which patch, which enhancement (perhaps
only in rough terms for now).
--
Peter Geoghegan
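For reference, the 3 way partitioning idea mentioned in the message above can be sketched as follows. This is the textbook Dutch National Flag formulation, not the exact Bentley & McIlroy variant and not PostgreSQL's qsort code. The function name and the random pivot choice are assumptions made for illustration. Elements equal to the pivot are gathered into a middle band that is never revisited, which is why large groups of duplicates are cheap to sort.

```python
import random


def quicksort_3way(items, lo=0, hi=None):
    """Sort items[lo..hi] in place using three-way (Dutch National Flag) partitioning."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        # Invariant: items[lo:lt] < pivot, items[lt:i] == pivot, items[gt+1:hi+1] > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # items[lt..gt] all equal the pivot and are already in final position.
        # Recurse into the smaller side and loop on the larger to bound stack depth.
        if lt - lo < hi - gt:
            quicksort_3way(items, lo, lt - 1)
            lo = gt + 1
        else:
            quicksort_3way(items, gt + 1, hi)
            hi = lt - 1
    return items
```

When every element is equal, a single partitioning pass places the whole range in the middle band and both remaining sides are empty, so the work is linear in the input size.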
ges becoming empty, thus allowing their line pointer arrays to be
> reset.
I agree. Sometimes the problem is that we don't cut our losses when we
should -- sometimes just accepting a limited downside is the right
thing to do. Like with the FSM; we diligently use every last scrap of
free space, without concern for the bigger picture. It's penny-wise,
pound-foolish.
--
Peter Geoghegan
ed later on.
> c) What if we left some percentage of ItemIds unused, when looking for the
>    OffsetNumber of a new HOT row version? That'd make it more likely for
>    non-HOT updates and inserts to fit onto the page, without permanently
>    increasing the size of the line pointer array.
That sounds promising.
[1]
https://postgr.es/m/cah2-wzm-vhveqyth8hlyyho2wdg8ecrm0upqjwjap6bovfe...@mail.gmail.com
--
Peter Geoghegan
On Fri, Apr 8, 2022 at 2:18 PM Andres Freund wrote:
> It's 4 bytes per line pointer, right?
Yeah, it's 4 bytes in Postgres. Most other DB systems only need 2
bytes, which is implemented in exactly the way that you're imagining.
--
Peter Geoghegan
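As a rough illustration of the "4 bytes per line pointer" figure, the sketch below packs the three line pointer fields (a 15-bit offset, 2 flag bits, a 15-bit length) into one 32-bit word, then shows how that size feeds into the MaxHeapTuplesPerPage value of 291 discussed elsewhere in this thread. It assumes the default 8 kB block size, a 24-byte page header, a 23-byte heap tuple header, and 8-byte maximum alignment. The bit order used for packing is illustrative only, since C bitfield layout depends on the compiler. The Python names are hypothetical.

```python
BLCKSZ = 8192                   # default block size (assumed)
SIZE_OF_PAGE_HEADER = 24        # fixed page header size, bytes (assumed)
SIZEOF_HEAP_TUPLE_HEADER = 23   # heap tuple header size, bytes (assumed)
MAXIMUM_ALIGNOF = 8             # typical 64-bit platform (assumed)
SIZEOF_ITEM_ID = 4              # one line pointer


def maxalign(n):
    """Round n up to the next multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def pack_item_id(lp_off, lp_flags, lp_len):
    """Pack offset (15 bits), flags (2 bits) and length (15 bits) into 32 bits."""
    if not (0 <= lp_off < 1 << 15 and 0 <= lp_flags < 1 << 2 and 0 <= lp_len < 1 << 15):
        raise ValueError("field out of range")
    return lp_off | (lp_flags << 15) | (lp_len << 17)


def unpack_item_id(word):
    """Inverse of pack_item_id."""
    return word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF


def max_heap_tuples_per_page():
    """Upper bound on heap tuples per page: each needs an aligned header plus a line pointer."""
    per_tuple = maxalign(SIZEOF_HEAP_TUPLE_HEADER) + SIZEOF_ITEM_ID
    return (BLCKSZ - SIZE_OF_PAGE_HEADER) // per_tuple
```

Under these assumptions each minimal tuple costs 28 bytes (24 for the aligned header plus 4 for its line pointer), so the bound is driven as much by the tuple header as by the line pointer itself.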
rPage limit of 291? If we
had only been able to "absorb" just a few extra versions in the short
term, we would have had stability (in the sense of being able to
preserve locality among related logical rows) in the long term. We
could have kept everything together, if only we didn'
On Fri, Apr 8, 2022 at 9:44 AM Peter Geoghegan wrote:
> On Fri, Apr 8, 2022 at 4:38 AM Matthias van de Meent
> wrote:
> > Yeah, I think we should definately support more line pointers on a
> > heap page, but abusing MaxHeapTuplesPerPage for that is misleading:
> >
mbers. I cut down on that in the B-Tree code, reducing it to
MaxIndexTuplesPerPage (which is typically 407) in a few places. So
anything close to our current MaxIndexTuplesPerPage ought to be fine
for most individual arrays stored on the stack.
--
Peter Geoghegan
d point. Sounds like it might be the right approach.
I suppose that it will depend on how much use of MaxHeapTuplesPerPage
remains once it is split in two like this.
--
Peter Geoghegan
On Fri, Apr 8, 2022 at 5:58 AM Alvaro Herrera wrote:
> Thanks for herding through the CF!
+1
--
Peter Geoghegan
eady (*), but if it
> grows
> further...
No arguments here. There are probably quite a few places that won't
need to be fixed, because it just doesn't matter, but
lazy_scan_prune() will.
--
Peter Geoghegan
On Mon, Apr 4, 2022 at 7:24 PM Peter Geoghegan wrote:
> I am sympathetic to the idea that giving the system a more accurate
> picture of how much free space is available on each heap page is an
> intrinsic good. This might help us in a few different areas. For
> example, the FSM
> even individual tables, which would all be very cool), but I don't think
> this approach would make that possible..?
That would be the main advantage, yes. But I also tend to doubt that
we should make it completely impossible to know anything at all about
the page without fully decrypting it.
It was just a suggestion. I will leave it at that.
--
Peter Geoghegan
On Thu, Apr 7, 2022 at 12:37 PM Robert Haas wrote:
> On Thu, Apr 7, 2022 at 3:27 PM Peter Geoghegan wrote:
> > I just meant that it wouldn't be reasonable to impose a fixed cost on
> > every user, even those not using the feature. Which you said yourself.
>
> Unfortunat
ogically come after" the special space under this scheme. You
wouldn't have a simple constant offset into the page, but you'd have
something not too far removed from such a constant. It could work as a
constant with minimal context (just the AM type). Just like with
Matthias' patch.
--
Peter Geoghegan