ities
for optimization as well, such as:
* reducing the need for palloc and extra buffers, perhaps by using
buffers on the stack for small strings
* operate more directly on UTF-8 data rather than decoding and re-
encoding the entire string
Regards,
Jeff Davis
>
> Works for me.
Sounds good. We can document compatibility notes around this point.
If normalization becomes important, we can take the time to work out
the performance implications more carefully, and potentially introduce
an NCASEFOLD() if needed.
Regards,
Jeff Davis
type.
> I guess I don't feel strongly about it either
> way.
Are you a user of citext? I'm genuinely interested in the use cases,
and whether the separate-data-type approach has merits that are missing
in the other approaches.
Regards,
Jeff Davis
he entry for EXCLUDE? I also merged your wording with
some similar wording from the entry about UNIQUE. Attached.
Regards,
Jeff Davis
From 0988ec1bac79055899fb555ac0c0441333888c83 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth"
Date: Tue, 17 Jun 2025 20:48:56 -0700
Subject:
R(), so that
sounds like a good idea. I'd be interested to hear from users of
citext.
Regards,
Jeff Davis
ot sure whether we'd want to standardize one or both of
those functions.
And if you think there's likely to be a collision with the standard
that's hard to anticipate and fix now, then we should consider
reverting CASEFOLD() for 18 and wait for more progress on the
standardization. W
tisfy Robert's concern about
the --help output. But Robert also wants stats off by default for
pg_dump and on by default for pg_restore, which I think means we need
both --with-statistics and --no-statistics anyway. Robert, comments?
Regards,
Jeff Davis
override that and I'm not sure we have one right now.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 08:58 -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
folding would also
want normalization, but it's hard to weigh that against the performance
cost. It might not matter outside of a few edge cases, though I'm not
sure exactly how many.
Regards,
Jeff Davis
but the "--x-only" options
also put us in a tough spot.
If --data-only had always been spelled "--no-schema" (or "--without-
data" or whatever), and --schema-only had always been spelled "--no-
data", then I think it would be a lot easier to add statistics into the
mix.
Regards,
Jeff Davis
On Mon, 2025-06-16 at 16:09 -0500, Nathan Bossart wrote:
> So perhaps there's not as strong of a
> consensus as we thought. Maybe we should ask for any new/updated
> votes.
Does it make any sense to be off by default in 18 and on in some later
release?
Regards,
Jeff Davis
Fixed.
Regards,
Jeff Davis
isible changes in the past, and
> regenerating tsvectors because of that were merely a suggestion.
Interesting, thank you for looking into the history here. It would
certainly be simpler to just make FTS fully collation-aware.
Regards,
Jeff Davis
ther options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".
Regards,
Jeff Davis
y.
To me, "last option wins" means that you don't raise an error; the
latter option simply overrides the earlier one.
Given that the pg_dump options are not order-sensitive now (unless I'm
missing something), I'm worried about the consequences of trying to
make them so at this point.
Regards,
Jeff Davis
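To illustrate the distinction drawn above, here is a minimal sketch of
"last option wins" resolution. It is not pg_dump's option parsing: the
resolve_statistics() helper and its default are hypothetical, and only
the --with-statistics and --no-statistics spellings come from the
discussion.

```python
def resolve_statistics(argv, default=False):
    """Return whether statistics are included under "last option wins".

    Hypothetical helper for illustration only; this is not how pg_dump
    parses its options.
    """
    include = default
    for arg in argv:
        if arg == "--with-statistics":
            include = True   # overrides any earlier --no-statistics
        elif arg == "--no-statistics":
            include = False  # overrides any earlier --with-statistics
    return include
```

With both options present no error is raised, and the outcome depends
only on which one comes last. That is the order sensitivity the message
is worried about introducing.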
simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
> FWIW I don't have a tremendously strong opinion about --statistics-
> only.
Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.
Regards,
Jeff Davis
rip out --statistics-only (in favor
> of
> --no-schema --no-data --with-statistics).
I'd probably keep --statistics-only.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote:
> Am I too late to propose ripping this out?
As long as we keep the functionality, I'm fine changing the
options/names around at this point.
Regards,
Jeff Davis
ndexes,
which are in SECTION_POST_DATA).
Regards,
Jeff Davis
On Fri, 2025-02-07 at 11:19 -0800, Jeff Davis wrote:
>
> Attached v15. Just a rebase.
Attached v16.
> * commit this on the grounds that it's a desirable code improvement
> and
> the worst-case regression isn't a major concern; or
I plan to commit this soon after bra
NCTION
statements that come from other places (e.g. direct from applications,
or migration scripts, or extension scripts).
>
Regards,
Jeff Davis
ger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE?
Do you have an example of how things can go wrong?
> For the LC_COLLATE settings, I think we could just
> do the setting in main(), where the other non-database-speci
We could try to create a GUC to control this behavior, but behavior-
changing GUCs don't have a great history, and it would probably last
quite some time before we could really turn off libc for good.
There would be similar challenges for downcase_identifier() and maybe
pg_strcasecmp().
Regards,
Jeff Davis
o.
I guess "CTYPE" works, but it's too technical and feels libc-specific.
Regards,
Jeff Davis
we need is the right encoding, do
we need a proper locale?
Regards,
Jeff Davis
On Fri, 2025-06-06 at 15:47 -0700, Jeff Davis wrote:
> > > * Force the environment variables LC_COLLATE=C and LC_CTYPE=C
> > > unconditionally, and pg_perm_setlocale() them
> >
> > Currently that would be a regression for some people, because
> > when
On Thu, 2025-06-05 at 22:15 -0700, Jeff Davis wrote:
> To continue this thread, I did a symbol search in the meson build
> directory like (patterns.txt attached):
Attached a rough patch series which does what everyone seemed to agree
on:
* Change some trivial ASCII cases to use pg_
on datctype, and I could have offered a more clear reply to
the user.
Regards,
Jeff Davis
/ comments. Another caller is
get_iso_localename().
There are also a couple false positives where mbstowcs_l/wcstombs_l are
emulated with uselocale() and mbstowcs/wcstombs. In that case, it's not
actually sensitive to the global setting.
---
copyfromparse.c - the input is
, then ignore LC_COLLATE/LC_CTYPE and emit a
WARNING, rather than trying to set it based on LOCALE and getting an
error.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/cd3517c7-ddb8-454e-9dd5-70e3d84ff6a2%40eisentraut.org
From fea7ab4f0495330fae56f069520de374d75ae0b8 Mon Sep 17
On Tue, 2025-06-03 at 20:22 -0700, Jeff Davis wrote:
> EQUALITY marker: indicates that the function or index AM depends on
> CollOid for the equality semantics of the input expression. Examples:
> texteq(), btree AM, hash AM. (Note: EQUALITY is only important for
> non-
> determini
a strong opinion on which route to
take, but I chose the above names from existing keywords so we wouldn't
have to add any.
Regards,
Jeff Davis
ted behavior.
If we make the opposite assumption, that none are ordering-sensitive
unless we mark them so, that would allow properly-marked functions to
fail at parse time, and the rest to fail at runtime. But this
assumption doesn't work as well for recording dependencies, because
we'd miss the dependencies for UDFs that aren't properly marked.
Thoughts?
Regards,
Jeff Davis
that a UDF with collatable inputs depends on
all of the behaviors.
Regards,
Jeff Davis
ct users to create their own functions which depend on our
normalization tables, we can add a fourth marker UNICODE. Otherwise, we
can just special case the few builtin functions we have to create those
dependency entries.
Regards,
Jeff Davis
t execute any non-superuser-owned code"
would be very useful at a practical level, e.g. for pg_dump.
Regards,
Jeff Davis
e database, and we've had plenty of fixes involving
> the startup process and a different process, mostly the checkpointer.
> That's an annoying limitation.
If you have in mind some other ways to use it, then I like it a lot
more. And I don't have a better idea.
Regards,
Jeff Davis
ficant performance overhead
to wrapping the function as is done for SECURITY DEFINER, so if the
function is obviously safe, it would be nice to avoid that. And it
would be another tool to help us mitigate the various related problems
we have with selecting from views, etc.
Regards,
Jeff Davis
as SECURITY DEFINER and then someone changes it
later?
Regards,
Jeff Davis
to an
"upgrade_warnings" directory sounds like a reasonable way to go.
Regards,
Jeff Davis
g infrastructure is a lot less of a
problem than other kinds of complexity, so it might be OK. But it would
be nice if there were a couple cases that would benefit rather than
one.
Regards,
Jeff Davis
ile.
Should we automatically retain files associated with warnings, or copy
them to a different location?
Regards,
Jeff Davis
itly specifies --with-statistics.
Regards,
Jeff Davis
From 5b73253f8848638f1754f4b9da82e90e8814b4b1 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Thu, 22 May 2025 11:03:03 -0700
Subject: [PATCH v2] Change defaults for statistics export.
Set the default behavior of pg_dump, pg_dumpall, and
low for most call sites? Which
call sites are the most interesting ones that need special attention?
Regards,
Jeff Davis
dn't want to change the test results as a part of this
commit.
Regards,
Jeff Davis
From b76cb91441e2eefe278249e23fcd703d27a85a06 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Thu, 22 May 2025 11:03:03 -0700
Subject: [PATCH v1] Change defaults for statistics export.
Set the default be
e
> import side here.
That's fine with me. Perhaps we should just say that pre-18 behavior
differences can be fixed up during export, and post-18 behavior
differences are fixed up during import?
Regards,
Jeff Davis
That might be fine, but it would be good
to understand where the line is between things we should reinterpret
during export vs things we should reinterpret during import.
Regards,
Jeff Davis
> passed, so I think this is a reasonable alternative to that design.
I'd have to see the patch to see whether I liked the end result. But
I'm guessing that involves a lot of non-mechanical changes in the call
sites, and also relies on test coverage for all of them.
Regards,
Jeff Davis
e recordDependencyOn() take a LOCKMODE
parameter, which would both inform the caller that a lock will be
taken, and allow the caller to do it their own way and specify NoLock
if necessary. That still results in a huge diff, but the end result
would not be any more complex than the current code.
Regards,
Jeff Davis
surprising to
me. Assuming that heavyweight locks are the right approach, the locks
need to be taken somewhere. And expecting all the callers to get it
right seems error-prone.
This is a long thread so I must be missing some problem or complication
here.
Regards,
Jeff Davis
t that
still doesn't quite capture ICU's more complex definition of word
boundaries.
Or, we could remove those unused functions for now, and figure out if
there's a reason to add them back later. They are probably adding more
confusion than anything.
Regards,
Jeff Davis
From ff
ge.
I tried that in v2-0003, but I think it ended up worse. Most
pg_wc_xyz() functions don't care if it's the default collation or not,
so there are a lot of duplicate cases.
The previous approach is still there as v2-0002.
Regards,
Jeff Davis
From 9724181f715ce3468e9342763fad
) = 'I';
 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 t        | t        | f        | f
That behavior goes back a long way, so I'm not suggesting that we
change it.
Regards,
Jeff Davis
From e8a68f42f5802d138ba04043b25b7d
On Wed, 2025-04-02 at 17:58 +0530, Shlok Kyal wrote:
> I reviewed the patch and I have a comment:
Thank you and vignesh for the feedback. This patch didn't quite make it
for v18, but I will address it for the next CF.
Regards,
Jeff Davis
On Wed, 2025-03-19 at 15:17 -0700, Jeff Davis wrote:
> On Sat, 2025-03-15 at 21:37 -0400, Corey Huinker wrote:
> > > 0001 - no changes, but the longer I go the more I'm certain this
> > > is
> > > something we want to do.
>
> This replaces regclassin
to
fetch the next batch), and have a single static variable that points to
that.
Also in 0003, the "next_te" variable is a bit confusing, because it's
actually the last TocEntry, until it's advanced to point to the current
one.
Other than that, looks good to me.
Regards,
Jeff Davis
der parallelism, which might
defeat the batching work that we're trying to do.
Regards,
Jeff Davis
uld
use the same $src_dump for both restoration and comparison, but it
looks like you wanted coverage of the --create option. (Aside: why
parallel restore there? Is that just for test coverage or was there a
performance reason?)
Regards,
Jeff Davis
by
> a
> previous call). Does that sound like a strong enough check?
Again, I'd just be practical here and do the check if it feels natural,
and if not, improve the comments so that someone modifying the code
would know where to look.
Regards,
Jeff Davis
"? Isn't
that already implied by "JOIN unnest($1, $2) ... s.tablename =
u.tablename"?
Regards,
Jeff Davis
ke it in, or
waiting for beta reports, may yield some new information that could
change minds.
Mid-beta might be too long, but let's wait for the final CF to settle
and give people the chance to respond to a top-level thread?
Regards,
Jeff Davis
make the decision now for some reason?
Regards,
Jeff Davis
ore and after dumps, and if the
"before" version is 17, then it will not have the relallfrozen argument
to pg_restore_relation_stats. We might need a filtering step in
adjust_new_dumpfile?
Attached new v11j-0001
Regards,
Jeff Davis
From 154b8b5c10ec330c26ccd9006c434a7db1feef04
to unblock
your work.
Regards,
Jeff Davis
suite for me.
Are you saying that the tests don't work for you even when v2j-0003 is
applied? Or are you saying that your tests are failing on master, and
that v2j-0002 should be committed?
Regards,
Jeff Davis
From 6fc3b98dc9a2589b9943e075b492b4c31044c14e Mon Sep 17 00:00:00 2001
Fro
e can wait until beta to see what kinds of
problems people encounter.
Regards,
Jeff Davis
On Sat, 2025-03-22 at 09:39 -0700, Jeff Davis wrote:
> For some reason I'm getting a decline of about 3% in the c.sql test
> that seems to be associated with the accessor functions, even when
> inlined. I'm also not seeing as much benefit from the inlining of the
> MemoryCont
le, you get what you asked for.
> >
>
>
> They *asked for* that because they didn't have the mechanism to say
> "hold the mayo" or "everything except pickles". That's reducing their
> choice, and then blaming them for their choice.
Can we reach a decision here and move forward?
Regards,
Jeff Davis
On Tue, 2025-03-04 at 17:28 -0800, Jeff Davis wrote:
> My results (with wide tables):
>
>                     GROUP BY   EXCEPT
> master:               2151      1732
> entire v8 series:     2054      1740
I'm not sure what I did with the EXCEPT test,
less risky than not updating: if you don't update Unicode,
then the code points could end up in the database treated as
unassigned, and then cause a problem for future updates.
Regards,
Jeff Davis
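As a rough illustration of "treated as unassigned", the sketch below
lists the code points in a string that the local Unicode tables do not
know about. It uses Python's unicodedata module rather than
PostgreSQL's own tables, so the answer depends on the Unicode version
bundled with the interpreter (unicodedata.unidata_version). The
unassigned_code_points() helper is hypothetical.

```python
import unicodedata


def unassigned_code_points(text):
    """Return the sorted code points in text with general category Cn.

    Cn covers unassigned code points and noncharacters, as judged by the
    Unicode version this Python build ships with.
    """
    return sorted({ord(ch) for ch in text if unicodedata.category(ch) == "Cn"})
```

A string stored while its code points are still in this category is the
case the message describes: a later Unicode update may assign those code
points and change how the stored data behaves.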
might* be DDL
happening while I'm trying to do a simple SELECT query. But probably
not, so let's make it the responsibility of DDL to warn others that
it's doing something, rather than the responsibility of the SELECT
query.
Regards,
Jeff Davis
any other multi-lib work (which I am not promising to do) might
slip to PG20, which users will see at the end of 2027. Ugh.
Regards,
Jeff Davis
lems with newly-assigned code
points?
And, if possible, how we might extend this user experience to libc or
ICU updates?
Regards,
Jeff Davis
ExplicitNamespace().
Regards,
Jeff Davis
at need fixing,
and reindex just those few tuples. In theory, it should be possible:
there are a finite number of codepoints that change each Unicode
version, and we can just search for them in the data and fix up derived
structures.
Regards,
Jeff Davis
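The search described above could look roughly like the following
sketch. It assumes the set of code points that changed between two
Unicode versions has already been computed elsewhere. The
rows_needing_fixup() helper and the (row_id, text) row shape are
hypothetical and do not correspond to any existing PostgreSQL facility.

```python
def rows_needing_fixup(rows, changed_code_points):
    """Yield the ids of rows whose text contains a changed code point.

    rows: iterable of (row_id, text) pairs.
    changed_code_points: iterable of ints, assumed to come from comparing
    two Unicode versions.
    """
    changed = frozenset(changed_code_points)
    for row_id, text in rows:
        if any(ord(ch) in changed for ch in text):
            yield row_id
```

Only the rows this yields would need their derived structures (index
entries, for example) rebuilt; rows without any changed code point can
be left alone.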
an actual problem, etc. If you disagree, I'd like to hear more.
Regards,
Jeff Davis
the concerns raised in this thread,
but I'd like others to understand that what they are asking for is a
lot of work, and that the builtin collation provider solves 99% of it
already. All this effort is to solve that last 1%.
Regards,
Jeff Davis
else does it need?
It's an upgrade-time check rather than a GUC, but it basically seems to
match what you want. See:
https://www.postgresql.org/message-id/16c4e37d4c89e63623b009de9ad6fb90e7456ed8.ca...@j-davis.com
Regards,
Jeff Davis
choice remains to remain on the older one.
What do you think of Tom's argument that waiting to update Unicode is
what creates the problem in the first place?
"by then they might well have instances of the newly-assigned code
points in their database"[1]
Regards,
Jeff Davi
+static const pg_wchar * const casekind_map[NCaseKind] =
Fixed also (except pgindent had a slightly different opinion about
spaces).
Was this a general suggestion, or did you see something in particular
that would make it more optimizable this way?
Regards,
Jeff Davis
n:
U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8
from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding.
Regards,
Jeff Davis
[1] https://commitfest.postgresql.org/patch/4876/
e's no "withdrawn
-- duplicate", so it might send the wrong message.
Regards,
Jeff Davis
ng about ways we can express the right dependencies,
and I may be making some proposals along those lines.
Regards,
Jeff Davis
On Sat, Mar 15, 2025 at 1:11 PM Tom Lane wrote:
> Jeff Davis writes:
> > Committed. Thank you!
>
> crake doesn't like your perl style:
>
> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
> lexical at line 638, column 2. See page 108 o
e possible to
simplify branch() a bit, but I'm fine with the way it's done.
When looking around, I didn't find a lot of material discussing this
generated-branches approach. While it's mentioned in a few places, I
couldn't even find a consistent name for it. If you know o
that it's
> faster
> than ICU?
It doesn't break primary keys.
Also, it's stable within a major version, we can document and test its
behavior, it solves 99% of the upgrade problem, and the problems that
remain are much more manageable.
And yes, collation is way, way faster than ICU.
Regards,
Jeff Davis
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> On the other hand, if we keep up with the Joneses by updating the
> Unicode data, we can hopefully put those behavioral changes into
> effect *before* they'd affect any real data.
That's a good point.
Regards,
Jeff Davis
On Fri, 2025-03-14 at 13:16 +0200, Heikki Linnakangas wrote:
> Attached are fixes for those and some other minor things.
Thank you, I agree and I have applied your changes.
Regards,
Jeff Davis
mposition table remains the same, getting used for the binary
search in the frontend code, where we care more about the size of the
libraries like libpq over performance..."
>
Regards,
Jeff Davis
From ed4d2803aa32add7c05726286b94e78e49bb1257 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date
lines.
Did you collect performance results for 0004?
Regards,
Jeff Davis
sier than I expected, so I'm open to other suggestions.
The reason is that materialized view data is also pushed to
RESTORE_PASS_POST_ACL, so we need to do the same for the statistics
(otherwise the dependency is just ignored).
Regards,
Jeff Davis
From 4e84889cb890e5e89191e6ca8d1
dency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).
I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.
Regards,
Jeff Davis
e fixes the cross-version-upgrade
> failure in local testing, and pushed it.
Ah, thank you.
Regards,
Jeff Davis
ema-only and --data-only
* --include overrides any default
is that right?
Thoughts on how we should document when/how to use --section vs --
include? Granted, that might be a point of confusion regardless of the
options we offer.
Regards,
Jeff Davis
ng any
exclusions.
But I agree the previous code was hard to read in one place, and
redundant in another, so I will commit a fixup.
Regards,
Jeff Davis
clude=data,statistics <=> --data-only --statistics
--include=schema,data <=> --no-statistics
Not sure which approach is better.
Regards,
Jeff Davis
OWER(t) < U&'\')
RETURNING *
) INSERT INTO tpart SELECT * FROM d;
COMMIT;
The order of operations should be to fix indexes, unique constraints,
and check constraints first; and then to fix partitioned tables. That
way the above partitioned table queries get correc
s mean include statistics also or statistics only? Can
you explicitly request that data be included but rely on the default
for statistics? What options would it override or conflict with?
Regards,
Jeff Davis