(b) Some of those previously-unassigned code points must be assigned
to a Cased character in the newer version.
It's a smaller problem than a libc or ICU upgrade, which can cause
differences in sort order for the same reason (unassigned codepoints
later being assigned) as well as many other r
On Thu, 2025-09-11 at 20:51 +0300, Alexander Borisov wrote:
>
> > Hey.
> >
> > I've looked into these patches.
>
> Hi Victor,
>
> Thank you for reviewing the patch and testing it!
Heikki, do you have thoughts on this thread?
Regards,
Jeff Davis
orker saturation or
lock contention) from my test case appear in real workloads, but
affected users can change to sync mode until we sort these things out
in 19.
Regards,
Jeff Davis
bug before, anyway).
Regards,
Jeff Davis
case-insensitive comparisons.
Perfect, thank you.
Regards,
Jeff Davis
On Fri, 2025-09-12 at 13:21 -0500, Nico Williams wrote:
> On Fri, Sep 12, 2025 at 10:11:59AM -0700, Jeff Davis wrote:
> > The name PG_UNICODE_FAST is meant to convey that it provides full
> > Unicode semantics for case mapping and pattern matching, while also
> > being fast b
ZE(b).
This is getting a little off-topic from what's actually being released,
so please reopen a more relevant thread or start a new one, and CC me.
Regards,
Jeff Davis
convey that it provides full
Unicode semantics for case mapping and pattern matching, while also
being fast because it uses memcmp for comparisons. While the name
PG_C_UTF8 is meant to convey that it's closer to what users of the libc
"C.UTF-8" locale might expect.
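To illustrate the "uses memcmp for comparisons" point, here is a minimal sketch of a bytewise comparison over length-counted strings. The function name bytewise_compare is hypothetical and this is not PostgreSQL's actual code; it only shows the general technique. For valid UTF-8 input, this byte order coincides with code point order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): compare two
 * length-counted byte strings with memcmp(), breaking ties on length.
 * Returns -1, 0, or 1.
 */
int
bytewise_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = (alen < blen) ? alen : blen;
    int         cmp;

    assert(a != NULL && b != NULL);

    /* memcmp() compares the shared prefix as unsigned bytes. */
    cmp = memcmp(a, b, minlen);
    if (cmp != 0)
        return (cmp < 0) ? -1 : 1;

    /* Equal prefix: the shorter string sorts first. */
    if (alen != blen)
        return (alen < blen) ? -1 : 1;

    return 0;
}
```

No locale lookup happens anywhere in the comparison, which is where the speed comes from.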
Regards,
Jeff Davis
inconsistency between the two functions
could have been a bug but wasn't. Also, I generally use past tense for
the stuff that's being fixed and present tense for what the patch does.
One loose end: there was some discussion about the times when
LocalMinRecoveryPoint is valid/correct. I'm not sure I entirely
understood -- is that no longer a concern?
Regards,
Jeff Davis
e:
https://www.postgresql.org/message-id/b4ad535a72fc02ea43076cf525e4dbaa72b00d5b.ca...@j-davis.com
It seems like XLogFlush() and XLogNeedsFlush() should use the same
test, otherwise you could always get some confusing inconsistency.
Right?
Regards,
Jeff Davis
he WALs and once it is done it will be updated in the
> > ControlFile at PerformRecoveryXLogAction()-
> > >CreateEndOfRecoveryRecord().
...
>
> Though, it seems like LocalMinRecoveryPoint must be getting
> incorrectly set elsewhere, otherwise this would have guarded us from
> examining the control file:
I am confused about whether we are discussing "incorrect" or "invalid".
Regards,
Jeff Davis
> >
uldn't the code be consistent between XLogNeedsFlush() and
XLogFlush()? The latter only checks for !XLogInsertAllowed(), whereas
the former also checks for RecoveryInProgress().
I'm still not sure I understand the problem situation this is fixing,
but that's being discussed in another thread.
Regards,
Jeff Davis
sed a couple
subsystems and found some interesting effects. I think it will be
helpful context for potential improvements in 19. If I find other
interesting cases, I'll post them.
Regards,
Jeff Davis
changes?
I think you are right, I can't see any harm in backporting it, and
arguably it's a bug: it would be annoying to compare performance across
different versions with different build flags.
I backported through 16.
Regards,
Jeff Davis
On Mon, 2025-09-08 at 15:14 -0500, Nathan Bossart wrote:
> I think we need to do something similar for numeric.c.
Good catch. Patch LGTM.
Regards,
Jeff Davis
incore() and
> RWF_NOWAIT
> are too expensive). The maximum gain from using AIO when the data is
> already
> in the page cache is just not very big, and it can cause slowdowns
> due to IPC
> overhead etc.
Also interesting. It might be worth mentioning in the docs and/or
README.
Regards,
Jeff Davis
On Fri, 2025-09-05 at 13:25 -0700, Jeff Davis wrote:
> As an aside, I'm building with meson using -Dc_args="-msse4.2 -Wtype-
> limits -Werror=missing-braces". But I notice that the meson build
> doesn't seem to use -funroll-loops or -ftree-vectorize when building
>
using -Dc_args="-msse4.2 -Wtype-
limits -Werror=missing-braces". But I notice that the meson build
doesn't seem to use -funroll-loops or -ftree-vectorize when building
checksums.c. Is that intentional? If not, perhaps slower checksum
calculations explain my results.
Regards,
cessary given how hard it is to measure meaningful lock
> contention so far;
Andres suggested that the case I brought up at the top of the thread is
due to lock contention, so a lock free queue also sounds like a
potential improvement. If the code is working and can be applied to
REL_18_STABLE,
to help, and were in some cases lower, but I didn't
look into why yet. It might have something to do with crowding out the
page cache.
Regards,
Jeff Davis
's from a very brief look,
I can try to get more precise numbers, but there seem to be enough 8kB
ones to support your theory.
Regards,
Jeff Davis
in very much, but it seemed to be at least as good as "sync"
mode for this workload.
Regards,
Jeff Davis
Test summary: 32 connections each perform repeated sequential scans.
Each connection scans a different 1GB partition of the same table. I
used partitioning and a predica
picture may get more complicated. While I expect the performance to be
much better overall, I wouldn't be surprised if "sync" ends up still
being useful for some purposes.
Regards,
Jeff Davis
sed changes based on my understanding.
Regards,
Jeff Davis
diff --git a/src/backend/storage/aio/README.md b/src/backend/storage/aio/README.md
index 72ae3b3737d..8fa6bd6e9ca 100644
--- a/src/backend/storage/aio/README.md
+++ b/src/backend/storage/aio/README.md
@@ -4,27 +4,38 @@
### Why Async
t the design choices to assumptions about
hardware, where appropriate.
Regards,
Jeff Davis
type as well as
regtype.
Regards,
Jeff Davis
From e7ee22230d4c14f0d1e83982647c7b04e1be3bf9 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Fri, 22 Aug 2025 13:40:36 -0700
Subject: [PATCH v1] Check for more Unicode functions during upgrade.
When checking for expression indexes that may be
can check the eflags for EXEC_FLAG_REWIND. That might not be the
only condition we need to check, but we should know at plan time
whether a subtree might be executed more than once.
Regards,
Jeff Davis
ludes
fragmentation, chunk headers, and free space. In the current code,
fragmentation is ignored most places, so (for example) switching to the
Bump allocator doesn't show the savings.
This isn't ready yet, but I'd appreciate some thoughts on the overall
architectural direction.
Regards
it's unlikely to
be called with the same parameters again, so whatever state it already
has might be useless anyway.
Also, are there any major challenges making this work with parallel
query?
Regards,
Jeff Davis
Example:
CREATE TABLE t1(id1 int8 primary key);
CREATE TABLE t2(id2 i
e interface, as well. Why does the caller
need to separately generate the ranges, then generate the table, then
generate the branches? It's really all the same action and can be based
on an input hash with a certain structure, and then return both the
table and the branches, right?
Regards,
Jeff Davis
hat's worth discussing in more detail to see what kinds of
issues it can help with and how it complements other approaches. I
suspect when we get into the details, different people (or different
situations) would want slightly different information out of that view.
Regards,
Jeff Davis
On Mon, 2025-08-11 at 13:54 -0400, Greg Sabino Mullane wrote:
> Great idea. +1. Here is a quick overall review to get things started.
Can you describe your use case? I'd like to understand whether this is
useful for users, hackers, or both.
Regards,
Jeff Davis
_mem several times over by using multiple data
structures.
* We should free the memory from a node when execution is complete, not
wait until ExecutorEnd(). What really matters is the maximum
*concurrent* memory usage.
Regards,
Jeff Davis
for 19 though.
Regards,
Jeff Davis
On Fri, 2025-08-01 at 17:46 -0500, Nathan Bossart wrote:
> On Fri, Aug 01, 2025 at 12:42:16PM -0700, Jeff Davis wrote:
> > - --with-statistics
> > + --statistics
>
> > - --with-statistics
> > + --statistics
>
> > - --with-statisti
people generally
think it's an improvement over what we have now.
Otherwise, we should just proceed with:
https://www.postgresql.org/message-id/40cedfc22da152928a74d472708aaadb8855d8d9.ca...@j-davis.com
and close the open item.
Regards,
Jeff Davis
On Tue, 2025-07-29 at 11:24 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> > On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> > > IIUC the current proposal is to:
> > >
> > > * Dump/restore stats by default.
>
>
On Thu, 2025-07-31 at 17:21 +0200, Tomas Vondra wrote:
> On 7/31/25 15:39, Greg Burd wrote:
> > I recall a conversation at the last PGConf.dev (2025) with a
> > representative
> > from Intel and Jeff Davis (CC’ed) that had to do with checksums and
> > a vast
> >
On Wed, 2025-07-30 at 12:21 -0500, Nathan Bossart wrote:
> Here is what I have staged for commit.
That's more clear to me. I also like that it shows that the options
work well together, because that was not obvious before.
Regards,
Jeff Davis
"? Because you currently can't do "--data-only --schema-only".
So that would make it not quite an alias.
If we go in this direction, it might be easier to just say that --
include conflicts with --schema-only and --data-only.
Regards,
Jeff Davis
On Tue, 2025-07-29 at 20:22 +0200, Álvaro Herrera wrote:
> Please move the switches themselves out of the translatable message,
> otherwise there are too many of them. For instance,
Thank you for looking, v2 attached.
Regards,
Jeff Davis
From 61b0239f17a1c7220de32699e95c6b365a
be builtin in that case, I suppose.
Another annoyance is that, if INITDB_LOCALE_PROVIDER=builtin, and
LC_CTYPE is not UTF-8-compatible, then we need to force LC_CTYPE=C.
That affects fewer things than it would with the libc provider, but it
still affects some things.
Regards,
Jeff Davis
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:43 -0500, Nathan Bossart wrote:
> > IIUC the current proposal is to:
> >
> > * Dump/restore stats by default.
We don't have a consensus for that, so unless a few people make an
abrupt turnar
On Thu, 2025-07-10 at 10:42 -0700, Jeff Davis wrote:
> On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> > * reject the combination of an "only" option and a "with" option
>
> There seems to be a rough consensus on this point.
Patch attached.
lt
> behavior about statistics in pg_dump, though.
I don't see a consensus to make stats the default.
Regards,
Jeff Davis
On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
> The patch feels a bit over-engineered, but I'd like to know what you
> think. It would be great if you could test/debug the windows NLS-
> enabled paths.
Let me explain how it ended up looking over-engineered, and perhaps
On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:
> On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis wrote:
> > I don't have a great Windows development environment, and it
> > appears CI
> > and the buildfarm don't offer great coverage either. Can I ask for
> >
ocale. The current proposal doesn't attempt that kind of
cleverness.
Comments?
Regards,
Jeff Davis
From 8ba8f74d28a64bfb006a76fbec64638f55f3660c Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Thu, 17 Jul 2025 13:07:50 -0700
Subject: [PATCH] initdb: default to builtin C.UTF-8
Disc
much milk if we only convert ASCII correctly.
>
> But perhaps I am just being paranoid.
That's a reasonable concern, and I don't mean to dismiss it. But I
believe that problem is two orders of magnitude smaller than the
problems we have with the status quo.
Regards,
Jeff Davis
hem when either --statistics-only or --no-
> > schema is used.
Thank you.
>
> +1, pending resolution of the defaults issue.
I went ahead and committed this as it clearly needs to be fixed. We can
continue the options discussion.
Regards,
Jeff Davis
SQL standard seems to require Unicode Full Case Mapping.
Regards,
Jeff Davis
[1] https://www.postgresql.org/docs/devel/locale.html#LOCALE-PROVIDERS
elease of the provider, it seems less likely to cause a problem
for equality searches, and therefore carries a lower risk for PKs. The
downside is that the keys will be larger and there are still some
risks, including bugs in the implementation (which is not just a
theoretical concern).
Othe
7b25c86f).
The revert seems to be related to pgport_shlib. At least for my current
work, I'm focused on removing setlocale() dependencies in the backend,
and a PG_C_LOCALE should work fine there.
Regards,
Jeff Davis
On Thu, 2025-07-10 at 11:53 +1200, Thomas Munro wrote:
> On Thu, Jul 10, 2025 at 10:52 AM Jeff Davis
> wrote:
> > The first problem -- how to affect the encoding of strings returned
> > by
> > strerror() on Windows -- may be solvable as well. It looks like
> > LC_ME
o-statistics and reject --statistics.
Other options are mostly the same between them, so I'm not sure it's a
good idea for them to diverge.
Regards,
Jeff Davis
On Wed, 2025-06-18 at 10:21 -0700, Jeff Davis wrote:
> * reject the combination of an "only" option and a "with" option
There seems to be a rough consensus on this point. Should we move ahead
with this small change and see if we can get consensus to go further?
Regards,
Jeff Davis
On Mon, 2025-07-07 at 17:56 -0700, Jeff Davis wrote:
> I looked into this a bit, and if I understand correctly, the only
> problem is with strerror() and strerror_r(), which depend on
> LC_MESSAGES for the language but LC_CTYPE to find the right encoding.
...
> Windows would be a dif
I was trying to exercise the function IsoLocaleName(), which is
surrounded by:
#if defined(WIN32) && defined(LC_MESSAGES)
but, at least in CI, that combination never seems to be true, which
surprised me. What platforms exercise this code path?
Regards,
Jeff Davis
On Tue, 2025-07-01 at 08:06 -0700, Jeff Davis wrote:
> Attached rebased v3.
And here's v4.
I changed the global variable to only hold the LC_CTYPE (not
LC_COLLATE), because Windows doesn't support a _locale_t that
represents multiple categories with different locales.
This pa
On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:
> > v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
> >
> > As I mentioned earlier in the thread, I don't think we can do this
> > for
> > LC_CTYPE, because otherwise system error messages would not
On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:
> I changed this to a global_libc_locale that includes both LC_COLLATE
> and LC_CTYPE (from datcollate and datctype), in case an extension is
> relying on strcoll for some reason.
...
> This patch series, at least so far, is desi
d be confusing, but maybe it's fine.
Regards,
Jeff Davis
one.
Regards,
Jeff Davis
s from pg_locale.h but instead put
> them in the .c files as needed, and explain why this is possible or
> suitable now.
It goes with v16-0003, so I will hold this back for now as well.
Regards,
Jeff Davis
pen-source Unicode normalization? If so, that would be very
cool.
The reason I'm asking is because, if there are multiple open source
implementations, we should either have the best one, or just borrow
another one as long as it has a suitable license (perhaps translating
to C as necessary).
Regards,
Jeff Davis
ize _or_ use form-
> insensitive string comparison, but nothing did that 20 years ago.
> Thus
> doing the form-insensitivity in the filesystem seemed best, and if
> you
> do that you can be form-preserving to enable the optimization
> described
> above.
Databases have similar concerns as a filesystem in this respect.
Regards,
Jeff Davis
ities
for optimization as well, such as:
* reducing the need for palloc and extra buffers, perhaps by using
buffers on the stack for small strings
* operate more directly on UTF-8 data rather than decoding and re-
encoding the entire string
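As a rough illustration of the first bullet, the sketch below uses a fixed-size stack buffer for small strings and falls back to the heap only for large ones. Everything here is an assumption made for the example: the names compare_counted and SMALL_BUF_LEN are hypothetical, malloc() stands in for palloc(), and strcoll() is used only because it needs NUL-terminated copies of length-counted input, not because it is the operation under discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF_LEN 64        /* hypothetical threshold for "small" */

/*
 * Hypothetical helper: compare two length-counted strings with strcoll().
 * Small inputs are copied into stack buffers; only large inputs allocate.
 * Returns -1, 0, or 1.
 */
int
compare_counted(const char *a, size_t alen, const char *b, size_t blen)
{
    char        abuf[SMALL_BUF_LEN];
    char        bbuf[SMALL_BUF_LEN];
    char       *acopy = abuf;
    char       *bcopy = bbuf;
    int         result;

    assert(a != NULL && b != NULL);

    /* Fall back to the heap only when the stack buffer is too small. */
    if (alen >= SMALL_BUF_LEN)
        acopy = malloc(alen + 1);
    if (blen >= SMALL_BUF_LEN)
        bcopy = malloc(blen + 1);
    if (acopy == NULL || bcopy == NULL)
        abort();

    memcpy(acopy, a, alen);
    acopy[alen] = '\0';
    memcpy(bcopy, b, blen);
    bcopy[blen] = '\0';

    result = strcoll(acopy, bcopy);

    /* Free only what was actually heap-allocated. */
    if (acopy != abuf)
        free(acopy);
    if (bcopy != bbuf)
        free(bcopy);

    return (result > 0) - (result < 0);
}
```

The common case of short strings then costs no allocation at all; the trade-off is a little extra stack usage per call.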
Regards,
Jeff Davis
>
> Works for me.
Sounds good. We can document compatibility notes around this point.
If normalization becomes important, we can take the time to work out
the performance implications more carefully, and potentially introduce
an NCASEFOLD() if needed.
Regards,
Jeff Davis
type.
> I guess I don't feel strongly about it either
> way.
Are you a user of citext? I'm genuinely interested in the use cases,
and whether the separate-data-type approach has merits that are missing
in the other approaches.
Regards,
Jeff Davis
he entry for EXCLUDE? I also merged your wording with
some similar wording from the entry about UNIQUE. Attached.
Regards,
Jeff Davis
From 0988ec1bac79055899fb555ac0c0441333888c83 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth"
Date: Tue, 17 Jun 2025 20:48:56 -0700
Subject:
R(), so that
sounds like a good idea. I'd be interested to hear from users of
citext.
Regards,
Jeff Davis
ot sure whether we'd want to standardize one or both of
those functions.
And if you think there's likely to be a collision with the standard
that's hard to anticipate and fix now, then we should consider
reverting CASEFOLD() for 18 and wait for more progress on the
standardization. W
tisfy Robert's concern about
the --help output. But Robert also wants stats off by default for
pg_dump and on by default for pg_restore, which I think means we need
both --with-statistics and --no-statistics anyway. Robert, comments?
Regards,
Jeff Davis
override that and I'm not sure we have one right now.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 08:58 -0700, Jeff Davis wrote:
> On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
> > If the idea is to remove all options for default behavior, we'd be
> > removing
> > --no-statistics, --with-data, and --with-schema at this point.
>
>
folding would also
want normalization, but it's hard to weigh that against the performance
cost. It might not matter outside of a few edge cases, though I'm not
sure exactly how many.
Regards,
Jeff Davis
but the "--x-only" options
also put us in a tough spot.
If --data-only had always been spelled "--no-schema" (or "--without-
data" or whatever), and --schema-only had always been spelled "--no-
data", then I think it would be a lot easier to add statistics into the
mix.
Regards,
Jeff Davis
On Mon, 2025-06-16 at 16:09 -0500, Nathan Bossart wrote:
> So perhaps there's not as strong of a
> consensus as we thought. Maybe we should ask for any new/updated
> votes.
Does it make any sense to be off by default in 18 and on in some later
release?
Regards,
Jeff Davis
Fixed.
Regards,
Jeff Davis
isible changes in the past, and
> regenerating tsvectors because of that were merely a suggestion.
Interesting, thank you for looking into the history here. It would
certainly be simpler to just make FTS fully collation-aware.
Regards,
Jeff Davis
ther options,
we don't need to worry about consistency with them, and I think we
should just use "--statistics".
Regards,
Jeff Davis
y.
To me, "last option wins" means that you don't raise an error; the
latter option simply overrides the earlier one.
Given that the pg_dump options are not order-sensitive now (unless I'm
missing something), I'm worried about the consequences of trying to
make them so now.
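The difference between the two policies can be sketched as follows. This is a toy model, not pg_dump's option parser: the function names are hypothetical, and the option strings are just the spellings mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Policy A, "last option wins": order-sensitive, never an error.
 * The later of two contradictory options silently overrides the earlier.
 */
bool
stats_last_option_wins(int nargs, const char *const *args, bool dflt)
{
    bool        enabled = dflt;

    assert(args != NULL || nargs == 0);

    for (int i = 0; i < nargs; i++)
    {
        if (strcmp(args[i], "--statistics") == 0)
            enabled = true;
        else if (strcmp(args[i], "--no-statistics") == 0)
            enabled = false;
    }
    return enabled;
}

/*
 * Policy B, order-insensitive: contradictory options are rejected.
 * Returns false on conflict; otherwise stores the setting in *enabled.
 */
bool
stats_reject_conflict(int nargs, const char *const *args, bool dflt,
                      bool *enabled)
{
    bool        saw_yes = false;
    bool        saw_no = false;

    assert(enabled != NULL);
    assert(args != NULL || nargs == 0);

    for (int i = 0; i < nargs; i++)
    {
        if (strcmp(args[i], "--statistics") == 0)
            saw_yes = true;
        else if (strcmp(args[i], "--no-statistics") == 0)
            saw_no = true;
    }

    if (saw_yes && saw_no)
        return false;           /* caller reports the conflict */

    *enabled = saw_yes ? true : (saw_no ? false : dflt);
    return true;
}
```

Under policy A, swapping the order of two options changes the result; under policy B it never does, which matches the current order-insensitive behavior described above.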
Regards,
Jeff Davis
simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
> FWIW I don't have a tremendously strong opinion about --statistics-
> only.
Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.
Regards,
Jeff Davis
rip out --statistics-only (in favor
> of
> --no-schema --no-data --with-statistics).
I'd probably keep --statistics-only.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote:
> Am I too late to propose ripping this out?
As long as we keep the functionality, I'm fine changing the
options/names around at this point.
Regards,
Jeff Davis
ndexes,
which are in SECTION_POST_DATA).
Regards,
Jeff Davis
On Fri, 2025-02-07 at 11:19 -0800, Jeff Davis wrote:
>
> Attached v15. Just a rebase.
Attached v16.
> * commit this on the grounds that it's a desirable code improvement
> and
> the worst-case regression isn't a major concern; or
I plan to commit this soon after bra
NCTION
statements that come from other places (e.g. direct from applications,
or migration scripts, or extension scripts).
>
Regards,
Jeff Davis
ger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE?
Do you have an example of how things can go wrong?
> For the LC_COLLATE settings, I think we could just
> do the setting in main(), where the other non-database-speci
We could try to create a GUC to control this behavior, but behavior-
changing GUCs don't have a great history, and it would probably last
quite some time before we could really turn off libc for good.
There would be similar challenges for downcase_identifier() and maybe
pg_strcasecmp().
Regards,
Jeff Davis
o.
I guess "CTYPE" works, but it's too technical and feels libc-specific.
Regards,
Jeff Davis
we need is the right encoding, do
we need a proper locale?
Regards,
Jeff Davis
On Fri, 2025-06-06 at 15:47 -0700, Jeff Davis wrote:
> > > * Force the environment variables LC_COLLATE=C and LC_CTYPE=C
> > > unconditionally, and pg_perm_setlocale() them
> >
> > Currently that would be a regression for some people, because
> > when
On Thu, 2025-06-05 at 22:15 -0700, Jeff Davis wrote:
> To continue this thread, I did a symbol search in the meson build
> directory like (patterns.txt attached):
Attached a rough patch series which does what everyone seemed to agree
on:
* Change some trivial ASCII cases to use pg_
on datctype, and I could have offered a more clear reply to
the user.
Regards,
Jeff Davis
/ comments. Another caller is
get_iso_localename().
There are also a couple false positives where mbstowcs_l/wcstombs_l are
emulated with uselocale() and mbstowcs/wcstombs. In that case, it's not
actually sensitive to the global setting.
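A minimal sketch of that kind of emulation, assuming a POSIX.1-2008 system that provides uselocale(). The wrapper name mbstowcs_with_locale is hypothetical and this is not the code in the PostgreSQL tree; it only shows why such a call does not depend on the process-wide setlocale() state.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: behave like mbstowcs_l() on platforms that lack
 * it, by temporarily switching the calling thread's locale.
 */
size_t
mbstowcs_with_locale(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
    locale_t    save;
    size_t      result;

    assert(src != NULL);

    /* uselocale() affects only the calling thread, not the global locale. */
    save = uselocale(loc);
    assert(save != (locale_t) 0);

    result = mbstowcs(dest, src, n);

    /* Restore whatever locale the thread was using before. */
    uselocale(save);

    return result;
}
```

Because the conversion runs under the explicitly passed locale_t, a symbol search that flags the inner mbstowcs() call is reporting a false positive.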
---
copyfromparse.c - the input is
, then ignore LC_COLLATE/LC_CTYPE and emit a
WARNING, rather than trying to set it based on LOCALE and getting an
error.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/cd3517c7-ddb8-454e-9dd5-70e3d84ff6a2%40eisentraut.org
From fea7ab4f0495330fae56f069520de374d75ae0b8 Mon Sep 17
On Tue, 2025-06-03 at 20:22 -0700, Jeff Davis wrote:
> EQUALITY marker: indicates that the function or index AM depends on
> CollOid for the equality semantics of the input expression. Examples:
> texteq(), btree AM, hash AM. (Note: EQUALITY is only important for
> non-
> determini
a strong opinion on which route to
take, but I chose the above names from existing keywords so we wouldn't
have to add any.
Regards,
Jeff Davis