On Tue, Mar 10, 2026 at 3:04 PM Jeff Davis <[email protected]> wrote:
> If their environment's LC_CTYPE is UTF8-based, they already get UTF-8.
> If it isn't, we can either:
>
> (a) Fall back to LC_CTYPE=C, which is the only UTF8-compatible locale
> available everywhere. C is actually not a terrible fallback: it doesn't
> actually affect many things, because I have moved almost everything to
> use the database default locale.
>
> (b) Warn or error unless they explicitly specify the encoding with -E.
> But the former is likely to be ignored and the latter is not what I'd
> call "gentle".
>
> Which of these do you think is the right approach?

I'm a little confused as to how this relates to what you were asking
before. I thought you were proposing to pick UTF-8 rather than
SQL_ASCII when LC_CTYPE=C, but that's not on this list of options. To
be honest, I'd probably be ready to support making the default
encoding UTF8 regardless of the environment, so that you have to use -E
if you want anything else. I think there are still people using other
encodings, but I believe it to be a small minority at this point.

> There's a narrower question about what we do with LC_CTYPE=C. Currently
> we use SQL_ASCII encoding, which doesn't seem like a great default, and
> we could change that to default to UTF8. And another question about
> whether we change the meaning of --no-locale.

I think SQL_ASCII is a terrible default. Nobody actually wants that
unless they're trying to get out of a sticky situation. Making it
opt-in must be right. I do not know what the question about
--no-locale is.
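
To make the SQL_ASCII concern concrete: that encoding does not validate
bytes above 127, so clients using different encodings can each store data
that the other cannot read back correctly. Below is a minimal Python
sketch of the difference. The helper name is hypothetical and it only
mimics the validation a UTF8 database applies; it is not PostgreSQL code.

```python
# Illustration only: mimic the input validation of a UTF8 database.
# The name is written with an escape so the code point is unambiguous.
name = "\u00c1lvaro"                     # "Álvaro", precomposed A-acute

latin1_bytes = name.encode("latin-1")    # what a LATIN1 client would send
utf8_bytes = name.encode("utf-8")        # what a UTF-8 client would send


def accepted_by_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A UTF8 database would reject the LATIN1 bytes at input time.
# SQL_ASCII would store both byte strings as-is, with no record of
# which encoding each one was written in.
print(accepted_by_utf8(utf8_bytes), accepted_by_utf8(latin1_bytes))
```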

> We sweat over single-digit performance regressions in fairly specific
> cases all the time, but here we're 3X slower for index builds:
>
> https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/
>
> and 2-5X slower for Sort:
>
> https://www.postgresql.org/message-id/[email protected]
>
> and others don't seem very concerned, so I feel like I'm missing
> something.

<insert shrug emoji here>

At the end of the day, we're all just guessing. My experience working
for EDB is that we have a number of customers who care about sort
order quite a lot, and we've had to sweat blood to make them happy.
And, on a personal level, I have a hard time understanding why anyone
would be OK with a sort order that puts Álvaro after Zebra instead of
between Alvaro and Beatriz, because that seems extremely frustrating.
However, these are just personal biases. I'm much more likely to hear
from the customers who care a lot about the details of how something
works than I am to hear from the customers who are perfectly happy to
take the defaults, because people who are happy don't contact support
at all and people who are unhappy about relatively normal things get
handled by support; I get the weird cases. And everybody is going to
have different experiences. Presumably, your experience is that the
indexing and sorting performance is a big concern for the users you
support, and that's why you favor prioritizing that part of the
experience. That's perfectly legitimate, but it's different from my
experience. My experience is that when I tell people they can use
collate "C" to speed up sorting, they tell me that's a stupid
workaround that doesn't give them the answers that they want, which
obviously colors my viewpoint on this question in the same way that
your experiences color yours.
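
The sort-order example above can be reproduced outside the database with
a small Python sketch. Plain sorted() compares by code point, which for
UTF-8 text corresponds to what collate "C" does. The accent-stripping key
is only a crude stand-in for a real linguistic collation (ICU or libc),
and strip_accents is a hypothetical helper, not anything PostgreSQL uses.

```python
import unicodedata

# "\u00c1lvaro" is "Álvaro" with a precomposed A-acute.
names = ["Zebra", "\u00c1lvaro", "Beatriz", "Alvaro"]

# Code point order: A-acute (U+00C1) sorts after every ASCII letter.
c_order = sorted(names)


def strip_accents(s: str) -> str:
    """Drop combining marks after NFD decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Crude approximation of a linguistic order: compare base letters first,
# then fall back to the original string to break ties.
linguistic_order = sorted(names, key=lambda s: (strip_accents(s), s))

print(c_order)
print(linguistic_order)
```

In the first list the accented name lands after Zebra; in the second it
sits between Alvaro and Beatriz.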

-- 
Robert Haas
EDB: http://www.enterprisedb.com

