On 15.03.22 18:28, Robert Haas wrote:
> On Tue, Mar 15, 2022 at 12:58 PM Peter Eisentraut
> <peter.eisentr...@enterprisedb.com> wrote:
>> On 14.03.22 19:57, Robert Haas wrote:
>>> 1. What will happen if I set the ICU collation to something that
>>> doesn't match the libc collation? How bad are the consequences?
>>
>> These are unrelated, so there are no consequences.
>
> Can you please elaborate on this?

The code that is aware of ICU generally works like this:

if (locale_provider == ICU)
  result = call ICU code
else
  result = call libc code
return result

However, there is code out there, both within PostgreSQL itself and in extensions, that does not do that yet. Ideally, we would eventually change all of that over, but that is not happening now. So we ought to preserve the ability to set the libc locale, to keep that legacy code working for now.

This legacy code by definition doesn't know about ICU, so it doesn't care whether the ICU setting "matches" the libc setting or anything like that. It will just do its thing depending on its own setting.

The only consequence of settings that don't match is that different pieces of code behave semantically inconsistently (e.g., one routine thinks the data is Greek and another thinks it is French). But it's up to the user to set that correctly. And the scenarios where you can actually do anything semantically relevant this way are very limited.
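To make that concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the `Provider` enum and all four function names are made up for illustration, and the "ICU" branch is only a case-insensitive stand-in, not a real ICU call.

```python
import enum
import locale


class Provider(enum.Enum):
    # Hypothetical stand-in for the locale provider setting.
    LIBC = "libc"
    ICU = "icu"


def libc_compare(a: str, b: str) -> int:
    # Real libc path: strcoll() under the process's current LC_COLLATE.
    return locale.strcoll(a, b)


def icu_compare(a: str, b: str) -> int:
    # Placeholder for an ICU collator call (assumption): a case-insensitive
    # comparison, used only so that the two branches visibly differ.
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def provider_aware_compare(provider: Provider, a: str, b: str) -> int:
    # ICU-aware code: branch on the provider and return that result.
    if provider is Provider.ICU:
        return icu_compare(a, b)
    return libc_compare(a, b)


def legacy_compare(a: str, b: str) -> int:
    # Legacy code never looks at the provider; it always uses libc.
    return libc_compare(a, b)
```

With LC_COLLATE set to "C", `provider_aware_compare(Provider.ICU, "B", "a")` puts "B" after "a", while `legacy_compare("B", "a")` puts it before. Neither routine checks the other's setting; each just follows its own.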

A second point is that the LC_CTYPE setting tells other parts of libc what the current encoding is. This affects gettext for example. So you need to set this to something sensible even if you don't use libc locale routines otherwise.
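As a small illustration of that dependency, the following Python sketch asks libc which encoding it assumes under a given LC_CTYPE. The helper name `codeset_for` is made up for this example, and `locale.nl_langinfo` is only available on Unix-like platforms.

```python
import locale


def codeset_for(ctype_locale: str) -> str:
    """Return the character encoding libc reports under the given LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, ctype_locale)
        # nl_langinfo(CODESET) is the value gettext uses by default
        # to pick its output character set.
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Restore the previous LC_CTYPE so callers are unaffected.
        locale.setlocale(locale.LC_CTYPE, saved)
```

On a system where the `en_US.UTF-8` locale is installed, `codeset_for("en_US.UTF-8")` would be expected to report `UTF-8`; what `codeset_for("C")` reports is platform-dependent.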

>>> 2. If I want to avoid a mismatch between the two, then I will need a
>>> way to figure out which libc collation corresponds to a given ICU
>>> collation. How do I do that?
>>
>> You can specify the same name for both.
>
> Hmm. If every name were valid in both systems, I don't think you'd be
> proposing two fields.

Earlier versions of this patch and its predecessor patches did indeed have common fields. But the two systems in fact accept different values once you delve into the advanced features. For basic usage, something like "en_US" will work for both.

