On Mon, Oct 2, 2023 at 3:42 PM Peter Eisentraut <pe...@eisentraut.org> wrote:
> I think a better direction here would be to work toward making
> nondeterministic collations usable on the global/database level and then
> encouraging users to use those.

It seems to me that this overlooks one of the major points of Jeff's proposal, which is that we don't reject text input that contains unassigned code points. That decision turns out to be really painful. Here, Jeff mentions normalization, but I think it's a major issue for collation support as well. If new code points are added, users can put them into the database before they are known to the collation library, and then when they become known to the collation library, the sort order changes and indexes break.

Would we endorse a proposal to make pg_catalog.text with encoding UTF-8 reject code points that aren't yet known to the collation library? To do so would tighten things up considerably from where they stand today, and the way things stand today is already rigid enough to cause problems for some users. But if we're not willing to do that, then I find it easy to understand why Jeff wants an alternative type that does.

Now, there is still the question of whether such a data type would properly belong in core or even contrib rather than being an out-of-core project. It's not obvious to me that such a data type would get enough traction that we'd want it to be part of PostgreSQL itself. But at the same time, I can certainly understand why Jeff finds the status quo problematic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
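
For illustration only, here is a minimal Python sketch of the kind of input check discussed above: rejecting text that contains code points the local Unicode database does not yet know. This is not PostgreSQL code and not Jeff's proposed implementation; the function names are hypothetical. It relies on general category "Cn", which Python's unicodedata module reports for unassigned code points (and also for noncharacters).

```python
import unicodedata


def unassigned_code_points(text: str) -> list[str]:
    """Return the code points in text that are unassigned in this Unicode database.

    General category "Cn" covers unassigned code points and noncharacters.
    """
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cn"]


def check_input(text: str) -> str:
    """Hypothetical input check: reject text containing unassigned code points."""
    bad = unassigned_code_points(text)
    if bad:
        raise ValueError(f"unassigned code points: {', '.join(bad)}")
    return text


if __name__ == "__main__":
    # The result depends on the Unicode version bundled with the interpreter.
    print("Unicode database version:", unicodedata.unidata_version)
    print(unassigned_code_points("a\u0378b"))
```

Note that the answer depends on unicodedata.unidata_version, i.e. on which Unicode version the library ships with. That is the same hazard described above: a string rejected today may be accepted after a library upgrade, and a string accepted without any such check may sort differently once the library learns about its code points.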