On 13.09.23 00:47, Jeff Davis wrote:
> The idea is to have a new data type, say "UTEXT", that normalizes the
> input so that it can have an improved notion of equality while still
> using memcmp().
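(For concreteness, the normalize-then-memcmp idea amounts to something like the following sketch in Python — not the proposed implementation, just the principle: canonically equivalent strings differ byte-wise until you normalize them, after which a plain byte comparison suffices.)

```python
import unicodedata

# Two canonically equivalent spellings of "café":
# precomposed U+00E9 vs. "e" followed by combining acute U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# A memcmp()-style byte comparison sees them as different strings.
assert composed.encode() != decomposed.encode()

# After normalizing both to NFC, the byte representations are
# identical, so byte-wise equality gives the "improved" semantics.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
assert nfc_a.encode() == nfc_b.encode()
```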
I think a new type like this would obviously be suboptimal because it's
nonstandard and most people wouldn't use it.
I think a better direction here would be to work toward making
nondeterministic collations usable on the global/database level and then
encouraging users to use those.
It's also not clear which way the performance tradeoffs would fall.
Nondeterministic collations are obviously going to be slower, but by how
much? People have accepted moving from C locale to "real" locales
because they needed those semantics. Would it be any worse moving from
real locales to "even realer" locales?
On the other hand, a utext type would either require a large set of its
own functions and operators, or we would have to inject text-to-utext
casts in various places, which would introduce its own overhead.