On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I continue to improve/speed up Unicode in Postgres.
> Last time, we improved the lower(), upper(), and casefold()
> functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.
Did you compare against other implementations, such as ICU's normalization functions? There's also a Rust crate here:

https://github.com/unicode-rs/unicode-normalization

that might have been optimized.

In addition to the lookups themselves, there are other opportunities for optimization as well, such as:

* reducing the need for palloc and extra buffers, perhaps by using buffers on the stack for small strings

* operating more directly on UTF-8 data rather than decoding and re-encoding the entire string

Regards,
	Jeff Davis
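A minimal standalone sketch of the two suggestions above, assuming well-formed UTF-8 input. It is not PostgreSQL code: the names SMALL_BUF_CHARS, utf8_is_ascii(), utf8_decode() and decode_with_small_buffer() are hypothetical, and malloc()/free() stand in for palloc()/pfree() so it compiles outside the backend. It shows a stack buffer for short inputs with a heap fallback, plus one narrow case of working on the UTF-8 bytes directly: pure ASCII is unchanged by NFC, NFD, NFKC and NFKD, so it needs no decoding at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stack-buffer capacity, in code points. */
#define SMALL_BUF_CHARS 64

/* Pure ASCII is unchanged by all four normalization forms. */
static bool
utf8_is_ascii(const unsigned char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (s[i] & 0x80)
			return false;
	return true;
}

/* Decode well-formed UTF-8 into code points; returns the count. */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
	size_t		n = 0;
	size_t		i = 0;

	while (i < len)
	{
		unsigned char c = s[i];
		uint32_t	cp;
		size_t		extra;

		if (c < 0x80)
		{ cp = c; extra = 0; }
		else if (c < 0xE0)
		{ cp = c & 0x1F; extra = 1; }
		else if (c < 0xF0)
		{ cp = c & 0x0F; extra = 2; }
		else
		{ cp = c & 0x07; extra = 3; }

		assert(i + extra < len);	/* sequence must not be truncated */
		for (size_t k = 1; k <= extra; k++)
			cp = (cp << 6) | (s[i + k] & 0x3F);
		out[n++] = cp;
		i += extra + 1;
	}
	return n;
}

/*
 * Hypothetical entry point: decodes str into a code point buffer that lives
 * on the stack for short inputs and on the heap otherwise.  Returns the
 * number of code points; *used_heap reports which path was taken.
 */
static size_t
decode_with_small_buffer(const char *str, size_t len, bool *used_heap)
{
	uint32_t	stackbuf[SMALL_BUF_CHARS];
	uint32_t   *buf = stackbuf;
	size_t		n;

	*used_heap = false;

	/* Fast path on the raw bytes: ASCII needs no decoding, one byte per code point. */
	if (utf8_is_ascii((const unsigned char *) str, len))
		return len;

	/* UTF-8 never has more code points than bytes, so len is an upper bound. */
	if (len > SMALL_BUF_CHARS)
	{
		buf = malloc(len * sizeof(uint32_t));	/* palloc() in the backend */
		if (buf == NULL)
			abort();
		*used_heap = true;
	}

	n = utf8_decode((const unsigned char *) str, len, buf);

	/*
	 * Decomposition and recomposition would run on buf here.  Decomposition
	 * can lengthen the string, so its output buffer needs its own sizing
	 * (not shown).
	 */

	if (buf != stackbuf)
		free(buf);
	return n;
}
```

For example, "caf\xc3\xa9" (5 bytes) decodes to 4 code points entirely on the stack, while a 100-byte non-ASCII string takes the heap path. Bounding the buffer by the input byte length keeps the common short-string case allocation-free without risking overflow on long inputs.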