Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Nico Williams
On Fri, Jun 20, 2025 at 10:15:47AM -0700, Jeff Davis wrote:
> On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> > In the slow path you only normalize the _current character_, so you
> > only need enough buffer space for that.
>
> That's a clear win for UTF8 data. Also, if there are no chan

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Jeff Davis
On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote:
> I don't quite see how this compares to the implementation on Rust. In
> the link provided, they use perfect hash, which I get rid of and get
> a x2 boost.
> If you take ICU implementations in C++, I have always considered them
> slow, at

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Jeff Davis
On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> In the slow path you only normalize the _current character_, so you
> only need enough buffer space for that.
That's a clear win for UTF8 data. Also, if there are no changes, then you can just return the input buffer and not bother allo
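
The "return the input buffer when nothing changes" idea can be illustrated with a minimal C sketch. It is not the patch under discussion: the names `is_all_ascii`, `slow_normalize`, and `normalize_maybe_inplace` are hypothetical, `malloc` stands in for `palloc`, and the slow path is only a placeholder that copies its input. The fast path relies on the fact that pure-ASCII text is already in every Unicode normalization form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* True if every byte is ASCII (< 0x80); such text is already normalized. */
static bool
is_all_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) s[i] & 0x80)
            return false;
    return true;
}

/*
 * Hypothetical slow path. A real implementation would decompose, reorder,
 * and recompose here; this placeholder only returns a heap copy.
 */
static char *
slow_normalize(const char *s, size_t len)
{
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/*
 * Returns the input pointer itself when no change is needed, so the caller
 * avoids an allocation. *allocated tells the caller whether to free().
 */
static const char *
normalize_maybe_inplace(const char *s, size_t len, bool *allocated)
{
    assert(allocated != NULL);

    if (is_all_ascii(s, len))
    {
        *allocated = false;
        return s;
    }
    *allocated = true;
    return slow_normalize(s, len);
}
```

A caller would free the result only when `allocated` is set. A finer-grained check than "all ASCII" (for example, a per-code-point quick check) would let more inputs take the no-allocation path, at the cost of a table lookup.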

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Nico Williams
On Thu, Jun 19, 2025 at 10:41:57AM -0700, Jeff Davis wrote:
> In addition to the lookups themselves, there are other opportunities
> for optimization as well, such as:
>
> * reducing the need for palloc and extra buffers, perhaps by using
> buffers on the stack for small strings
>
> * operate mor
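
The quoted suggestion about stack buffers for small strings can be sketched in C as follows. This is an illustration under stated assumptions, not code from the thread: `WorkBuf`, `workbuf_init`, `workbuf_free`, and the 64-byte threshold `SMALL_BUF_SIZE` are hypothetical, and `malloc`/`free` stand in for `palloc`/`pfree`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_BUF_SIZE 64       /* assumption: illustrative threshold only */

/*
 * Work buffer with inline storage. When a WorkBuf is declared as a local
 * variable, inline_storage lives on the stack, so small inputs need no
 * heap allocation at all.
 */
typedef struct WorkBuf
{
    char   *data;               /* points at inline_storage or heap memory */
    size_t  cap;                /* usable capacity in bytes */
    bool    on_heap;            /* true if data must be freed */
    char    inline_storage[SMALL_BUF_SIZE];
} WorkBuf;

/* Prepare a buffer of at least `needed` bytes; returns false on OOM. */
static bool
workbuf_init(WorkBuf *wb, size_t needed)
{
    assert(wb != NULL);

    if (needed <= SMALL_BUF_SIZE)
    {
        wb->data = wb->inline_storage;
        wb->cap = SMALL_BUF_SIZE;
        wb->on_heap = false;
        return true;
    }

    wb->data = malloc(needed);
    if (wb->data == NULL)
        return false;
    wb->cap = needed;
    wb->on_heap = true;
    return true;
}

/* Release heap memory if any was taken; a no-op for the inline case. */
static void
workbuf_free(WorkBuf *wb)
{
    if (wb->on_heap)
        free(wb->data);
    wb->data = NULL;
    wb->on_heap = false;
}
```

The trade-off is a larger stack frame in exchange for skipping the allocator on short inputs; the right threshold would have to come from measurement.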

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Alexander Borisov
19.06.2025 20:41, Jeff Davis wrote:
> On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> > As promised, I continue to improve/speed up Unicode in Postgres.
> > Last time, we improved the lower(), upper(), and casefold() functions. [1]
> > Now it's time for Unicode Normalization Forms, specifically

Re: Improve the performance of Unicode Normalization Forms.

2025-06-19 Thread Jeff Davis
On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I continue to improve/speed up Unicode in Postgres.
> Last time, we improved the lower(), upper(), and casefold()
> functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.
Di

Re: Improve the performance of Unicode Normalization Forms.

2025-06-12 Thread John Naylor
On Wed, Jun 11, 2025 at 7:27 PM Alexander Borisov wrote:
>
> 11.06.2025 10:13, John Naylor wrote:
> > On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov
> > wrote:
> >> 5. The server part "lost weight" in the binary, but the frontend
> >> "gained weight" a little.
> >>
> >> I read the old com

Re: Improve the performance of Unicode Normalization Forms.

2025-06-11 Thread Alexander Borisov
11.06.2025 10:13, John Naylor wrote:
> On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote:
> > 5. The server part "lost weight" in the binary, but the frontend
> > "gained weight" a little.
> >
> > I read the old commits, which say that the size of the frontend is very
> > important and that speed is not i

Re: Improve the performance of Unicode Normalization Forms.

2025-06-11 Thread John Naylor
On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote:
> 5. The server part "lost weight" in the binary, but the frontend
> "gained weight" a little.
>
> I read the old commits, which say that the size of the frontend is very
> important and that speed is not important
> (speed is important o