Wed, Sep 3, 2025 at 09:35, Alexander Borisov <lex.bori...@gmail.com> wrote:

> Hi, Jeff, hackers!
>
> As promised, refactoring the C code for Unicode Normalization Forms.
>
> In general terms, here's what has changed:
> 1. Recursion has been removed; now data is generated using
>      a Perl script.
> 2. Memory is no longer allocated for uint32 for the entire size,
>      but uint8 is allocated for the entire size for the CCC cache, which
>      boosts performance significantly.
> 3. The code for the unicode_normalize() function has been completely
>      rewritten.
>
> I am confident that we have achieved excellent results.
>

Hey.

I've looked into these patches.

The patches apply, compilation succeeds, and make check and make installcheck
show no errors.

Code quality is good, although I suggest having a native English speaker
review the comments and commit messages, as they are a bit difficult to follow.

The Sparse Array approach is described in the newly introduced
GenerateSparseArray.pm module.  Perhaps it would be valuable to add a section
to src/common/unicode/README, where it would get more visibility.
(Not insisting here.)

For performance testing I used the approach by Jeff Davis [1].
I prepared NFC and NFD files, loaded them into UNLOGGED tables, and
measured normalize() calls.

    CREATE UNLOGGED TABLE strings_nfd (
      str   text STORAGE PLAIN NOT NULL
    );
    COPY strings_nfd FROM '/var/lib/postgresql/strings.nfd.txt';

    CREATE UNLOGGED TABLE strings_nfc (
      str   text STORAGE PLAIN NOT NULL
    );
    COPY strings_nfc FROM '/var/lib/postgresql/strings.nfc.txt';

    SELECT count( normalize( str, NFD ) )
      FROM strings_nfd, generate_series( 1, 10 ) x;
    SELECT count( normalize( str, NFC ) )
      FROM strings_nfc, generate_series( 1, 10 ) x;

And I've got the following numbers:

Master
NFD Time: 2954.630 ms / 295ms
NFC Time: 3929.939 ms / 330ms

Patched
NFD Time: 1658.345 ms / 166ms / +78%
NFC Time: 1862.757 ms / 186ms / +77%
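
For reference, the percentages can be reproduced from the rounded figures in
the second column, assuming the speedup is computed as master / patched - 1.
A minimal Python sketch; the names timings_ms and speedup_percent are mine,
for illustration only:

```python
# Timings in ms, copied from the second column of the results above.
timings_ms = {
    "NFD": {"master": 295, "patched": 166},
    "NFC": {"master": 330, "patched": 186},
}


def speedup_percent(master_ms, patched_ms):
    """Return how much faster the patched build is, in percent.

    Assumption: speedup = master / patched - 1.
    """
    return (master_ms / patched_ms - 1) * 100


for form, t in timings_ms.items():
    pct = speedup_percent(t["master"], t["patched"])
    print(f"{form}: +{pct:.0f}%")
```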

Overall, I find these patches and performance very nice and valuable.
I've added myself as a reviewer and marked this patch as Ready for
Committer.

[1]
https://postgr.es/m/adffa1fbdb867d5a11c9a8211cde3bdb1e208823.ca...@j-davis.com

-- 
Victor Yegorov
