On Wed, Sep 3, 2025 at 09:35, Alexander Borisov <lex.bori...@gmail.com> wrote:
> Hi, Jeff, hackers!
>
> As promised, refactoring the C code for Unicode Normalization Forms.
>
> In general terms, here's what has changed:
> 1. Recursion has been removed; now data is generated using
> a Perl script.
> 2. Memory is no longer allocated for uint32 for the entire size,
> but uint8 is allocated for the entire size for the CCC cache, which
> boosts performance significantly.
> 3. The code for the unicode_normalize() function has been completely
> rewritten.
>
> I am confident that we have achieved excellent results.
>

Hey.

I've looked into these patches.

The patches apply, compilation succeeds, and make check and make installcheck
show no errors.

Code quality is good, although I suggest that a native English speaker review
the comments and commit messages; they are a bit difficult to follow.

The Sparse Array approach is described in the newly introduced
GenerateSparseArray.pm module. Perhaps it would be valuable to add a section
to src/common/unicode/README, where it would get more visibility.
(Not insisting here.)

For performance testing I used the approach by Jeff Davis [1].
I prepared NFC and NFD files, loaded them into UNLOGGED tables and
measured normalize() calls.

    CREATE UNLOGGED TABLE strings_nfd (
        str text STORAGE PLAIN NOT NULL
    );
    COPY strings_nfd FROM '/var/lib/postgresql/strings.nfd.txt';

    CREATE UNLOGGED TABLE strings_nfc (
        str text STORAGE PLAIN NOT NULL
    );
    COPY strings_nfc FROM '/var/lib/postgresql/strings.nfc.txt';

    SELECT count( normalize( str, NFD ) ) FROM strings_nfd, generate_series( 1, 10 ) x;
    SELECT count( normalize( str, NFC ) ) FROM strings_nfc, generate_series( 1, 10 ) x;

I got the following numbers:

Master
    NFD Time: 2954.630 ms / 295ms
    NFC Time: 3929.939 ms / 330ms

Patched
    NFD Time: 1658.345 ms / 166ms / +78%
    NFC Time: 1862.757 ms / 186ms / +77%

Overall, I find these patches and their performance very nice and valuable.
I've added myself as a reviewer and marked this patch as Ready for Committer.

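For anyone wanting to reproduce the NFC and NFD input files mentioned above,
here is a minimal sketch of how such files could be produced. It assumes
Python's standard unicodedata module and a tiny hypothetical sample of
strings; the function names, the sample and the output location are
illustrative only and are not taken from [1].

```python
import os
import tempfile
import unicodedata


def normalize_lines(lines, form):
    """Return each string normalized to the given Unicode form ("NFC" or "NFD")."""
    return [unicodedata.normalize(form, line) for line in lines]


def write_lines(lines, path):
    """Write one string per line, UTF-8 encoded."""
    # Note: COPY's default text format treats backslash and tab specially,
    # so real data containing them would need escaping first.
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical sample; a real benchmark needs a much larger corpus.
    sample = ["Ame\u0301lie", "\u00c5ngstr\u00f6m", "\ud55c\uae00"]
    out_dir = tempfile.gettempdir()  # adjust to a directory the server can read
    for form, name in (("NFD", "strings.nfd.txt"), ("NFC", "strings.nfc.txt")):
        write_lines(normalize_lines(sample, form), os.path.join(out_dir, name))
```

Feeding already-normalized input to normalize() with the same form, as the
queries above do, exercises the case where the output equals the input.
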
[1] https://postgr.es/m/adffa1fbdb867d5a11c9a8211cde3bdb1e208823.ca...@j-davis.com

-- 
Victor Yegorov