11.06.2025 10:13, John Naylor wrote:
On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov <lex.bori...@gmail.com> wrote:
>> 5. The server part "lost weight" in the binary, but the frontend
>> "gained weight" a little.
>> I read the old commits, which say that the size of the frontend is very
>> important and that speed is not important
>> (speed is important on the server).
>> I'm not quite sure what to do if this is really the case. Perhaps
>> we should leave the slow version for the frontend.

> In the "small" patch, the frontend files got a few kB bigger, but the
> backend got quite a bit smaller. If we decided to go with this patch,
> I'd say it's preferable to do it in a way that keeps both paths the
> same.

Okay, then I'll leave the frontend unchanged so that the size remains
the same. The changes will only affect the backend.

>> How was it tested?
>> Four files were created for each normalization form: NFC, NFD, NFKC,
>> and NFKD.
>> The files were sent via pgbench. The files contain all code points that
>> need to be normalized.
>> Unfortunately, the patches are already quite large, but if necessary,
>> I can send these files in a separate email or upload them somewhere.

> What kind of workload do they present?
> Did you consider running the same tests from the thread that led to
> the current implementation?

I found the performance tests in this discussion:
https://www.postgresql.org/message-id/CAFBsxsHUuMFCt6-pU+oG-F1==cmep8wr+o+brouxwu6i8kx...@mail.gmail.com
Below are performance test results.
* Ubuntu 24.04.1 (Intel(R) Xeon(R) Gold 6140) (gcc version 13.3.0)
1.
Normalize, decomp only
select count(normalize(t, NFD)) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;
Patch (big table): 279.858 ms
Patch (small table): 282.925 ms
Without: 444.118 ms
2.
select count(normalize(t, NFD)) from (
  select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
  generate_series(1,100000) as i
) s;
Patch (big table): 219.858 ms
Patch (small table): 247.893 ms
Without: 376.906 ms
3.
Normalize, decomp+recomp
select count(normalize(t, NFC)) from (
select md5(i::text) as t from
generate_series(1,1000) as i
) s;
Patch (big table): 7.553 ms
Patch (small table): 7.876 ms
Without: 13.177 ms
4.
select count(normalize(t, NFC)) from (
  select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
  generate_series(1,1000) as i
) s;
Patch (big table): 5.765 ms
Patch (small table): 6.782 ms
Without: 10.800 ms
5.
Quick check has not changed because these patches do not affect it:
-- all chars are quickcheck YES
select count(*) from (
select md5(i::text) as t from
generate_series(1,100000) as i
) s;
Patch (big table): 29.477 ms
Patch (small table): 29.436 ms
Without: 29.378 ms
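
To give a feel for the two kinds of input used in the tests above, here is a minimal sketch in Python. It is only an illustration: Python's unicodedata module uses its own Unicode tables, not PostgreSQL's, so it says nothing about server timings. It shows that the md5() strings are pure ASCII (nothing to decompose or recompose), while the repeat() string changes length under every form.

```python
import hashlib
import unicodedata

# Test string from queries 2 and 4 (the SQL U&'\00E4...' escapes written as Python escapes).
s = "\u00E4\u00C5\u0958\u00F4\u1EBF\u3300\u1FE2\u3316\u2465\u322D"

# Normalize the string under each of the four forms.
forms = {f: unicodedata.normalize(f, s) for f in ("NFD", "NFC", "NFKD", "NFKC")}
for form, out in forms.items():
    print(f"{form}: {len(s)} -> {len(out)} code points")

# The md5() inputs from queries 1, 3 and 5 are hex digits only,
# so every normalization form leaves them unchanged.
md5_text = hashlib.md5(b"1").hexdigest()
ascii_unchanged = all(
    unicodedata.normalize(f, md5_text) == md5_text for f in forms
)
print("md5 text unchanged by normalization:", ascii_unchanged)
```

So the md5() queries mostly measure the per-code-point lookup cost for characters that have no decomposition, and the repeat() queries exercise the actual decomposition and recomposition paths.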
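From these tests, we see a speedup of about 1.5x to 1.9x on the
normalization tests, approaching 2x in the best case (test 4, big table).
Quick check is unchanged, as expected.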
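
For anyone who wants to recompute the ratios, here is a small sketch. The numbers are copied from the results above, reading the decimal comma as a decimal point; the labels and the speedups() helper are just names chosen for this illustration.

```python
# Timings in ms, copied from the results above.
timings = {
    "1. NFD, md5":    {"big": 279.858, "small": 282.925, "without": 444.118},
    "2. NFD, repeat": {"big": 219.858, "small": 247.893, "without": 376.906},
    "3. NFC, md5":    {"big": 7.553, "small": 7.876, "without": 13.177},
    "4. NFC, repeat": {"big": 5.765, "small": 6.782, "without": 10.800},
    "5. quick check": {"big": 29.477, "small": 29.436, "without": 29.378},
}


def speedups(results):
    """Return (big-table, small-table) speedup relative to the unpatched run."""
    return {
        name: (t["without"] / t["big"], t["without"] / t["small"])
        for name, t in results.items()
    }


for name, (big, small) in speedups(timings).items():
    print(f"{name}: big table {big:.2f}x, small table {small:.2f}x")
```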
--
Best regards,
Alexander Borisov