Currently the escape_json() function takes a cstring and checks each character of the string, one at a time, up to the NUL, appending the escape sequence when the character requires it.
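For reference, the NUL-terminated, one-byte-at-a-time loop described above can be sketched in standalone C as below. This is an illustration under stated assumptions, not PostgreSQL's code: a caller-supplied buffer of at least 6 * strlen(str) + 3 bytes stands in for StringInfo, the helper names escape_one() and escape_json_cstr() are hypothetical, and the escape rules are assumed to be the usual JSON short escapes plus \u00XX for other control characters. The sketch also adds the surrounding double quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the JSON escape for one byte, return bytes written. */
static size_t
escape_one(char *out, unsigned char c)
{
    switch (c)
    {
        case '\b': memcpy(out, "\\b", 2); return 2;
        case '\f': memcpy(out, "\\f", 2); return 2;
        case '\n': memcpy(out, "\\n", 2); return 2;
        case '\r': memcpy(out, "\\r", 2); return 2;
        case '\t': memcpy(out, "\\t", 2); return 2;
        case '"':  memcpy(out, "\\\"", 2); return 2;
        case '\\': memcpy(out, "\\\\", 2); return 2;
        default:
            if (c < ' ')
            {
                /* any other control character becomes \u00XX */
                snprintf(out, 7, "\\u%04x", (unsigned int) c);
                return 6;
            }
            out[0] = (char) c;
            return 1;
    }
}

/*
 * Hypothetical NUL-terminated variant: the loop cannot stop until it sees
 * the terminating NUL, so callers holding a (pointer, length) pair must
 * first make a NUL-terminated copy.
 */
static size_t
escape_json_cstr(char *out, const char *str)
{
    size_t      n = 0;

    out[n++] = '"';
    for (const char *p = str; *p != '\0'; p++)
        n += escape_one(out + n, (unsigned char) *p);
    out[n++] = '"';
    out[n] = '\0';
    return n;
}
```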
Because this function requires a NUL-terminated string, we have to do a little more work in some places. For example, in jsonb_put_escaped_value() we call pnstrdup() on the non-NUL-terminated string to make a NUL-terminated string to pass to escape_json(). To make this faster, we can just have a version of escape_json() which takes a 'len' and stops after processing that many chars rather than stopping when the NUL char is reached. Then there's no need to pnstrdup(), which saves some palloc()/memcpy() work.

There are also a few places where we call escape_json() on a "text"-typed Datum, converting the text to a NUL-terminated cstring just so we can pass it along to escape_json(). That's wasteful, as we could pass the payload of the text Datum directly and only allocate memory if the text Datum needs to be de-toasted. That saves a useless palloc/memcpy/pfree cycle.

Now, to make this more interesting, since we have a version of escape_json() which takes a 'len', we could start looking at more than 1 character at a time. If you look closely at escape_json(), all the special chars apart from " and \ are below the space character. pg_lfind8() and pg_lfind8_le() allow processing of 16 bytes at a time, so we only need to search each 16-byte chunk 3 times (once for ", once for \, and once for anything below the space character) to ensure that no special chars exist within it. When that test fails, just go into byte-at-a-time processing, first copying over the portion of the string that passed the vector test up until that point.

I've attached 2 patches:

0001 does everything I've described aside from SIMD.
0002 does SIMD.

I've not personally done too much work in the area of JSON, so I don't have any canned workloads to throw at this.
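A minimal, portable sketch of the length-bounded, chunked scan described above might look like the following. Assumptions: lfind8() and lfind8_le() are hypothetical plain-loop stand-ins for pg_lfind8() and pg_lfind8_le() (no real SIMD here); a caller-supplied buffer of at least 6 * len + 3 bytes stands in for StringInfo; and a chunk that fails the test is handled byte-at-a-time before resuming the chunked scan. It is not the code from either patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 16

/* Hypothetical helper: write the JSON escape for one byte, return bytes written. */
static size_t
escape_one(char *out, unsigned char c)
{
    switch (c)
    {
        case '\b': memcpy(out, "\\b", 2); return 2;
        case '\f': memcpy(out, "\\f", 2); return 2;
        case '\n': memcpy(out, "\\n", 2); return 2;
        case '\r': memcpy(out, "\\r", 2); return 2;
        case '\t': memcpy(out, "\\t", 2); return 2;
        case '"':  memcpy(out, "\\\"", 2); return 2;
        case '\\': memcpy(out, "\\\\", 2); return 2;
        default:
            if (c < ' ')
            {
                snprintf(out, 7, "\\u%04x", (unsigned int) c);
                return 6;
            }
            out[0] = (char) c;
            return 1;
    }
}

/* Hypothetical stand-in for pg_lfind8(): does any byte equal key? */
static bool
lfind8(unsigned char key, const unsigned char *base, size_t nelem)
{
    bool        found = false;

    for (size_t i = 0; i < nelem; i++)
        found |= (base[i] == key);
    return found;
}

/* Hypothetical stand-in for pg_lfind8_le(): is any byte <= key? */
static bool
lfind8_le(unsigned char key, const unsigned char *base, size_t nelem)
{
    bool        found = false;

    for (size_t i = 0; i < nelem; i++)
        found |= (base[i] <= key);
    return found;
}

/* Escape exactly len bytes of str; str need not be NUL-terminated. */
static size_t
escape_json_with_len(char *out, const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t      n = 0;          /* bytes written to out */
    size_t      i = 0;          /* start of the next chunk to test */
    size_t      copied = 0;     /* input before this offset is already in out */

    out[n++] = '"';
    for (; i + CHUNK_SIZE <= len; i += CHUNK_SIZE)
    {
        const unsigned char *chunk = s + i;

        /* three searches: '"', '\\', and anything below ' ' */
        if (!lfind8('"', chunk, CHUNK_SIZE) &&
            !lfind8('\\', chunk, CHUNK_SIZE) &&
            !lfind8_le(0x1F, chunk, CHUNK_SIZE))
            continue;           /* clean chunk: defer copying it */

        /* flush the clean bytes seen so far, then escape this chunk bytewise */
        memcpy(out + n, s + copied, i - copied);
        n += i - copied;
        for (size_t j = 0; j < CHUNK_SIZE; j++)
            n += escape_one(out + n, chunk[j]);
        copied = i + CHUNK_SIZE;
    }

    /* flush trailing clean chunks, then the sub-chunk tail bytewise */
    memcpy(out + n, s + copied, i - copied);
    n += i - copied;
    for (; i < len; i++)
        n += escape_one(out + n, s[i]);

    out[n++] = '"';
    out[n] = '\0';
    return n;
}
```

Deferring the memcpy() until a chunk fails means a long run of clean chunks is copied in one go, which matches the "copy over the portion that passed the vector test" step described above.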
I did try the following:

create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', x) from generate_series(0,1024)x;
vacuum freeze j1;

bench.sql:
select row_to_json(j1)::jsonb from j1;

Master:
$ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
tps = 362.494309 (without initial connection time)
tps = 363.182458 (without initial connection time)
tps = 362.679654 (without initial connection time)

Master + 0001 + 0002:
$ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
tps = 426.456885 (without initial connection time)
tps = 430.573046 (without initial connection time)
tps = 431.142917 (without initial connection time)

About 18% faster (mean of about 429.4 tps vs 362.8 tps).

It would be much faster if we could also get rid of the escape_json_cstring() call in the switch default case of datum_to_json_internal(). row_to_json() would be heaps faster with that done. I considered adding a special case for the "text" type there, but in the end felt that we should just fix that with some hypothetical other patch that changes how output functions work. Others may feel it's worthwhile. I certainly could be convinced of it.

I did add a new regression test. I'm not sure I'd want to keep that, but felt it's worth leaving in there for now.

Another thing I considered was whether doing 16 bytes at a time is too much, as it pushes quite a bit of work into byte-at-a-time processing if just 1 special char exists in a 16-byte chunk. I considered doing SWAR [1] processing to do the job of vector8_has_le() and vector8_has(), maybe with just uint32s. It might be worth doing that. However, I've not done it yet as it raises the bar for this patch quite a bit. SWAR vector processing is pretty much write-only code. Imagine trying to write comments for the code in [2] so that the average person could understand what's going on!?

I'd be happy to hear from anyone who can throw these patches at a real-world JSON workload to see if it runs more quickly.

Parking for July CF.
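To give a feel for the SWAR alternative mentioned above, here is a minimal sketch of the chunk test done on one uint64_t (8 bytes per word, so a 16-byte chunk would take two calls); the uint32 variant is the same with 4-byte constants. It uses the well-known "has a zero byte" and "has a byte less than n" tricks, as collected in Sean Anderson's Bit Twiddling Hacks. The function names are hypothetical and this is not code from either patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)   /* 0x01 in every byte */
#define HIGHS UINT64_C(0x8080808080808080)   /* high bit of every byte */

/*
 * Nonzero iff some byte of v is < n (valid for n <= 128).  Subtracting n
 * from a byte below n wraps it and sets its high bit; "& ~v" discards
 * bytes whose high bit was already set (values >= 128).
 */
static inline uint64_t
swar_has_less(uint64_t v, unsigned int n)
{
    return (v - ONES * n) & ~v & HIGHS;
}

/* Nonzero iff some byte of v is zero: the has_less() trick with n = 1. */
static inline uint64_t
swar_has_zero(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Nonzero iff some byte of v equals c: XOR turns matching bytes into 0. */
static inline uint64_t
swar_has_byte(uint64_t v, unsigned char c)
{
    return swar_has_zero(v ^ (ONES * c));
}

/* Does the 8-byte word at p contain '"', '\\' or a byte below ' '? */
static bool
swar_chunk_needs_escape(const unsigned char *p)
{
    uint64_t    v;

    memcpy(&v, p, sizeof(v));   /* avoids alignment and aliasing issues */
    return (swar_has_byte(v, '"') |
            swar_has_byte(v, '\\') |
            swar_has_less(v, ' ')) != 0;
}
```

Byte order does not matter here, since the result only says whether any byte matched, not which one.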
David

[1] https://en.wikipedia.org/wiki/SWAR
[2] https://dotat.at/@/2022-06-27-tolower-swar.html
v1-0001-Add-len-parameter-to-escape_json.patch
v1-0002-Use-SIMD-processing-for-escape_json.patch