Hi,

On 2022-06-24 08:47:09 +0000, Jelte Fennema wrote:
> To test performance of this change I used COPY BINARY from a JSONB table
> into another, containing fairly large JSONB values of ~15kB.
This will have a lot of other costs included (DML is expensive). I'd suggest
storing the json in a text column and casting it to json[b], with a filter on
top of the json[b] result that cheaply filters it away. That should end up
spending nearly all the time somewhere around json parsing.

It's useful for things like this to include a way for others to use the same
benchmark...

I tried your patch with:

DROP TABLE IF EXISTS json_as_text;
CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t FROM pg_description pd) FROM generate_series(1, 100);
VACUUM FREEZE json_as_text;

SELECT 1 FROM json_as_text WHERE jsonb_typeof(t::jsonb) = 'not me';

Which the patch improves from 846ms to 754ms (best of three). A bit smaller
than your improvement, but still nice.

I think your patch doesn't quite go far enough - we still end up looping for
each character and have the added complication of needing to flush the
"buffer". I'd be surprised if a "dedicated" loop that scans ahead to find
where the plain part of the string ends isn't faster. That then obviously
could be SIMDified.

Separately, it seems pretty awful, efficiency- and code-density-wise, to have
the NULL checks for ->strval all over. Might be worth forcing json_lex() and
json_lex_string() to be inlined, with a constant parameter deciding whether
->strval is expected. That'd likely be enough to get the compiler to
specialize the code for us.

Might also be worth maintaining ->strval using appendBinaryStringInfoNT().

Greetings,

Andres Freund
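
[Editorial sketch, not part of the mail: to make the "dedicated loop" idea
concrete, here is a minimal self-contained C sketch. It is not PostgreSQL's
actual json_lex_string(); the function names and the caller-side helper are
made up for illustration. The point is to scan ahead over plain characters in
one tight loop and then copy the whole run at once, rather than appending
character by character; that inner loop is the natural SIMD target.]

/* Illustrative only -- compiles standalone with: cc -O2 -c scan_sketch.c */
#include <stddef.h>
#include <string.h>

/*
 * Scan from "s" towards "end" over characters that need no special handling
 * inside a JSON string, i.e. stop at the closing quote, a backslash escape,
 * or a control character.  Returns the position of the first special
 * character (or "end" if none was found).
 *
 * This byte-at-a-time loop is what could be SIMDified: compare a block of
 * bytes at a time against '"', '\\' and < 0x20.
 */
static const char *
scan_plain_json_chars(const char *s, const char *end)
{
	while (s < end)
	{
		unsigned char c = (unsigned char) *s;

		if (c == '"' || c == '\\' || c < 0x20)
			break;
		s++;
	}
	return s;
}

/*
 * Caller side: copy each plain run in a single memcpy().  In the real lexer
 * this would be an appendBinaryStringInfo()-style call on lex->strval.
 */
static size_t
collect_plain_runs(const char *s, const char *end, char *dst)
{
	size_t		n = 0;

	while (s < end)
	{
		const char *stop = scan_plain_json_chars(s, end);

		/* copy the whole unescaped run at once */
		memcpy(dst + n, s, stop - s);
		n += stop - s;

		if (stop == end || *stop == '"')
			break;				/* end of input or end of string */
		if (*stop == '\\')
		{
			s = stop + 2;		/* escape decoding elided in this sketch */
			continue;
		}
		break;					/* control character: invalid JSON */
	}
	return n;
}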
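
[Editorial sketch, not part of the mail: a rough illustration of the
specialization idea, using made-up stand-in types rather than JsonLexContext
and StringInfo. A single always-inlined worker takes the "do we need the
value?" decision as a parameter, and thin wrappers pass compile-time
constants, so the compiler can emit one copy that maintains the value buffer
and one copy without any of that code. The buf_append_nt() helper here is
hypothetical, standing in for appendBinaryStringInfoNT().]

/* Illustrative only -- compiles standalone with: cc -O2 -c specialize_sketch.c */
#include <stdbool.h>
#include <string.h>

/* Stand-in for StringInfo; names in this sketch are invented. */
typedef struct
{
	char	   *data;
	int			len;
} Buf;

/*
 * Stand-in for appendBinaryStringInfoNT(): append bytes without
 * re-terminating the buffer with '\0' on every call.
 */
static void
buf_append_nt(Buf *buf, const char *data, int datalen)
{
	memcpy(buf->data + buf->len, data, datalen);
	buf->len += datalen;
}

/*
 * Always-inlined worker (PostgreSQL spells the attribute
 * pg_attribute_always_inline).  "need_value" is a compile-time constant at
 * each call site below, so the compiler emits two specialized copies: one
 * that maintains the value buffer and one with all of that removed, instead
 * of checking for a buffer on every chunk.
 */
static inline __attribute__((always_inline)) int
lex_chunk_impl(Buf *value, const char *chunk, int chunklen, bool need_value)
{
	if (need_value)
		buf_append_nt(value, chunk, chunklen);

	/* ... the rest of the per-chunk lexing work would go here ... */
	return chunklen;
}

/* Thin wrappers with constant arguments -> two specialized code paths. */
static int
lex_chunk_collect(Buf *value, const char *chunk, int chunklen)
{
	return lex_chunk_impl(value, chunk, chunklen, true);
}

static int
lex_chunk_skip(const char *chunk, int chunklen)
{
	return lex_chunk_impl(NULL, chunk, chunklen, false);
}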