When parsing JSON, strings need to be converted from the JSON string format to a C-style string. A simple copy of the buffer does not suffice because of the various escape sequences that JSON supports. Because of this, our JSON parser wrote characters into the C-style string buffer one at a time.
However, this is only necessary for escape sequences that map to another character. This patch changes the behaviour for non-escaped characters: these are now copied in batches instead of one character at a time.

To test the performance of this change I used COPY BINARY from one JSONB table into another, containing fairly large JSONB values of ~15kB. Each JSONB value is a JSON object with a single level. It contains a few small keys and values, but one very big value that's a stringified JSON blob. So this JSON blob contains a relatively high number of escape characters, to escape all the " characters. This change improves performance for this workload on my machine by ~18% (going from 1m24s to 1m09s).

@Andres, there was indeed some low hanging fruit.
@John Naylor, SSE2 indeed sounds like another nice improvement. I'll leave that to you.
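
For readers who don't want to open the attachment, the batching idea can be sketched roughly as below. This is a simplified standalone illustration, not the code from the patch: the function name unescape_batched is hypothetical, it only handles the single-character escapes, and \uXXXX handling is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the code from the patch: unescape the body of a
 * JSON string (without the surrounding quotes) into dst, which must have
 * room for at least len + 1 bytes. Returns the output length, or -1 on a
 * truncated or unsupported escape. \uXXXX is deliberately not handled.
 */
static long
unescape_batched(const char *src, size_t len, char *dst)
{
    size_t      i = 0;
    size_t      out = 0;

    while (i < len)
    {
        size_t      start = i;

        /* Find the end of the current run of non-escaped bytes. */
        while (i < len && src[i] != '\\')
            i++;

        /* Copy the whole run in one call instead of byte by byte. */
        if (i > start)
        {
            memcpy(dst + out, src + start, i - start);
            out += i - start;
        }

        if (i >= len)
            break;

        /* src[i] is a backslash; an escape needs one more byte. */
        if (i + 1 >= len)
            return -1;

        /* Only escapes are still written one character at a time. */
        switch (src[i + 1])
        {
            case '"':  dst[out++] = '"';  break;
            case '\\': dst[out++] = '\\'; break;
            case '/':  dst[out++] = '/';  break;
            case 'b':  dst[out++] = '\b'; break;
            case 'f':  dst[out++] = '\f'; break;
            case 'n':  dst[out++] = '\n'; break;
            case 'r':  dst[out++] = '\r'; break;
            case 't':  dst[out++] = '\t'; break;
            default:
                return -1;      /* includes \u, omitted in this sketch */
        }
        i += 2;
    }

    dst[out] = '\0';
    return (long) out;
}
```

With an input like a\"b\nc (7 bytes), the sketch does three memcpy calls for the plain runs and two single-byte writes for the escapes. The longer the runs between escapes, the more the batching should pay off.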
0001-Optimize-json_lex_string-by-batching-character-copie.patch