On Tue, May 07, 2024 at 02:06:10PM -0700, Jacob Champion wrote:
> Maybe I've misunderstood, but isn't that what's being done in v2?

Something a bit different.  I was wondering if it would be possible to
tweak this code to truncate the data in the generated error string so
that the incomplete multi-byte sequence is entirely cut out, which
would come down to setting token_terminator to "s" (last byte before
the incomplete byte sequence) rather than "term" (last byte available,
even if incomplete):

    #define FAIL_AT_CHAR_END(code) \
        do { \
            char *term = s + pg_encoding_mblen(lex->input_encoding, s); \
            lex->token_terminator = (term <= end) ? term : s; \
            return code; \
        } while (0)

But looking closer, I can see that in the JSON_INVALID_TOKEN case, when
!tok_done, we set token_terminator to point to the end of the token,
and that would include an incomplete byte sequence like in your
case.  :/

At the end of the day, I think that I'm OK with your patch and with
avoiding the overread for now in the back-branches.  This situation
makes me uncomfortable and we should put more effort into printing
error messages in a readable format, but that could always be tackled
later as a separate problem.  And I don't see anything backpatchable
at first sight for v16.

Thoughts and/or objections?
--
Michael
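
For illustration, the truncation rule discussed above can be sketched
as a standalone function.  This is only a sketch under assumptions, not
the code in the tree: utf8_declared_len() is a hypothetical UTF-8-only
stand-in for pg_encoding_mblen(), and fail_at_char_end() is a
hypothetical function form of the macro that returns the would-be
token_terminator instead of storing it in the lexer.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen(), UTF-8 only: returns the
 * character length announced by the lead byte, without checking that the
 * continuation bytes are actually present.
 */
static int
utf8_declared_len(const char *s)
{
	unsigned char c = (unsigned char) *s;

	if ((c & 0x80) == 0x00)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;					/* invalid lead byte, treat as one byte */
}

/*
 * Hypothetical function form of FAIL_AT_CHAR_END: "s" points at the
 * current character, "end" one past the last byte of input.  If the
 * character fits in the input, the terminator goes after it; otherwise
 * it stays at "s", cutting the incomplete sequence out entirely.
 */
static const char *
fail_at_char_end(const char *s, const char *end)
{
	const char *term = s + utf8_declared_len(s);

	return (term <= end) ? term : s;
}
```

With a complete two-byte character the terminator lands after it; with
only the lead byte available it stays at "s", so nothing past the input
is read and the partial character is left out of the error string.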