On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier <mich...@paquier.xyz> wrote:
> Not sure to like much the fact that this advances token_terminator
> first.  Wouldn't it be better to calculate pg_encoding_mblen() first,
> then save token_terminator?  I feel a bit uneasy about saving a value
> in token_terminator past the end of the string.  That's a nit in this
> context, still..
v2 tries it that way; see what you think. Is the concern that someone
might add code later that escapes that macro early?

> Hmm.  Keeping it around as currently designed means that it could
> cause more harm than anything in the long term, even in the stable
> branches if new code uses it.  There is a risk of seeing this new code
> incorrectly using it again, even if its top comment tells that we rely
> on the string being nul-terminated.  A safer alternative would be to
> redesign it so as the length of the string is provided in input,
> removing the dependency of strlen in its internals, perhaps.  Anyway,
> without any callers of it, I'd be tempted to wipe it from HEAD and
> call it a day.

Removed in v2.

> > The new test needs to record two versions of the error message, one
> > for invalid token and one for invalid escape sequence.  This is
> > because, for smaller chunk sizes, the partial-token logic in the
> > incremental JSON parser skips the affected code entirely when it
> > can't find an ending double-quote.
>
> Ah, that makes sense.  That looks OK here.  A comment around the test
> would be adapted to document that, I guess.

Done.

> Advancing the tracking pointer 's' before reporting an error related
> to the end of the string is a bad practice, IMO, and we should avoid
> that.  json_lex_string() does not offer a warm feeling regarding that
> with escape characters, at least :/

Yeah... I think some expansion of the json_errdetail test coverage is
probably in my future. :)

Thanks,
--Jacob
v2-0001-json_lex_string-don-t-overread-on-bad-UTF8.patch
Description: Binary data