Hi,

On 2022-07-15 14:01:53 -0700, Jacob Champion wrote:
> On 7/15/22 13:35, Andres Freund wrote:
> >> (And do we want to fix it now, regardless?)
> >
> > Yes.
>
> Cool. I can get on board with that.
>
> >> What guarantees are we supposed to be making for log encoding?
> >
> > I don't know, but I don't think not caring at all is a good
> > option. Particularly for unauthenticated data I'd say that escaping
> > everything but printable ascii chars is a sensible approach.
>
> It'll also be painful for anyone whose infrastructure isn't in a Latin
> character set... Maybe that's worth the tradeoff for a v1.

I don't think it's a huge issue, or really avoidable, pre-authentication.
Don't we require all server-side encodings to be supersets of ascii?

We already have pg_clean_ascii() and use it for application_name, fwiw.

> Is there an acceptable approach that could centralize it, so we fix it
> once and are done? E.g. a log_encoding GUC and either conversion or
> escaping in send_message_to_server_log()?

Introducing escaping to ascii for all log messages seems like it'd be
incredibly invasive, and would remove a lot of worthwhile information. Nor
does it really address the whole scope - consider e.g. the truncation in this
patch, that can't be done correctly by the time send_message_to_server_log()
is reached - just chopping in the middle of a multi-byte string would have
made the string invalidly encoded. And we can't perform encoding conversion
from client data until we've gone further into the authentication process, I
think.

Always escaping ANSI escape codes (or rather the non-printable ascii range) is
more convincing. Then we'd just need to make sure that client controlled data
is properly encoded before handing it over to other parts of the system.

I can see a point in a log_encoding GUC at some point, but it seems a bit
separate from the discussion here.

Greetings,

Andres Freund
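To make the two points above concrete, here is a rough standalone sketch, not
code from the patch and not the actual pg_clean_ascii(). Both function names
are hypothetical. escape_nonprintable() writes every byte outside printable
ASCII as \xNN, which also neutralizes ANSI escape sequences.
utf8_clip_len() shows why truncation has to be encoding-aware; it assumes the
input is valid UTF-8, which is exactly the kind of assumption that can't be
made about client data before authentication.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (NOT PostgreSQL's pg_clean_ascii()): copy src into
 * dst, writing every byte outside printable ASCII (0x20..0x7E) as \xNN.
 * dst is always NUL-terminated when dstlen > 0. If the output does not fit,
 * it is cut off before the first character or escape that would overflow.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Note: a literal backslash is passed through, so the result is meant for
 * humans reading a log, not for reversing the escaping.
 */
static size_t
escape_nonprintable(const char *src, char *dst, size_t dstlen)
{
    size_t      out = 0;

    if (dstlen == 0)
        return 0;

    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char) *src;

        if (c >= 0x20 && c <= 0x7E)
        {
            /* need room for this byte plus the terminating NUL */
            if (out + 1 >= dstlen)
                break;
            dst[out++] = (char) c;
        }
        else
        {
            /* need room for the 4-byte escape plus the terminating NUL */
            if (out + 4 >= dstlen)
                break;
            snprintf(dst + out, 5, "\\x%02x", c);
            out += 4;
        }
    }
    dst[out] = '\0';
    return out;
}

/*
 * Hypothetical helper: largest length <= maxlen at which s can be cut
 * without splitting a multi-byte sequence. ASSUMES s is valid UTF-8 and
 * len is its length in bytes.
 */
static size_t
utf8_clip_len(const char *s, size_t len, size_t maxlen)
{
    size_t      cut;

    if (len <= maxlen)
        return len;

    cut = maxlen;
    /* back up while the byte at the cut is a continuation byte (10xxxxxx) */
    while (cut > 0 && (((unsigned char) s[cut]) & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

With these, the input "a\x1b[0m" (an 'a' followed by an ANSI reset sequence)
is logged as the eight visible characters a\x1b[0m, and clipping the
three-byte string "a\xc3\xa9" to two bytes yields length 1 rather than
leaving a dangling 0xc3 lead byte.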