On Tue, Sep 12, 2017 at 3:19 AM, Andres Freund <and...@anarazel.de> wrote:
> Therefore I wonder if we couldn't just store a querystring that's
> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
> read side. I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character
> (at least one clientside encoding, gb18030, doesn't behave that way).
>
> That'd necessitate an added memory copy in pg_stat_get_activity(), but
> that seems fairly harmless.
Interesting idea. I was (ha, ha, what a coincidence) also thinking about this problem and was wondering if we couldn't be a lot smarter about pg_mbcliplen(). I mean, pg_mbcliplen() is basically just being used here to trim away any partial character that would have to get chopped off to fit within the length limit. But right now it's scanning the whole string to do this, which is unnecessary.

At least for UTF-8, we could do that much more directly: if the string is short enough, stop; else, look at cmd_str[pgstat_track_activity_query_size]. If that character has (c & 0xc0) != 0x80, write a '\0' and stop; else, back up until you find a character for which that condition holds, write a '\0', and stop.

This kind of approach only works if we have a definitive test for whether something is a "continuation character", which probably isn't true in all encodings, but maybe it's still worth considering.

Your idea is probably a lot simpler to implement, though, and I definitely agree that shifting the work from the write side to the read side makes sense. Updating pg_stat_activity is a lot more common than reading it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
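For illustration, here is a minimal sketch of the UTF-8 shortcut described above. It is not PostgreSQL code: the names utf8_clip_len and is_utf8_continuation are hypothetical, the caller is assumed to already know the string's byte length, and the input is assumed to be valid UTF-8. It returns the clipped length rather than writing the '\0' itself, so the caller can terminate or copy as it sees fit.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): true if c is a UTF-8
 * continuation byte, i.e. has the bit pattern 10xxxxxx.
 */
static int
is_utf8_continuation(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: given a UTF-8 string s of len bytes, return the
 * largest length <= limit that does not end in the middle of a multibyte
 * character.  Only the bytes near the cut point are examined; the string
 * is never scanned from the start.
 *
 * Assumes s is valid UTF-8.  With invalid input (a long run of
 * continuation bytes) the loop could back up further than 3 bytes.
 */
static size_t
utf8_clip_len(const char *s, size_t len, size_t limit)
{
	size_t		cut;

	if (len <= limit)
		return len;			/* short enough, nothing to trim */

	/*
	 * s[limit] is the first byte that would be dropped.  If it is a
	 * continuation byte, cutting here would split a character, so back up
	 * to that character's lead byte and cut in front of it.
	 */
	cut = limit;
	while (cut > 0 && is_utf8_continuation((unsigned char) s[cut]))
		cut--;

	return cut;
}
```

A caller would then do something like memcpy(dst, s, n) followed by dst[n] = '\0', where n is the returned length. For valid UTF-8 the loop backs up at most three bytes, so the cost is constant regardless of query length.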