Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

Tatsuo Ishii Tue, 12 Sep 2017 02:02:24 -0700

> Check the information the pg_*_mblen use / how the relevant encodings
> work. Will be something like
> int
> pg_utf_mblen(const unsigned char *s)
> {
>       int                     len;
> 
>       if ((*s & 0x80) == 0)
>               len = 1;
>       else if ((*s & 0xe0) == 0xc0)
>               len = 2;
>       else if ((*s & 0xf0) == 0xe0)
>               len = 3;
>       else if ((*s & 0xf8) == 0xf0)
>               len = 4;
> #ifdef NOT_USED
>       else if ((*s & 0xfc) == 0xf8)
>               len = 5;
>       else if ((*s & 0xfe) == 0xfc)
>               len = 6;
> #endif
>       else
>               len = 1;
>       return len;
> }
> 
> As you can see, only the first character (*s) is accessed to determine
> the length/width of the multibyte-character.  That's afaict the case for
> all server-side encodings.


So your idea is to simply store cmd_str in st_activity, which might be
clipped in the middle of a multibyte character, and have the reader of
the string call pg_mbcliplen() when it needs to read the string. Yes,
I think that works, except for gb18030, for the reason you gave.
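
To make the reader-side clipping concrete, here is a minimal sketch for
UTF-8 only. It is not the actual pg_mbcliplen() code: the names
utf8_mblen and clip_to_char_boundary are hypothetical, and the length
rule is copied from the quoted pg_utf_mblen. The reader walks the buffer
character by character and stops before a character whose claimed length
would run past the end of the stored bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Same lead-byte rule as the quoted pg_utf_mblen (4-byte maximum). */
static int
utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;                   /* unexpected lead byte: count one byte */
}

/*
 * Hypothetical helper: return how many leading bytes of buf[0..len)
 * consist of complete characters.  A trailing character that was cut
 * off by truncation is excluded.  Only the lead byte is inspected, so
 * nothing beyond buf[len - 1] is ever read.
 */
static size_t
clip_to_char_boundary(const unsigned char *buf, size_t len)
{
    size_t      pos = 0;

    assert(buf != NULL);
    while (pos < len && buf[pos] != '\0')
    {
        size_t      l = (size_t) utf8_mblen(buf + pos);

        if (l > len - pos)
            break;              /* character was clipped mid-way */
        pos += l;
    }
    return pos;
}
```

This only works because the length is known from the first byte alone,
which is the property gb18030 lacks.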

However, if we had variants of the mblen functions with an additional
parameter, the input string length, then we could remove this
inconsistency. I don't know whether this is overkill, though.
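
As a rough illustration of what such a variant might look like, here is
a sketch for gb18030, where the second byte decides between a two-byte
and a four-byte character. The name gb18030_mblen_bounded and the
convention of returning 0 for an incomplete character are assumptions
made for this example, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical length-aware mblen for gb18030.  "len" is the number of
 * bytes available starting at s.  Returns the byte length of the first
 * character, or 0 if the character is incomplete within len bytes.
 * The second byte is read only when len says it is there.
 */
static int
gb18030_mblen_bounded(const unsigned char *s, size_t len)
{
    int         need;

    assert(s != NULL);
    if (len == 0)
        return 0;
    if ((s[0] & 0x80) == 0)
        return 1;               /* ASCII */
    if (len < 2)
        return 0;               /* cannot inspect the second byte */

    /* A second byte in 0x30..0x39 marks a four-byte character. */
    need = (s[1] >= 0x30 && s[1] <= 0x39) ? 4 : 2;
    return ((size_t) need <= len) ? need : 0;
}
```

With a signature like this the caller never has to guarantee that the
bytes after a clipped lead byte are readable, at the cost of a second
set of functions to maintain.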

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


