Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-19 Thread Andres Freund
Hi,

On 2017-09-15 17:43:35 +0530, Kuntal Ghosh wrote:
> The patch looks good to me. I've done some regression testing with a
> custom script on my local system. The script contains the following
> statement:

> SELECT 'aaa..' as col;
> 
> Test 1
> ---
> duration: 300 seconds
> clients/threads: 1

> With the patch TPS:13628 (+3.39%)
> +0.36% 0.21%  postgres  postgres  [.] pgstat_report_activity
> 
> Test 2
> ---
> duration: 300 seconds
> clients/threads: 8

> With the patch TPS: 63949 (+20.4%)
> +0.40% 0.25%  postgres  postgres  [.] pgstat_report_activity
> 
> This shows the significance of this patch in terms of performance
> improvement of pgstat_report_activity. Is there any other tests I
> should do for the same?

Thanks for the test! I think this looks good, no further tests
necessary.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Robert Haas
On Fri, Sep 15, 2017 at 4:49 PM, Andres Freund  wrote:
> On 2017-09-15 16:45:47 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > Version correcting these is attached. Thanks!
>>
>> I'd vote for undoing the s/st_activity/st_activity_raw/g business.
>> That may have been useful while writing the patch, to ensure you
>> found all the references; but it's just creating a lot of unnecessary
>> delta in the final code, with the attendant hazards for back-patches.
>
> I was wondering about that too (see [1]). My only concern is that some
> extensions out there might be accessing the string expecting it to be
> properly truncated. The rename at least forces them to look for the new
> name...  I'm slightly in favor of keeping the rename, it doesn't seem
> likely to cause a lot of backpatch pain.

I tend to agree with you, but it's not a huge deal either way.  Even
if somebody fails to update third-party code that touches this, a lot
of times it'll work anyway.  But that very fact is, of course, why it
would be slightly better to break it explicitly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Andres Freund
On 2017-09-15 16:45:47 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Version correcting these is attached. Thanks!
> 
> I'd vote for undoing the s/st_activity/st_activity_raw/g business.
> That may have been useful while writing the patch, to ensure you
> found all the references; but it's just creating a lot of unnecessary
> delta in the final code, with the attendant hazards for back-patches.

I was wondering about that too (see [1]). My only concern is that some
extensions out there might be accessing the string expecting it to be
properly truncated. The rename at least forces them to look for the new
name...  I'm slightly in favor of keeping the rename, it doesn't seem
likely to cause a lot of backpatch pain.

Regards,

Andres Freund


[1] 
http://archives.postgresql.org/message-id/20170914060329.j7lt7oyyzquxiut6%40alap3.anarazel.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Tom Lane
Andres Freund  writes:
> Version correcting these is attached. Thanks!

I'd vote for undoing the s/st_activity/st_activity_raw/g business.
That may have been useful while writing the patch, to ensure you
found all the references; but it's just creating a lot of unnecessary
delta in the final code, with the attendant hazards for back-patches.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Andres Freund
On 2017-09-15 08:25:09 -0400, Robert Haas wrote:
> On Thu, Sep 14, 2017 at 2:03 AM, Andres Freund  wrote:
> > Here's a patch that implements that idea.
> 
> From the department of cosmetic nitpicking:
> 
> + * All supported server-encodings allow to determine the length of a
> 
> make it possible to determine
> 
> + * multi-byte character from it's first byte (not the case for client
> 
> its
> 
> this is not the case
> 
> + * encodings, see GB18030). As st_activity always is stored using server
> 
> is always stored using a server
> 
> + * pgstat_clip_activity() to trunctae correctly.

Version correcting these is attached. Thanks!

Greetings,

Andres Freund
>From a2325a2649da403dc562b8e93df972839823d924 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 19:25:34 -0700
Subject: [PATCH] Speedup pgstat_report_activity by moving mb-aware truncation
 to read side.

Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.

Instead move the truncation to the read side, which is commonly
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.

Author: Andres Freund
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de
---
 src/backend/postmaster/pgstat.c | 63 -
 src/backend/utils/adt/pgstatfuncs.c | 17 +++---
 src/include/pgstat.h| 12 +--
 3 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index accf302cf7..1ffdac5448 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void)
 		buffer = BackendActivityBuffer;
 		for (i = 0; i < NumBackendStatSlots; i++)
 		{
-			BackendStatusArray[i].st_activity = buffer;
+			BackendStatusArray[i].st_activity_raw = buffer;
 			buffer += pgstat_track_activity_query_size;
 		}
 	}
@@ -2922,11 +2922,11 @@ pgstat_bestart(void)
 #endif
 	beentry->st_state = STATE_UNDEFINED;
 	beentry->st_appname[0] = '\0';
-	beentry->st_activity[0] = '\0';
+	beentry->st_activity_raw[0] = '\0';
 	/* Also make sure the last byte in each string area is always 0 */
 	beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
 	beentry->st_appname[NAMEDATALEN - 1] = '\0';
-	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
+	beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
 
@@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			pgstat_increment_changecount_before(beentry);
 			beentry->st_state = STATE_DISABLED;
 			beentry->st_state_start_timestamp = 0;
-			beentry->st_activity[0] = '\0';
+			beentry->st_activity_raw[0] = '\0';
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
@@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	start_timestamp = GetCurrentStatementStartTimestamp();
 	if (cmd_str != NULL)
 	{
-		len = pg_mbcliplen(cmd_str, strlen(cmd_str),
-		   pgstat_track_activity_query_size - 1);
+		/*
+		 * Compute length of to-be-stored string unaware of multi-byte
+		 * characters. For speed reasons that'll get corrected on read, rather
+		 * than computed every write.
+		 */
+		len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
 	}
 	current_timestamp = GetCurrentTimestamp();
 
@@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 
 	if (cmd_str != NULL)
 	{
-		memcpy((char *) beentry->st_activity, cmd_str, len);
-		beentry->st_activity[len] = '\0';
+		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
+		beentry->st_activity_raw[len] = '\0';
 		beentry->st_activity_start_timestamp = start_timestamp;
 	}
 
@@ -3278,8 +3282,8 @@ pgstat_read_current_status(void)
  */
 strcpy(localappname, (char *) beentry->st_appname);
 localentry->backendStatus.st_appname = localappname;
-strcpy(localactivity, (char *) beentry->st_activity);
-localentry->backendStatus.st_activity = localactivity;
+strcpy(localactivity, (char *) beentry->st_activity_raw);
+localentry->backendStatus.st_activity_raw = localactivity;
 localentry->backendStatus.st_ssl = beentry->st_ssl;
 #ifdef USE_SSL
 if (beentry->st_ssl)
@@ -3945,10 +3949,13 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
 			/* Now it is safe to use the non-volatile pointer */
 			if (checkUser && !superuser() && beentry->st_userid != GetUserId())
 return "";
-			else if (*(beentry->st_activity) == '\0')
+			

Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Robert Haas
On Thu, Sep 14, 2017 at 2:03 AM, Andres Freund  wrote:
> Here's a patch that implements that idea.

>From the department of cosmetic nitpicking:

+ * All supported server-encodings allow to determine the length of a

make it possible to determine

+ * multi-byte character from it's first byte (not the case for client

its

this is not the case

+ * encodings, see GB18030). As st_activity always is stored using server

is always stored using a server

+ * pgstat_clip_activity() to trunctae correctly.

truncate

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-15 Thread Kuntal Ghosh
On Thu, Sep 14, 2017 at 11:33 AM, Andres Freund  wrote:
> On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
>> Hi,
>>
>> I've recently seen a benchmark in which pg_mbcliplen() showed up
>> prominently. Which it will basically in any benchmark with longer query
>> strings, but fast queries. That's not that uncommon.
>>
>> I wonder if we could avoid the cost of pg_mbcliplen() from within
>> pgstat_report_activity(), by moving some of the cost to the read
>> side. pgstat values are obviously read far less frequently in nearly all
>> cases that are performance relevant.
>>
>> Therefore I wonder if we couldn't just store a querystring that's
>> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
>> read side.  I think that should work because all *server side* encodings
>> store character lengths in the *first* byte of a multibyte character
>> (at least one clientside encoding, gb18030, doesn't behave that way).
>>
>> That'd necessitate an added memory copy in pg_stat_get_activity(), but
>> that seems fairly harmless.
>>
>> Faults in my thinking?
>
> Here's a patch that implements that idea.  Seems to work well.  I'm a
> bit loathe to add proper regression tests for this, seems awfully
> dependent on specific track_activity_query_size settings.  I did confirm
> using gdb that I see incomplete characters before
> pgstat_clip_activity(), but not after.
>
> I've renamed st_activity to st_activity_raw to increase the likelihood
> that potential external users of st_activity notice and adapt. Increases
> the noise, but imo to a very bareable amount. Don't feel strongly
> though.
>
Hello,

The patch looks good to me. I've done some regression testing with a
custom script on my local system. The script contains the following
statement:
SELECT 'aaa..' as col;

Test 1
---
duration: 300 seconds
clients/threads: 1

On HEAD TPS: 13181
+9.30% 0.27%  postgres  postgres [.] pgstat_report_activity
+8.88% 0.02%  postgres  postgres [.] pg_mbcliplen
+7.76% 4.77%  postgres  postgres [.] pg_encoding_mbcliplen
+4.06% 4.06%  postgres  postgres [.] pg_utf_mblen

With the patch TPS:13628 (+3.39%)
+0.36% 0.21%  postgres  postgres  [.] pgstat_report_activity

Test 2
---
duration: 300 seconds
clients/threads: 8

On HEAD TPS: 53084
+   12.17% 0.20%  postgres  postgres [.]
pgstat_report_activity
+   11.83% 0.02%  postgres  postgres [.] pg_mbcliplen
+   11.19% 8.03%  postgres  postgres [.] pg_encoding_mbcliplen
+3.74% 3.73%  postgres  postgres [.] pg_utf_mblen

With the patch TPS: 63949 (+20.4%)
+0.40% 0.25%  postgres  postgres  [.] pgstat_report_activity

This shows the significance of this patch in terms of performance
improvement of pgstat_report_activity. Is there any other tests I
should do for the same?


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-14 Thread Andres Freund
On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
> Hi,
> 
> I've recently seen a benchmark in which pg_mbcliplen() showed up
> prominently. Which it will basically in any benchmark with longer query
> strings, but fast queries. That's not that uncommon.
> 
> I wonder if we could avoid the cost of pg_mbcliplen() from within
> pgstat_report_activity(), by moving some of the cost to the read
> side. pgstat values are obviously read far less frequently in nearly all
> cases that are performance relevant.
> 
> Therefore I wonder if we couldn't just store a querystring that's
> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
> read side.  I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character
> (at least one clientside encoding, gb18030, doesn't behave that way).
> 
> That'd necessitate an added memory copy in pg_stat_get_activity(), but
> that seems fairly harmless.
> 
> Faults in my thinking?

Here's a patch that implements that idea.  Seems to work well.  I'm a
bit loathe to add proper regression tests for this, seems awfully
dependent on specific track_activity_query_size settings.  I did confirm
using gdb that I see incomplete characters before
pgstat_clip_activity(), but not after.

I've renamed st_activity to st_activity_raw to increase the likelihood
that potential external users of st_activity notice and adapt. Increases
the noise, but imo to a very bareable amount. Don't feel strongly
though.

Greetings,

Andres Freund
>From 9c9503f0dfe1babb21e81c1955e996ad06c93608 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 13 Sep 2017 19:25:34 -0700
Subject: [PATCH 1/8] Speedup pgstat_report_activity by moving mb-aware
 truncation to read side.

Previously multi-byte aware truncation was done on every
pgstat_report_activity() call - proving to be a bottleneck for
workloads with long query strings that execute quickly.

Instead move the truncation to the read side, which is commonly
executed far less frequently. That's possible because all server
encodings allow to determine the length of a multi-byte string from
the first byte.

Author: Andres Freund
Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de
---
 src/backend/postmaster/pgstat.c | 63 -
 src/backend/utils/adt/pgstatfuncs.c | 17 +++---
 src/include/pgstat.h| 12 +--
 3 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index accf302cf7..ccb528e627 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void)
 		buffer = BackendActivityBuffer;
 		for (i = 0; i < NumBackendStatSlots; i++)
 		{
-			BackendStatusArray[i].st_activity = buffer;
+			BackendStatusArray[i].st_activity_raw = buffer;
 			buffer += pgstat_track_activity_query_size;
 		}
 	}
@@ -2922,11 +2922,11 @@ pgstat_bestart(void)
 #endif
 	beentry->st_state = STATE_UNDEFINED;
 	beentry->st_appname[0] = '\0';
-	beentry->st_activity[0] = '\0';
+	beentry->st_activity_raw[0] = '\0';
 	/* Also make sure the last byte in each string area is always 0 */
 	beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
 	beentry->st_appname[NAMEDATALEN - 1] = '\0';
-	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
+	beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
 
@@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			pgstat_increment_changecount_before(beentry);
 			beentry->st_state = STATE_DISABLED;
 			beentry->st_state_start_timestamp = 0;
-			beentry->st_activity[0] = '\0';
+			beentry->st_activity_raw[0] = '\0';
 			beentry->st_activity_start_timestamp = 0;
 			/* st_xact_start_timestamp and wait_event_info are also disabled */
 			beentry->st_xact_start_timestamp = 0;
@@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	start_timestamp = GetCurrentStatementStartTimestamp();
 	if (cmd_str != NULL)
 	{
-		len = pg_mbcliplen(cmd_str, strlen(cmd_str),
-		   pgstat_track_activity_query_size - 1);
+		/*
+		 * Compute length of to-be-stored string unaware of multi-byte
+		 * characters. For speed reasons that'll get corrected on read, rather
+		 * than computed every write.
+		 */
+		len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1);
 	}
 	current_timestamp = GetCurrentTimestamp();
 
@@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 
 	if (cmd_str != NULL)
 	{
-		memcpy((char *) beentry->st_activity, cmd_str, len);
-		beentry->st_activity[len] = '\0';
+		memcpy((char *) beentry->st_activity_raw, cmd_str, len);
+		beentry->st_activity_raw[len] = '\0';
 		

Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-12 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 12, 2017 at 3:19 AM, Andres Freund  wrote:
>> Therefore I wonder if we couldn't just store a querystring that's
>> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
>> read side.

> Interesting idea.  I was (ha, ha, what a coincidence) also thinking
> about this problem and was wondering if we couldn't be a lot smarter
> about pg_mbcliplen().  ...

> for UTF-8, we could do that much more directly: if the string is short
> enough, stop, else, look at cmd_str[pgstat_track_activity_query_size].
> If that character has (c & 0xc0) != 0x80, write a '\0' and stop; else,
> back up until you find a character that for which that continuation
> holds, write a '\0', and stop.  This kind of approach only works if we
> have a definitive test for whether something is a "continuation
> character" which probably isn't true in all encodings, but maybe it's
> still worth considering.

Offhand I think it doesn't work in anything but UTF8.  A way that would
work in all encodings is to back up to the first non-high-bit-set
byte and then mbcliplen from there.  Probably, there's enough ASCII
in typical SQL commands that this would often be a win.

> Your idea is probably a lot simpler to implement, though, and I
> definitely agree that shifting the work from the write side to the
> read side makes sense.  Updating pg_stat_activity is a lot more common
> than reading it.

+1.  I kinda doubt that it is worth optimizing further than that,
really.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-12 Thread Robert Haas
On Tue, Sep 12, 2017 at 3:19 AM, Andres Freund  wrote:
> Therefore I wonder if we couldn't just store a querystring that's
> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
> read side.  I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character
> (at least one clientside encoding, gb18030, doesn't behave that way).
>
> That'd necessitate an added memory copy in pg_stat_get_activity(), but
> that seems fairly harmless.

Interesting idea.  I was (ha, ha, what a coincidence) also thinking
about this problem and was wondering if we couldn't be a lot smarter
about pg_mbcliplen().  I mean, pg_mbcliplen is basically just being
used here to trim away any partial character that would have to get
chopped off to fit within the length limit.  But right now it's
scanning the whole string to do this, which is unnecessary.  At least
for UTF-8, we could do that much more directly: if the string is short
enough, stop, else, look at cmd_str[pgstat_track_activity_query_size].
If that character has (c & 0xc0) != 0x80, write a '\0' and stop; else,
back up until you find a character that for which that continuation
holds, write a '\0', and stop.  This kind of approach only works if we
have a definitive test for whether something is a "continuation
character" which probably isn't true in all encodings, but maybe it's
still worth considering.

Your idea is probably a lot simpler to implement, though, and I
definitely agree that shifting the work from the write side to the
read side makes sense.  Updating pg_stat_activity is a lot more common
than reading it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-12 Thread Tatsuo Ishii
> Check the information the pg_*_mblen use / how the relevant encodings
> work. Will be something like
> int
> pg_utf_mblen(const unsigned char *s)
> {
>   int len;
> 
>   if ((*s & 0x80) == 0)
>   len = 1;
>   else if ((*s & 0xe0) == 0xc0)
>   len = 2;
>   else if ((*s & 0xf0) == 0xe0)
>   len = 3;
>   else if ((*s & 0xf8) == 0xf0)
>   len = 4;
> #ifdef NOT_USED
>   else if ((*s & 0xfc) == 0xf8)
>   len = 5;
>   else if ((*s & 0xfe) == 0xfc)
>   len = 6;
> #endif
>   else
>   len = 1;
>   return len;
> }
> 
> As you can see, only the first character (*s) is accessed to determine
> the length/width of the multibyte-character.  That's afaict the case for
> all server-side encodings.

So your idea is just storing cmd_str into st_activity, which might be
clipped in the middle of multibyte character. And the reader of the
string will call pg_mbclipen() when it needs to read the string. Yes,
I think it works except for gb18030 by the reason you said.

However, if we could have variants of mblen functions with additional
parameter: input string length, then we could remove this inconstancy.
I don't know if this is overkill or not, though.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-12 Thread Andres Freund
On 2017-09-12 16:53:49 +0900, Tatsuo Ishii wrote:
> > read side.  I think that should work because all *server side* encodings
> > store character lengths in the *first* byte of a multibyte character
> 
> What do you mean? I don't see such data in a multibyte string.

Check the information the pg_*_mblen use / how the relevant encodings
work. Will be something like
int
pg_utf_mblen(const unsigned char *s)
{
int len;

if ((*s & 0x80) == 0)
len = 1;
else if ((*s & 0xe0) == 0xc0)
len = 2;
else if ((*s & 0xf0) == 0xe0)
len = 3;
else if ((*s & 0xf8) == 0xf0)
len = 4;
#ifdef NOT_USED
else if ((*s & 0xfc) == 0xf8)
len = 5;
else if ((*s & 0xfe) == 0xfc)
len = 6;
#endif
else
len = 1;
return len;
}

As you can see, only the first character (*s) is accessed to determine
the length/width of the multibyte-character.  That's afaict the case for
all server-side encodings.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More efficient truncation of pg_stat_activity query strings

2017-09-12 Thread Tatsuo Ishii
> read side.  I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character

What do you mean? I don't see such data in a multibyte string.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers