Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
Hi, On 2017-09-15 17:43:35 +0530, Kuntal Ghosh wrote: > The patch looks good to me. I've done some regression testing with a > custom script on my local system. The script contains the following > statement: > SELECT 'aaa..' as col; > > Test 1 > --- > duration: 300 seconds > clients/threads: 1 > With the patch TPS:13628 (+3.39%) > +0.36% 0.21% postgres postgres [.] pgstat_report_activity > > Test 2 > --- > duration: 300 seconds > clients/threads: 8 > With the patch TPS: 63949 (+20.4%) > +0.40% 0.25% postgres postgres [.] pgstat_report_activity > > This shows the significance of this patch in terms of performance > improvement of pgstat_report_activity. Is there any other tests I > should do for the same? Thanks for the test! I think this looks good, no further tests necessary. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On Fri, Sep 15, 2017 at 4:49 PM, Andres Freundwrote: > On 2017-09-15 16:45:47 -0400, Tom Lane wrote: >> Andres Freund writes: >> > Version correcting these is attached. Thanks! >> >> I'd vote for undoing the s/st_activity/st_activity_raw/g business. >> That may have been useful while writing the patch, to ensure you >> found all the references; but it's just creating a lot of unnecessary >> delta in the final code, with the attendant hazards for back-patches. > > I was wondering about that too (see [1]). My only concern is that some > extensions out there might be accessing the string expecting it to be > properly truncated. The rename at least forces them to look for the new > name... I'm slightly in favor of keeping the rename, it doesn't seem > likely to cause a lot of backpatch pain. I tend to agree with you, but it's not a huge deal either way. Even if somebody fails to update third-party code that touches this, a lot of times it'll work anyway. But that very fact is, of course, why it would be slightly better to break it explicitly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On 2017-09-15 16:45:47 -0400, Tom Lane wrote: > Andres Freundwrites: > > Version correcting these is attached. Thanks! > > I'd vote for undoing the s/st_activity/st_activity_raw/g business. > That may have been useful while writing the patch, to ensure you > found all the references; but it's just creating a lot of unnecessary > delta in the final code, with the attendant hazards for back-patches. I was wondering about that too (see [1]). My only concern is that some extensions out there might be accessing the string expecting it to be properly truncated. The rename at least forces them to look for the new name... I'm slightly in favor of keeping the rename, it doesn't seem likely to cause a lot of backpatch pain. Regards, Andres Freund [1] http://archives.postgresql.org/message-id/20170914060329.j7lt7oyyzquxiut6%40alap3.anarazel.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
Andres Freundwrites: > Version correcting these is attached. Thanks! I'd vote for undoing the s/st_activity/st_activity_raw/g business. That may have been useful while writing the patch, to ensure you found all the references; but it's just creating a lot of unnecessary delta in the final code, with the attendant hazards for back-patches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On 2017-09-15 08:25:09 -0400, Robert Haas wrote: > On Thu, Sep 14, 2017 at 2:03 AM, Andres Freundwrote: > > Here's a patch that implements that idea. > > From the department of cosmetic nitpicking: > > + * All supported server-encodings allow to determine the length of a > > make it possible to determine > > + * multi-byte character from it's first byte (not the case for client > > its > > this is not the case > > + * encodings, see GB18030). As st_activity always is stored using server > > is always stored using a server > > + * pgstat_clip_activity() to trunctae correctly. Version correcting these is attached. Thanks! Greetings, Andres Freund >From a2325a2649da403dc562b8e93df972839823d924 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 13 Sep 2017 19:25:34 -0700 Subject: [PATCH] Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which is commonly executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Author: Andres Freund Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de --- src/backend/postmaster/pgstat.c | 63 - src/backend/utils/adt/pgstatfuncs.c | 17 +++--- src/include/pgstat.h| 12 +-- 3 files changed, 72 insertions(+), 20 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index accf302cf7..1ffdac5448 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void) buffer = BackendActivityBuffer; for (i = 0; i < NumBackendStatSlots; i++) { - BackendStatusArray[i].st_activity = buffer; + BackendStatusArray[i].st_activity_raw = buffer; buffer += pgstat_track_activity_query_size; } } @@ -2922,11 +2922,11 @@ pgstat_bestart(void) #endif beentry->st_state = STATE_UNDEFINED; beentry->st_appname[0] = '\0'; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; /* Also make sure the last byte in each string area is always 0 */ beentry->st_clienthostname[NAMEDATALEN - 1] = '\0'; beentry->st_appname[NAMEDATALEN - 1] = '\0'; - beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0'; + beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0'; beentry->st_progress_command = PROGRESS_COMMAND_INVALID; beentry->st_progress_command_target = InvalidOid; @@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str) pgstat_increment_changecount_before(beentry); beentry->st_state = STATE_DISABLED; beentry->st_state_start_timestamp = 0; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; beentry->st_activity_start_timestamp = 0; /* st_xact_start_timestamp and wait_event_info are also disabled */ beentry->st_xact_start_timestamp = 0; @@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str) start_timestamp = GetCurrentStatementStartTimestamp(); if (cmd_str != NULL) { - len = pg_mbcliplen(cmd_str, strlen(cmd_str), - pgstat_track_activity_query_size - 1); + /* + * Compute length of to-be-stored string unaware of multi-byte + * characters. For speed reasons that'll get corrected on read, rather + * than computed every write. + */ + len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1); } current_timestamp = GetCurrentTimestamp(); @@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str) if (cmd_str != NULL) { - memcpy((char *) beentry->st_activity, cmd_str, len); - beentry->st_activity[len] = '\0'; + memcpy((char *) beentry->st_activity_raw, cmd_str, len); + beentry->st_activity_raw[len] = '\0'; beentry->st_activity_start_timestamp = start_timestamp; } @@ -3278,8 +3282,8 @@ pgstat_read_current_status(void) */ strcpy(localappname, (char *) beentry->st_appname); localentry->backendStatus.st_appname = localappname; -strcpy(localactivity, (char *) beentry->st_activity); -localentry->backendStatus.st_activity = localactivity; +strcpy(localactivity, (char *) beentry->st_activity_raw); +localentry->backendStatus.st_activity_raw = localactivity; localentry->backendStatus.st_ssl = beentry->st_ssl; #ifdef USE_SSL if (beentry->st_ssl) @@ -3945,10 +3949,13 @@ pgstat_get_backend_current_activity(int pid, bool checkUser) /* Now it is safe to use the non-volatile pointer */ if (checkUser && !superuser() && beentry->st_userid != GetUserId()) return ""; - else if (*(beentry->st_activity) == '\0') +
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On Thu, Sep 14, 2017 at 2:03 AM, Andres Freundwrote: > Here's a patch that implements that idea. >From the department of cosmetic nitpicking: + * All supported server-encodings allow to determine the length of a make it possible to determine + * multi-byte character from it's first byte (not the case for client its this is not the case + * encodings, see GB18030). As st_activity always is stored using server is always stored using a server + * pgstat_clip_activity() to trunctae correctly. truncate -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On Thu, Sep 14, 2017 at 11:33 AM, Andres Freundwrote: > On 2017-09-12 00:19:48 -0700, Andres Freund wrote: >> Hi, >> >> I've recently seen a benchmark in which pg_mbcliplen() showed up >> prominently. Which it will basically in any benchmark with longer query >> strings, but fast queries. That's not that uncommon. >> >> I wonder if we could avoid the cost of pg_mbcliplen() from within >> pgstat_report_activity(), by moving some of the cost to the read >> side. pgstat values are obviously read far less frequently in nearly all >> cases that are performance relevant. >> >> Therefore I wonder if we couldn't just store a querystring that's >> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the >> read side. I think that should work because all *server side* encodings >> store character lengths in the *first* byte of a multibyte character >> (at least one clientside encoding, gb18030, doesn't behave that way). >> >> That'd necessitate an added memory copy in pg_stat_get_activity(), but >> that seems fairly harmless. >> >> Faults in my thinking? > > Here's a patch that implements that idea. Seems to work well. I'm a > bit loathe to add proper regression tests for this, seems awfully > dependent on specific track_activity_query_size settings. I did confirm > using gdb that I see incomplete characters before > pgstat_clip_activity(), but not after. > > I've renamed st_activity to st_activity_raw to increase the likelihood > that potential external users of st_activity notice and adapt. Increases > the noise, but imo to a very bareable amount. Don't feel strongly > though. > Hello, The patch looks good to me. I've done some regression testing with a custom script on my local system. The script contains the following statement: SELECT 'aaa..' as col; Test 1 --- duration: 300 seconds clients/threads: 1 On HEAD TPS: 13181 +9.30% 0.27% postgres postgres [.] pgstat_report_activity +8.88% 0.02% postgres postgres [.] pg_mbcliplen +7.76% 4.77% postgres postgres [.] pg_encoding_mbcliplen +4.06% 4.06% postgres postgres [.] pg_utf_mblen With the patch TPS:13628 (+3.39%) +0.36% 0.21% postgres postgres [.] pgstat_report_activity Test 2 --- duration: 300 seconds clients/threads: 8 On HEAD TPS: 53084 + 12.17% 0.20% postgres postgres [.] pgstat_report_activity + 11.83% 0.02% postgres postgres [.] pg_mbcliplen + 11.19% 8.03% postgres postgres [.] pg_encoding_mbcliplen +3.74% 3.73% postgres postgres [.] pg_utf_mblen With the patch TPS: 63949 (+20.4%) +0.40% 0.25% postgres postgres [.] pgstat_report_activity This shows the significance of this patch in terms of performance improvement of pgstat_report_activity. Is there any other tests I should do for the same? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On 2017-09-12 00:19:48 -0700, Andres Freund wrote: > Hi, > > I've recently seen a benchmark in which pg_mbcliplen() showed up > prominently. Which it will basically in any benchmark with longer query > strings, but fast queries. That's not that uncommon. > > I wonder if we could avoid the cost of pg_mbcliplen() from within > pgstat_report_activity(), by moving some of the cost to the read > side. pgstat values are obviously read far less frequently in nearly all > cases that are performance relevant. > > Therefore I wonder if we couldn't just store a querystring that's > essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the > read side. I think that should work because all *server side* encodings > store character lengths in the *first* byte of a multibyte character > (at least one clientside encoding, gb18030, doesn't behave that way). > > That'd necessitate an added memory copy in pg_stat_get_activity(), but > that seems fairly harmless. > > Faults in my thinking? Here's a patch that implements that idea. Seems to work well. I'm a bit loathe to add proper regression tests for this, seems awfully dependent on specific track_activity_query_size settings. I did confirm using gdb that I see incomplete characters before pgstat_clip_activity(), but not after. I've renamed st_activity to st_activity_raw to increase the likelihood that potential external users of st_activity notice and adapt. Increases the noise, but imo to a very bareable amount. Don't feel strongly though. Greetings, Andres Freund >From 9c9503f0dfe1babb21e81c1955e996ad06c93608 Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Wed, 13 Sep 2017 19:25:34 -0700 Subject: [PATCH 1/8] Speedup pgstat_report_activity by moving mb-aware truncation to read side. Previously multi-byte aware truncation was done on every pgstat_report_activity() call - proving to be a bottleneck for workloads with long query strings that execute quickly. Instead move the truncation to the read side, which is commonly executed far less frequently. That's possible because all server encodings allow to determine the length of a multi-byte string from the first byte. Author: Andres Freund Discussion: https://postgr.es/m/20170912071948.pa7igbpkkkvie...@alap3.anarazel.de --- src/backend/postmaster/pgstat.c | 63 - src/backend/utils/adt/pgstatfuncs.c | 17 +++--- src/include/pgstat.h| 12 +-- 3 files changed, 72 insertions(+), 20 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index accf302cf7..ccb528e627 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -2701,7 +2701,7 @@ CreateSharedBackendStatus(void) buffer = BackendActivityBuffer; for (i = 0; i < NumBackendStatSlots; i++) { - BackendStatusArray[i].st_activity = buffer; + BackendStatusArray[i].st_activity_raw = buffer; buffer += pgstat_track_activity_query_size; } } @@ -2922,11 +2922,11 @@ pgstat_bestart(void) #endif beentry->st_state = STATE_UNDEFINED; beentry->st_appname[0] = '\0'; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; /* Also make sure the last byte in each string area is always 0 */ beentry->st_clienthostname[NAMEDATALEN - 1] = '\0'; beentry->st_appname[NAMEDATALEN - 1] = '\0'; - beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0'; + beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0'; beentry->st_progress_command = PROGRESS_COMMAND_INVALID; beentry->st_progress_command_target = InvalidOid; @@ -3017,7 +3017,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str) pgstat_increment_changecount_before(beentry); beentry->st_state = STATE_DISABLED; beentry->st_state_start_timestamp = 0; - beentry->st_activity[0] = '\0'; + beentry->st_activity_raw[0] = '\0'; beentry->st_activity_start_timestamp = 0; /* st_xact_start_timestamp and wait_event_info are also disabled */ beentry->st_xact_start_timestamp = 0; @@ -3034,8 +3034,12 @@ pgstat_report_activity(BackendState state, const char *cmd_str) start_timestamp = GetCurrentStatementStartTimestamp(); if (cmd_str != NULL) { - len = pg_mbcliplen(cmd_str, strlen(cmd_str), - pgstat_track_activity_query_size - 1); + /* + * Compute length of to-be-stored string unaware of multi-byte + * characters. For speed reasons that'll get corrected on read, rather + * than computed every write. + */ + len = Min(strlen(cmd_str), pgstat_track_activity_query_size - 1); } current_timestamp = GetCurrentTimestamp(); @@ -3049,8 +3053,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str) if (cmd_str != NULL) { - memcpy((char *) beentry->st_activity, cmd_str, len); - beentry->st_activity[len] = '\0'; + memcpy((char *) beentry->st_activity_raw, cmd_str, len); + beentry->st_activity_raw[len] = '\0';
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
Robert Haaswrites: > On Tue, Sep 12, 2017 at 3:19 AM, Andres Freund wrote: >> Therefore I wonder if we couldn't just store a querystring that's >> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the >> read side. > Interesting idea. I was (ha, ha, what a coincidence) also thinking > about this problem and was wondering if we couldn't be a lot smarter > about pg_mbcliplen(). ... > for UTF-8, we could do that much more directly: if the string is short > enough, stop, else, look at cmd_str[pgstat_track_activity_query_size]. > If that character has (c & 0xc0) != 0x80, write a '\0' and stop; else, > back up until you find a character that for which that continuation > holds, write a '\0', and stop. This kind of approach only works if we > have a definitive test for whether something is a "continuation > character" which probably isn't true in all encodings, but maybe it's > still worth considering. Offhand I think it doesn't work in anything but UTF8. A way that would work in all encodings is to back up to the first non-high-bit-set byte and then mbcliplen from there. Probably, there's enough ASCII in typical SQL commands that this would often be a win. > Your idea is probably a lot simpler to implement, though, and I > definitely agree that shifting the work from the write side to the > read side makes sense. Updating pg_stat_activity is a lot more common > than reading it. +1. I kinda doubt that it is worth optimizing further than that, really. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On Tue, Sep 12, 2017 at 3:19 AM, Andres Freundwrote: > Therefore I wonder if we couldn't just store a querystring that's > essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the > read side. I think that should work because all *server side* encodings > store character lengths in the *first* byte of a multibyte character > (at least one clientside encoding, gb18030, doesn't behave that way). > > That'd necessitate an added memory copy in pg_stat_get_activity(), but > that seems fairly harmless. Interesting idea. I was (ha, ha, what a coincidence) also thinking about this problem and was wondering if we couldn't be a lot smarter about pg_mbcliplen(). I mean, pg_mbcliplen is basically just being used here to trim away any partial character that would have to get chopped off to fit within the length limit. But right now it's scanning the whole string to do this, which is unnecessary. At least for UTF-8, we could do that much more directly: if the string is short enough, stop, else, look at cmd_str[pgstat_track_activity_query_size]. If that character has (c & 0xc0) != 0x80, write a '\0' and stop; else, back up until you find a character that for which that continuation holds, write a '\0', and stop. This kind of approach only works if we have a definitive test for whether something is a "continuation character" which probably isn't true in all encodings, but maybe it's still worth considering. Your idea is probably a lot simpler to implement, though, and I definitely agree that shifting the work from the write side to the read side makes sense. Updating pg_stat_activity is a lot more common than reading it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
> Check the information the pg_*_mblen use / how the relevant encodings > work. Will be something like > int > pg_utf_mblen(const unsigned char *s) > { > int len; > > if ((*s & 0x80) == 0) > len = 1; > else if ((*s & 0xe0) == 0xc0) > len = 2; > else if ((*s & 0xf0) == 0xe0) > len = 3; > else if ((*s & 0xf8) == 0xf0) > len = 4; > #ifdef NOT_USED > else if ((*s & 0xfc) == 0xf8) > len = 5; > else if ((*s & 0xfe) == 0xfc) > len = 6; > #endif > else > len = 1; > return len; > } > > As you can see, only the first character (*s) is accessed to determine > the length/width of the multibyte-character. That's afaict the case for > all server-side encodings. So your idea is just storing cmd_str into st_activity, which might be clipped in the middle of multibyte character. And the reader of the string will call pg_mbclipen() when it needs to read the string. Yes, I think it works except for gb18030 by the reason you said. However, if we could have variants of mblen functions with additional parameter: input string length, then we could remove this inconstancy. I don't know if this is overkill or not, though. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
On 2017-09-12 16:53:49 +0900, Tatsuo Ishii wrote: > > read side. I think that should work because all *server side* encodings > > store character lengths in the *first* byte of a multibyte character > > What do you mean? I don't see such data in a multibyte string. Check the information the pg_*_mblen use / how the relevant encodings work. Will be something like int pg_utf_mblen(const unsigned char *s) { int len; if ((*s & 0x80) == 0) len = 1; else if ((*s & 0xe0) == 0xc0) len = 2; else if ((*s & 0xf0) == 0xe0) len = 3; else if ((*s & 0xf8) == 0xf0) len = 4; #ifdef NOT_USED else if ((*s & 0xfc) == 0xf8) len = 5; else if ((*s & 0xfe) == 0xfc) len = 6; #endif else len = 1; return len; } As you can see, only the first character (*s) is accessed to determine the length/width of the multibyte-character. That's afaict the case for all server-side encodings. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More efficient truncation of pg_stat_activity query strings
> read side. I think that should work because all *server side* encodings > store character lengths in the *first* byte of a multibyte character What do you mean? I don't see such data in a multibyte string. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers