Hi,

On large scripts, pgbench happens to consume a lot of CPU time.
For instance, with a script consisting of 50000 "SELECT 1;"
I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time,
out of a total time of 6.7 secs. When run with perf, this profile shows up:

  81,10%  pgbench  pgbench           [.] expr_scanner_get_lineno
   0,36%  pgbench  [unknown]         [k] 0xffffffffac90410b
   0,33%  pgbench  [unknown]         [k] 0xffffffffac904109
   0,23%  pgbench  libpq.so.5.18     [.] pqParseInput3
   0,21%  pgbench  [unknown]         [k] 0xffffffffac800080
   0,17%  pgbench  pgbench           [.] advanceConnectionState
   0,15%  pgbench  [unknown]         [k] 0xffffffffac904104
   0,14%  pgbench  libpq.so.5.18     [.] PQmakeEmptyPGresult

In ParseScript(), expr_scanner_get_lineno() is called for each line
with its current offset, and it scans the script from the beginning
up to the current line. I think that on the whole, parsing this script
ends up looking at (N*(N+1))/2 lines, which is 1.275 billion lines
if N=50000.
Since it only need the current line number in case of certain errors
with \gset, I've made the trivial attached fix calling
expr_scanner_get_lineno() only in these cases.
This moves the CPU consumption to a more reasonable 0.1s in the
above test case (with the drawback of having the line number pointing
one line after).

However, there is another caller, process_backslash_command()
which is not amenable to the same kind of easy fix. A large script
having backslash commands near the end is also penalized, and it
would be nice to fix that as well.

I wonder whether pgbench should materialize the current line number
in a variable, as psql does in pset.lineno. But given that there are
two different parsers in pgbench, maybe it's not the simplest.
Flex has yylineno but neither pgbench nor psql make use of it.
I thought I would ask before going further into this, as there might
be a better method that I don't see, being unfamiliar with that code
and with flex/bison.
WDYT?


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5e1fcf59c61..c3617acae2a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5952,7 +5952,6 @@ ParseScript(const char *script, const char *desc, int 
weight)
        PQExpBufferData line_buf;
        int                     alloc_num;
        int                     index;
-       int                     lineno;
        int                     start_offset;
 
 #define COMMANDS_ALLOC_NUM 128
@@ -5990,7 +5989,6 @@ ParseScript(const char *script, const char *desc, int 
weight)
                Command    *command = NULL;
 
                resetPQExpBuffer(&line_buf);
-               lineno = expr_scanner_get_lineno(sstate, start_offset);
 
                sr = psql_scan(sstate, &line_buf, &prompt);
 
@@ -6017,7 +6015,9 @@ ParseScript(const char *script, const char *desc, int 
weight)
                                        Command    *cmd;
 
                                        if (index == 0)
-                                               syntax_error(desc, lineno, 
NULL, NULL,
+                                               syntax_error(desc,
+                                                                        
expr_scanner_get_lineno(sstate, start_offset),
+                                                                        NULL, 
NULL,
                                                                         
"\\gset must follow an SQL command",
                                                                         NULL, 
-1);
 
@@ -6025,7 +6025,9 @@ ParseScript(const char *script, const char *desc, int 
weight)
 
                                        if (cmd->type != SQL_COMMAND ||
                                                cmd->varprefix != NULL)
-                                               syntax_error(desc, lineno, 
NULL, NULL,
+                                               syntax_error(desc,
+                                                                        
expr_scanner_get_lineno(sstate, start_offset),
+                                                                        NULL, 
NULL,
                                                                         
"\\gset must follow an SQL command",
                                                                         
cmd->first_line, -1);
 

Reply via email to