Re: [PATCH] pgbench: add multiconnect option

2023-01-31 Thread vignesh C
On Wed, 11 Jan 2023 at 22:17, vignesh C  wrote:
>
> On Tue, 8 Nov 2022 at 02:16, Fabien COELHO  wrote:
> >
> >
> > Hello Ian,
> >
> > > cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> > > currently underway, this would be an excellent time to update the patch.
> >
> > Attached a v5 which is just a rebase.
>
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:
> === Applying patches on top of PostgreSQL commit ID
> 3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
> === applying patch ./pgbench-multi-connect-conninfo-5.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file doc/src/sgml/ref/pgbench.sgml
> Hunk #3 FAILED at 921.
> 1 out of 3 hunks FAILED -- saving rejects to file
> doc/src/sgml/ref/pgbench.sgml.rej

There have been no updates on this thread for some time, so the entry has
been marked as Returned with Feedback. Feel free to reopen it
in the next commitfest if you plan to continue working on this.

Regards,
Vignesh




Re: [PATCH] pgbench: add multiconnect option

2023-01-11 Thread vignesh C
On Tue, 8 Nov 2022 at 02:16, Fabien COELHO  wrote:
>
>
> Hello Ian,
>
> > cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> > currently underway, this would be an excellent time to update the patch.
>
> Attached a v5 which is just a rebase.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
=== applying patch ./pgbench-multi-connect-conninfo-5.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/ref/pgbench.sgml
Hunk #3 FAILED at 921.
1 out of 3 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/pgbench.sgml.rej

[1] - http://cfbot.cputube.org/patch_41_3227.log

Regards,
Vignesh




Re: [PATCH] pgbench: add multiconnect option

2023-01-10 Thread Fabien COELHO



Hello Jelte,

> This patch seems to have quite some use case overlap with my patch which
> adds load balancing to libpq itself:
> https://www.postgresql.org/message-id/flat/pr3pr83mb04768e2ff04818eeb2179949f7...@pr3pr83mb0476.eurprd83.prod.outlook.com

Thanks for the pointer.

The end purpose of the patch is to allow pgbench to follow a failover at
some point, at the client level, AFAICR.

> My patch is only able to add "random" load balancing though, not
> "round-robin". So this patch still definitely seems useful, even when mine
> gets merged.

Yep. I'm not sure the end purpose is the same, but possibly the pgbench
patch could take advantage of libpq extension.

> I'm not sure that the support for the "working" connection is necessary
> from a feature perspective though (usability/discoverability is another
> question). It's already possible to achieve the same behaviour by simply
> providing multiple host names in the connection string. You can even tell
> libpq to connect to a primary or secondary by using the
> target_session_attrs option.


--
Fabien.




Re: [PATCH] pgbench: add multiconnect option

2023-01-06 Thread Jelte Fennema
This patch seems to have quite some use case overlap with my patch which
adds load balancing to libpq itself:
https://www.postgresql.org/message-id/flat/pr3pr83mb04768e2ff04818eeb2179949f7...@pr3pr83mb0476.eurprd83.prod.outlook.com

My patch is only able to add "random" load balancing though, not
"round-robin". So this patch still definitely seems useful, even when mine
gets merged.

I'm not sure that the support for the "working" connection is necessary
from a feature perspective though (usability/discoverability is another
question). It's already possible to achieve the same behaviour by simply
providing multiple host names in the connection string. You can even tell
libpq to connect to a primary or secondary by using the
target_session_attrs option.
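
For illustration, here is a minimal sketch of the kind of multi-host connection string referred to above. The helper name build_multihost_conninfo, the host names and the database name are assumptions made up for this example; only the host list syntax and the target_session_attrs keyword come from libpq. The resulting string is what one would hand to pgbench as its dbname argument or to PQconnectdb().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a libpq keyword/value connection string that
 * lists several hosts.  libpq tries the hosts in the order given, and
 * target_session_attrs (for instance "read-write") restricts which server
 * is acceptable.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
build_multihost_conninfo(char *buf, size_t buflen,
                         const char *const *hosts, int nhosts,
                         const char *dbname, const char *session_attrs)
{
    size_t used;
    int    n;

    assert(nhosts > 0);

    n = snprintf(buf, buflen, "host=");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    /* comma-separated host list */
    for (int i = 0; i < nhosts; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s%s",
                     i > 0 ? "," : "", hosts[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used,
                 " dbname=%s target_session_attrs=%s",
                 dbname, session_attrs);
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    used += (size_t) n;

    return (int) used;
}
```

With hosts node1 and node2, database bench and attribute read-write, the helper produces "host=node1,node2 dbname=bench target_session_attrs=read-write".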

On Fri, 6 Jan 2023 at 11:33, Fabien COELHO  wrote:

>
> Hello Ian,
>
> > cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> > currently underway, this would be an excellent time to update the patch.
>
> Attached a v5 which is just a rebase.
>
> --
> Fabien.


Re: [PATCH] pgbench: add multiconnect option

2022-11-07 Thread Fabien COELHO


Hello Ian,

> cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.

Attached a v5 which is just a rebase.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 40e6a50a7f..a3ae7cc9be 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -29,7 +29,7 @@ PostgreSQL documentation
   
pgbench
option
-   dbname
+   dbname or conninfo
   
  
 
@@ -169,6 +169,9 @@ pgbench  options  d
 not specified, the environment variable
 PGDATABASE is used. If that is not set, the
 user name specified for the connection is used.
+Alternatively, the dbname can be
+a standard connection information string.
+Several connections can be provided.

   
  
@@ -918,6 +921,21 @@ pgbench  options  d
 
 
 
+ 
+  --connection-policy=policy
+  
+   
+Set the connection policy when multiple connections are available.
+Default is round-robin provided (ro).
+Possible values are:
+first (f),
+random (ra),
+round-robin (ro),
+working (w).
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b208d74767..02f8278b34 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -301,13 +301,39 @@ uint32		max_tries = 1;
 bool		failures_detailed = false;	/* whether to group failures in
 		 * reports or logs by basic types */
 
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
-const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;
 
+/* multi connections */
+typedef enum mc_policy_t
+{
+	MC_UNKNOWN = 0,
+	MC_FIRST,
+	MC_RANDOM,
+	MC_ROUND_ROBIN,
+	MC_WORKING
+} mc_policy_t;
+
+/* connection info list */
+typedef struct connection_t
+{
+	const char *connection;		/* conninfo or dbname */
+	int			errors;			/* number of connection errors */
+} connection_t;
+
+static int			n_connections = 0;
+static connection_t	   *connections = NULL;
+static mc_policy_t	mc_policy = MC_ROUND_ROBIN;
+
+/* last used connection */
+// FIXME per thread?
+static int current_connection = 0;
+
 #define WSEP '@'/* weight separator */
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -871,7 +897,7 @@ usage(void)
 {
 	printf("%s is a benchmarking tool for PostgreSQL.\n\n"
 		   "Usage:\n"
-		   "  %s [OPTION]... [DBNAME]\n"
+		   "  %s [OPTION]... [DBNAME or CONNINFO ...]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize invokes initialization mode\n"
 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
@@ -929,6 +955,7 @@ usage(void)
 		   "  -h, --host=HOSTNAME  database server host or socket directory\n"
 		   "  -p, --port=PORT  database server port number\n"
 		   "  -U, --username=USERNAME  connect as specified database user\n"
+		   "  --connection-policy=S    set multiple connection policy (\"first\", \"rand\", \"round-robin\", \"working\")\n"
 		   "  -V, --version            output version information, then exit\n"
 		   "  -?, --help   show this help, then exit\n"
 		   "\n"
@@ -1535,13 +1562,89 @@ tryExecuteStatement(PGconn *con, const char *sql)
 	PQclear(res);
 }
 
+/* store a new connection information string */
+static void
+push_connection(const char *c)
+{
+	connections = pg_realloc(connections, sizeof(connection_t) * (n_connections+1));
+	connections[n_connections].connection = pg_strdup(c);
+	connections[n_connections].errors = 0;
+	n_connections++;
+}
+
+/* switch connection */
+static int
+next_connection(int *pci)
+{
+	int ci;
+
+	ci = ((*pci) + 1) % n_connections;
+	*pci = ci;
+
+	return ci;
+}
+
+/* return the connection index to use for next attempt */
+static int
+choose_connection(int *pci)
+{
+	int ci;
+
+	switch (mc_policy)
+	{
+		case MC_FIRST:
+			ci = 0;
+			break;
+		case MC_RANDOM:
+			// FIXME should use a prng state ; not thread safe ;
+			ci = (int) getrand(_random_sequence, 0, n_connections-1);
+			*pci = ci;
+			break;
+		case MC_ROUND_ROBIN:
+			ci = next_connection(pci);
+			break;
+		case MC_WORKING:
+			ci = *pci;
+			break;
+		default:
+			pg_fatal("unexpected multi connection policy: %d", mc_policy);
+			exit(1);
+	}
+
+	return ci;
+}
+
+/* return multi-connection policy based on its name or shortest prefix */
+static mc_policy_t
+get_connection_policy(const char *s)
+{
+	if (s == NULL || *s == '\0' || strcmp(s, "first") == 0 || strcmp(s, "f") == 0)
+		return MC_FIRST;
+	else if (strcmp(s, "random") == 0 || strcmp(s, "ra") == 0)
+		return MC_RANDOM;
+	else if (strcmp(s, "round-robin") == 0 || strcmp(s, "ro") == 0)
+		return MC_ROUND_ROBIN;
+	else if (strcmp(s, "working") == 0 || 

Re: [PATCH] pgbench: add multiconnect option

2022-11-03 Thread Ian Lawrence Barwick
On Sat, 2 Apr 2022 at 22:35, Fabien COELHO wrote:
>
>
> > According to the cfbot this patch needs a rebase
>
> Indeed. v4 attached.

Hi

cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Thanks

Ian Barwick




Re: [PATCH] pgbench: add multiconnect option

2022-04-02 Thread Fabien COELHO



> According to the cfbot this patch needs a rebase

Indeed. v4 attached.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..d96d2d291d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -29,7 +29,7 @@ PostgreSQL documentation
   
pgbench
option
-   dbname
+   dbname or conninfo
   
  
 
@@ -169,6 +169,9 @@ pgbench  options  d
 not specified, the environment variable
 PGDATABASE is used. If that is not set, the
 user name specified for the connection is used.
+Alternatively, the dbname can be
+a standard connection information string.
+Several connections can be provided.

   
  
@@ -918,6 +921,21 @@ pgbench  options  d
 
 
 
+ 
+  --connection-policy=policy
+  
+   
+Set the connection policy when multiple connections are available.
+Default is round-robin provided (ro).
+Possible values are:
+first (f), 
+random (ra), 
+round-robin (ro),
+working (w).
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..d99d40fbb9 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -305,13 +305,39 @@ uint32		max_tries = 1;
 bool		failures_detailed = false;	/* whether to group failures in reports
 		 * or logs by basic types */
 
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
-const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;
 
+/* multi connections */
+typedef enum mc_policy_t
+{
+	MC_UNKNOWN = 0,
+	MC_FIRST,
+	MC_RANDOM,
+	MC_ROUND_ROBIN,
+	MC_WORKING
+} mc_policy_t;
+
+/* connection info list */
+typedef struct connection_t
+{
+	const char *connection;		/* conninfo or dbname */
+	int			errors;			/* number of connection errors */
+} connection_t;
+
+static int			n_connections = 0;
+static connection_t	   *connections = NULL;
+static mc_policy_t	mc_policy = MC_ROUND_ROBIN;
+
+/* last used connection */
+// FIXME per thread?
+static int current_connection = 0;
+
 #define WSEP '@'/* weight separator */
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -873,7 +899,7 @@ usage(void)
 {
 	printf("%s is a benchmarking tool for PostgreSQL.\n\n"
 		   "Usage:\n"
-		   "  %s [OPTION]... [DBNAME]\n"
+		   "  %s [OPTION]... [DBNAME or CONNINFO ...]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize invokes initialization mode\n"
 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
@@ -931,6 +957,7 @@ usage(void)
 		   "  -h, --host=HOSTNAME  database server host or socket directory\n"
 		   "  -p, --port=PORT  database server port number\n"
 		   "  -U, --username=USERNAME  connect as specified database user\n"
+		   "  --connection-policy=S    set multiple connection policy (\"first\", \"rand\", \"round-robin\", \"working\")\n"
 		   "  -V, --version            output version information, then exit\n"
 		   "  -?, --help   show this help, then exit\n"
 		   "\n"
@@ -1538,13 +1565,89 @@ tryExecuteStatement(PGconn *con, const char *sql)
 	PQclear(res);
 }
 
+/* store a new connection information string */
+static void
+push_connection(const char *c)
+{
+	connections = pg_realloc(connections, sizeof(connection_t) * (n_connections+1));
+	connections[n_connections].connection = pg_strdup(c);
+	connections[n_connections].errors = 0;
+	n_connections++;
+}
+
+/* switch connection */
+static int
+next_connection(int *pci)
+{
+	int ci;
+
+	ci = ((*pci) + 1) % n_connections;
+	*pci = ci;
+
+	return ci;
+}
+
+/* return the connection index to use for next attempt */
+static int
+choose_connection(int *pci)
+{
+	int ci;
+
+	switch (mc_policy)
+	{
+		case MC_FIRST:
+			ci = 0;
+			break;
+		case MC_RANDOM:
+			// FIXME should use a prng state ; not thread safe ;
+			ci = (int) getrand(_random_sequence, 0, n_connections-1);
+			*pci = ci;
+			break;
+		case MC_ROUND_ROBIN:
+			ci = next_connection(pci);
+			break;
+		case MC_WORKING:
+			ci = *pci;
+			break;
+		default:
+			pg_log_fatal("unexpected multi connection policy: %d", mc_policy);
+			exit(1);
+	}
+
+	return ci;
+}
+
+/* return multi-connection policy based on its name or shortest prefix */
+static mc_policy_t
+get_connection_policy(const char *s)
+{
+	if (s == NULL || *s == '\0' || strcmp(s, "first") == 0 || strcmp(s, "f") == 0)
+		return MC_FIRST;
+	else if (strcmp(s, "random") == 0 || strcmp(s, "ra") == 0)
+		return MC_RANDOM;
+	else if (strcmp(s, "round-robin") == 0 || strcmp(s, "ro") == 0)
+		return MC_ROUND_ROBIN;
+	else if (strcmp(s, "working") == 0 || strcmp(s, "w") == 0)
+		return MC_WORKING;
+	else
+		return MC_UNKNOWN;
+}
+
+/* get backend connection info */

Re: [PATCH] pgbench: add multiconnect option

2022-03-31 Thread Greg Stark
According to the cfbot this patch needs a rebase




Re: [PATCH] pgbench: add multiconnect option

2022-03-25 Thread Fabien COELHO




> > > Pgbench is a simple benchmark tool by design, and I wonder if adding
> > > a multiconnect feature will cause pgbench to be used incorrectly.
> >
> > Maybe, but I do not see how it would be worse that what pgbench already
> > allows.
>
> I agree that pgbench is simple; perhaps really too simple when it comes to
> being able to measure much more than basic query flows.  What pgbench does
> have in its favor is being distributed with the core distribution.
>
> I think there is definitely space for a more complicated benchmarking tool
> that exercises more scenarios and more realistic query patterns and
> scenarios.  Whether that is distributed with the core is another question.

As far as this feature is concerned, the source code impact of the patch
is very small, so I do not think that it is worth barring this feature on
that ground.


--
Fabien.




Re: [PATCH] pgbench: add multiconnect option

2022-03-22 Thread David Christensen
On Sat, Mar 19, 2022 at 11:43 AM Fabien COELHO  wrote:

>
> Hi Sami,
>
> > Pgbench is a simple benchmark tool by design, and I wonder if adding
> > a multiconnect feature will cause pgbench to be used incorrectly.
>
> Maybe, but I do not see how it would be worse that what pgbench already
> allows.
>

I agree that pgbench is simple; perhaps really too simple when it comes to
being able to measure much more than basic query flows.  What pgbench does
have in its favor is being distributed with the core distribution.

I think there is definitely space for a more complicated benchmarking tool
that exercises more scenarios and more realistic query patterns and
scenarios.  Whether that is distributed with the core is another question.

David


Re: [PATCH] pgbench: add multiconnect option

2022-03-19 Thread Fabien COELHO


Hi Sami,

> Pgbench is a simple benchmark tool by design, and I wonder if adding
> a multiconnect feature will cause pgbench to be used incorrectly.

Maybe, but I do not see how it would be worse than what pgbench already
allows.

> A real world use-case will be helpful for this thread.

Basically more versatile testing for non single host setups.

For instance, it would allow directly testing a multi-master setup, such
as Bucardo, SymmetricDS or CockroachDB.

It would be a first step on the path to allow interesting features such
as:

 - testing a failover setup: on a connection error, a client could connect
   to another host.

 - testing a primary/standby setup, with write transactions sent to the
   primary and read transactions sent to the standbys.
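
As a rough sketch of the primary/standby idea in the list above, and only that: nothing like this exists in the posted patch, and the names node_role_t, node_t and route_transaction are invented for the example. It assumes each connection has been tagged with a role and that read transactions rotate over the standbys.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role tag attached to each connection string. */
typedef enum node_role_t
{
    ROLE_PRIMARY,
    ROLE_STANDBY
} node_role_t;

typedef struct node_t
{
    const char *conninfo;
    node_role_t role;
} node_t;

/*
 * Pick the next node suitable for a transaction: writes go to a primary,
 * reads go to a standby.  The search starts just after *cursor so that
 * successive reads rotate over the standbys.
 * Returns the chosen index (also stored in *cursor), or -1 if no node
 * has the wanted role.
 */
static int
route_transaction(const node_t *nodes, int n, bool is_write, int *cursor)
{
    node_role_t wanted = is_write ? ROLE_PRIMARY : ROLE_STANDBY;

    assert(n > 0);

    for (int step = 1; step <= n; step++)
    {
        int i = (*cursor + step) % n;

        if (nodes[i].role == wanted)
        {
            *cursor = i;
            return i;
        }
    }
    return -1;
}
```

Whether the role should come from the command line or be discovered from the server is left open here, as it is in the thread.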


Basically I have no doubt that it can be useful.

> For the current patch, Should the report also cover per-database
> statistics (tps/latency/etc.) ?

That could be a "per-connection" option. If there is a reasonable use case
I think that it would be an easy enough feature to implement.
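
To make the "per-connection" idea concrete, here is a small sketch of what such accounting could look like. It is not part of the patch; conn_stats_t and its helper functions are hypothetical names, and a real implementation would presumably reuse pgbench's existing statistics structures instead.

```c
#include <assert.h>

/* Hypothetical per-connection counters, one instance per conninfo. */
typedef struct conn_stats_t
{
    long   n_tx;            /* transactions completed on this connection */
    double latency_sum_ms;  /* sum of their latencies, in milliseconds */
} conn_stats_t;

/* Record one finished transaction. */
static void
record_tx(conn_stats_t *s, double latency_ms)
{
    assert(latency_ms >= 0.0);
    s->n_tx++;
    s->latency_sum_ms += latency_ms;
}

/* Average latency in milliseconds, or 0 if nothing was recorded. */
static double
avg_latency_ms(const conn_stats_t *s)
{
    return s->n_tx > 0 ? s->latency_sum_ms / (double) s->n_tx : 0.0;
}

/* Transactions per second over the given elapsed wall-clock time. */
static double
tps(const conn_stats_t *s, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return (double) s->n_tx / elapsed_s;
}
```

A report would then print one line per entry of the connection list, next to the aggregated figures pgbench already shows.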


Attached a rebased version.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index be1896fa99..69bd5b76f1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -29,7 +29,7 @@ PostgreSQL documentation
   
pgbench
option
-   dbname
+   dbname or conninfo
   
  
 
@@ -160,6 +160,9 @@ pgbench  options  d
 not specified, the environment variable
 PGDATABASE is used. If that is not set, the
 user name specified for the connection is used.
+Alternatively, the dbname can be
+a standard connection information string.
+Several connections can be provided.

   
  
@@ -843,6 +846,21 @@ pgbench  options  d
 
 
 
+ 
+  --connection-policy=policy
+  
+   
+Set the connection policy when multiple connections are available.
+Default is round-robin provided (ro).
+Possible values are:
+first (f), 
+random (ra), 
+round-robin (ro),
+working (w).
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..5006e21766 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -278,13 +278,39 @@ bool		is_connect;			/* establish connection for each transaction */
 bool		report_per_command; /* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
 
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
-const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;
 
+/* multi connections */
+typedef enum mc_policy_t
+{
+	MC_UNKNOWN = 0,
+	MC_FIRST,
+	MC_RANDOM,
+	MC_ROUND_ROBIN,
+	MC_WORKING
+} mc_policy_t;
+
+/* connection info list */
+typedef struct connection_t
+{
+	const char *connection;		/* conninfo or dbname */
+	int			errors;			/* number of connection errors */
+} connection_t;
+
+static int			n_connections = 0;
+static connection_t	   *connections = NULL;
+static mc_policy_t	mc_policy = MC_ROUND_ROBIN;
+
+/* last used connection */
+// FIXME per thread?
+static int current_connection = 0;
+
 #define WSEP '@'/* weight separator */
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -694,7 +720,7 @@ usage(void)
 {
 	printf("%s is a benchmarking tool for PostgreSQL.\n\n"
 		   "Usage:\n"
-		   "  %s [OPTION]... [DBNAME]\n"
+		   "  %s [OPTION]... [DBNAME or CONNINFO ...]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize invokes initialization mode\n"
 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
@@ -749,6 +775,7 @@ usage(void)
 		   "  -h, --host=HOSTNAME  database server host or socket directory\n"
 		   "  -p, --port=PORT  database server port number\n"
 		   "  -U, --username=USERNAME  connect as specified database user\n"
+		   "  --connection-policy=S    set multiple connection policy (\"first\", \"rand\", \"round-robin\", \"working\")\n"
 		   "  -V, --version            output version information, then exit\n"
 		   "  -?, --help   show this help, then exit\n"
 		   "\n"
@@ -1323,13 +1350,89 @@ tryExecuteStatement(PGconn *con, const char *sql)
 	PQclear(res);
 }
 
+/* store a new connection information string */
+static void
+push_connection(const char *c)
+{
+	connections = pg_realloc(connections, sizeof(connection_t) * (n_connections+1));
+	connections[n_connections].connection = pg_strdup(c);
+	connections[n_connections].errors = 0;
+	n_connections++;
+}
+
+/* switch connection */
+static int
+next_connection(int *pci)
+{
+	int ci;
+
+	ci = ((*pci) + 1) % n_connections;
+	*pci = ci;
+
+	return ci;
+}
+
+/* return the connection index to use for next attempt */
+static int

Re: [PATCH] pgbench: add multiconnect option

2022-03-18 Thread Imseih (AWS), Sami
The current version of the patch does not apply, so I could not test it.

Here are some comments I have.

Pgbench is a simple benchmark tool by design, and I wonder if adding 
a multiconnect feature will cause pgbench to be used incorrectly.
A real world use-case will be helpful for this thread.

For the current patch, Should the report also cover per-database statistics 
(tps/latency/etc.) ?

Regards,

Sami Imseih
Amazon Web Services




Re: [PATCH] pgbench: add multiconnect option

2022-03-16 Thread Fabien COELHO



Hello Greg,

> It looks like David sent a patch and Fabien sent a followup patch. But
> there hasn't been a whole lot of discussion or further patches.
>
> It sounds like there are some basic questions about what the right
> interface should be. Are there specific questions that would be
> helpful for moving forward?

Review the designs and patches and tell us what you think?

Personally, I think that allowing multiple connections is a good thing,
especially if the code impact is reduced, which is the case with the
version I sent.

Then for me the next step would be to have a reconnection on errors so as
to implement a client-side failover policy that could help test the
performance impact of a server failover. I have done that internally, but it
requires the "Pgbench Serialization and deadlock errors" patch to land, as it
would just be another error that can be handled.
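
The reconnection idea can be sketched as follows. This is an assumption about how such a client-side failover might look, not the internal implementation mentioned above. The connection_t layout mirrors the struct in the patch; connect_with_failover and fake_connect are hypothetical, and fake_connect merely stands in for a real PQconnectdb() call followed by a PQstatus() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same fields as the connection_t struct in the patch. */
typedef struct connection_t
{
    const char *connection;  /* conninfo or dbname */
    int         errors;      /* number of connection errors */
} connection_t;

typedef bool (*try_connect_fn) (const char *conninfo);

/*
 * Stand-in for a real connection attempt: pretends that "service=db1"
 * is down and every other target is reachable.
 */
static bool
fake_connect(const char *conninfo)
{
    return strcmp(conninfo, "service=db1") != 0;
}

/*
 * Try each connection at most once, starting with *current.  Failed
 * targets get their error counter bumped; on success *current is moved
 * to the working connection.  Returns its index, or -1 if all failed.
 */
static int
connect_with_failover(connection_t *conns, int n, int *current,
                      try_connect_fn try_connect)
{
    assert(n > 0);

    for (int attempt = 0; attempt < n; attempt++)
    {
        int i = (*current + attempt) % n;

        if (try_connect(conns[i].connection))
        {
            *current = i;
            return i;
        }
        conns[i].errors++;
    }
    return -1;
}
```

Staying on *current until it fails is close in spirit to the "working" policy of the patch; the error counter could later feed a report.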


--
Fabien.




Re: [PATCH] pgbench: add multiconnect option

2022-03-15 Thread Greg Stark
Hi guys,

It looks like David sent a patch and Fabien sent a followup patch. But
there hasn't been a whole lot of discussion or further patches.

It sounds like there are some basic questions about what the right
interface should be. Are there specific questions that would be
helpful for moving forward?




Re: [PATCH] pgbench: add multiconnect option

2021-08-28 Thread Fabien COELHO


Hello David,

> > > round-robin and random make sense.  I am wondering how round-robin
> > > would work with -C, though?  Would you just reuse the same connection
> > > string as the one chosen at the starting point.
> >
> > Well, not necessarily, but this is debatable.
>
> My expectation for such a behavior would be that it would reconnect to
> a random connstring each time, otherwise what's the point of using
> this with -C?  If we needed to forbid some option combinations that is
> also an option.

Yep. ISTM that it should follow the connection policy/strategy, whatever
it is.
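
As a standalone sketch of "follow the policy on every reconnect": the code below restates the policy parsing and selection logic from the patch so that it compiles on its own. The differences are assumptions of this example: the policy, the last index and the connection count are passed as parameters instead of being globals, and the C library's rand() replaces pgbench's getrand().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum mc_policy_t
{
    MC_UNKNOWN = 0,
    MC_FIRST,
    MC_RANDOM,
    MC_ROUND_ROBIN,
    MC_WORKING
} mc_policy_t;

/* Map a policy name or its shortest prefix to the enum, as in the patch. */
static mc_policy_t
parse_policy(const char *s)
{
    if (s == NULL || *s == '\0' || strcmp(s, "first") == 0 || strcmp(s, "f") == 0)
        return MC_FIRST;
    else if (strcmp(s, "random") == 0 || strcmp(s, "ra") == 0)
        return MC_RANDOM;
    else if (strcmp(s, "round-robin") == 0 || strcmp(s, "ro") == 0)
        return MC_ROUND_ROBIN;
    else if (strcmp(s, "working") == 0 || strcmp(s, "w") == 0)
        return MC_WORKING;
    else
        return MC_UNKNOWN;
}

/*
 * Return the connection index to use for the next attempt.  With -C this
 * would be called before every transaction, so each reconnect follows
 * the policy.  *last is the previously used index and is updated.
 */
static int
choose_connection(mc_policy_t policy, int *last, int n_connections)
{
    assert(n_connections > 0);

    switch (policy)
    {
        case MC_FIRST:
            return 0;
        case MC_RANDOM:
            *last = rand() % n_connections; /* stand-in for getrand() */
            return *last;
        case MC_ROUND_ROBIN:
            *last = (*last + 1) % n_connections;
            return *last;
        case MC_WORKING:
            return *last;
        default:
            return -1;
    }
}
```

With three connections and *last starting at 0, round-robin yields 1, 2, 0 on successive calls. Note that under these parsing rules the string "rand" is not recognized and maps to MC_UNKNOWN.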



> > > > I was thinking of providing a allowing a list of conninfo strings with
> > > > repeated options, eg --conninfo "foo" --conninfo "bla"…
> > >
> > > That was my first thought when reading the subject of this thread:
> > > create a list of connection strings and pass one of them to
> > > doConnect() to grab the properties looked for.  That's a bit confusing
> > > though as pgbench does not support directly connection strings,
> >
> > They are supported because libpq silently assumes that "dbname" can be a
> > full connection string.
> >
> > > and we should be careful to keep fallback_application_name intact.
> >
> > Hmmm. See attached patch, ISTM that it does the right thing.
>
> I guess the multiple --conninfo approach is fine; I personally liked
> having the list come from a file, as you could benchmark different
> groups/clusters based on a file, much easier than constructing
> multiple pgbench invocations depending.  I can see an argument for
> both approaches.  The PGSERVICEFILE was an idea I'd had to store
> easily indexed groups of connection information in a way that I didn't
> need to know all the details, could easily parse, and could later pass
> in the ENV so libpq could just pull out the information.

The attached version does work with the service file if the user provides
"service=whatever" on the command line. The main difference is that it
sticks to the libpq policy of using an explicit connection string or list of
connection strings.

Also, note that the patch I sent dropped the --conninfo option.
Connections are simply the last arguments to pgbench.

> I'll see if I can take a look at your latest patch.

Thanks!

> I was also wondering about how we should handle `pgbench -i` with
> multiple connection strings; currently it would only initialize with the
> first DSN it gets, but it probably makes sense to run initialize against
> all of the databases (or at least attempt to).

I'll tend to disagree on this one. Pgbench's whole expectation is to run
against "one" system, which might be composed of several nodes because of
replication. I do not think that it is desirable to jump to "several
fully independent databases".

> Maybe this is one argument for the multiple --conninfo handling, since
> you could explicitly pass the databases you want.  (Not that it is hard
> to just loop over connection info and `pgbench -i` with ENV, or any
> other number of ways to accomplish the same thing.)

Yep.

--
Fabien.

Re: [PATCH] pgbench: add multiconnect option

2021-08-27 Thread David Christensen
> >> Good. I was thinking of adding such capability, possibly for handling
> >> connection errors and reconnecting…
> >
> > round-robin and random make sense.  I am wondering how round-robin
> > would work with -C, though?  Would you just reuse the same connection
> > string as the one chosen at the starting point.
>
> Well, not necessarily, but this is debatable.

My expectation for such a behavior would be that it would reconnect to
a random connstring each time, otherwise what's the point of using
this with -C?  If we needed to forbid some option combinations that is
also an option.

> >> I was thinking of providing a allowing a list of conninfo strings with
> >> repeated options, eg --conninfo "foo" --conninfo "bla"…
> >
> > That was my first thought when reading the subject of this thread:
> > create a list of connection strings and pass one of them to
> > doConnect() to grab the properties looked for.  That's a bit confusing
> > though as pgbench does not support directly connection strings,
>
> They are supported because libpq silently assumes that "dbname" can be a
> full connection string.
>
> > and we should be careful to keep fallback_application_name intact.
>
> Hmmm. See attached patch, ISTM that it does the right thing.

I guess the multiple --conninfo approach is fine; I personally liked
having the list come from a file, as you could benchmark different
groups/clusters based on a file, much easier than constructing
multiple pgbench invocations depending.  I can see an argument for
both approaches.  The PGSERVICEFILE was an idea I'd had to store
easily indexed groups of connection information in a way that I didn't
need to know all the details, could easily parse, and could later pass
in the ENV so libpq could just pull out the information.

> >> Your approach using PGSERVICEFILE also make sense!
> >
> > I am not sure that's actually needed here, as it is possible to pass
> > down a service name within a connection string.  I think that you'd
> > better leave libpq do all the work related to a service file, if
> > specified.  pgbench does not need to know any of that.
>
> Yes, this is an inconvenient with this approach, part of libpq machinery
> is more or less replicated in pgbench, which is quite annoying, and less
> powerful.

There is some small fraction reproduced here just to pull out the
named sections; no other parsing should be done though.

> Attached my work-in-progress version, with a few open issues (eg probably
> not thread safe), but comments about the provided feature are welcome.
>
> I borrowed the "strategy" option, renamed policy, from the initial patch.
> Pgbench just accepts several connection strings as parameters, eg:
>
>pgbench ... "service=db1" "service=db2" "service=db3"
>
> The next stage is to map scripts to connections types and connections
> to connection types, so that pgbench could run W transactions against a
> primary and R transactions agains a hot standby, for instance. I have a
> some design for that, but nothing is implemented.
>
> There is also the combination with the error handling patch to consider:
> if a connection fails, a connection to a replica could be issued instead.

I'll see if I can take a look at your latest patch.  I was also
wondering about how we should handle `pgbench -i` with multiple
connection strings; currently it would only initialize with the first
DSN it gets, but it probably makes sense to run initialize against all
of the databases (or at least attempt to).  Maybe this is one argument
for the multiple --conninfo handling, since you could explicitly pass
the databases you want.  (Not that it is hard to just loop over
connection info and `pgbench -i` with ENV, or any other number of ways
to accomplish the same thing.)

Best,

David




Re: [PATCH] pgbench: add multiconnect option

2021-08-27 Thread Fabien COELHO


Bonjour Michaël,

> > Good. I was thinking of adding such capability, possibly for handling
> > connection errors and reconnecting…
>
> round-robin and random make sense.  I am wondering how round-robin
> would work with -C, though?  Would you just reuse the same connection
> string as the one chosen at the starting point.

Well, not necessarily, but this is debatable.

> > I was thinking of providing a allowing a list of conninfo strings with
> > repeated options, eg --conninfo "foo" --conninfo "bla"…
>
> That was my first thought when reading the subject of this thread:
> create a list of connection strings and pass one of them to
> doConnect() to grab the properties looked for.  That's a bit confusing
> though as pgbench does not support directly connection strings,

They are supported because libpq silently assumes that "dbname" can be a
full connection string.
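
For readers unfamiliar with that libpq behavior, the rule can be sketched as below. This is a simplified restatement of the documented rule (a dbname containing an "=" sign, or starting with a connection URI prefix, is treated as a connection string), not libpq's actual code, and the function name looks_like_conninfo is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified check: would libpq treat this dbname value as a full
 * connection string rather than as a plain database name?
 */
static bool
looks_like_conninfo(const char *dbname)
{
    assert(dbname != NULL);

    return strncmp(dbname, "postgresql://", 13) == 0 ||
           strncmp(dbname, "postgres://", 11) == 0 ||
           strchr(dbname, '=') != NULL;
}
```

So "service=db1" and "postgresql://host/bench" are handled as connection strings, while "bench" is taken as a database name.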



> and we should be careful to keep fallback_application_name intact.

Hmmm. See attached patch, ISTM that it does the right thing.

> > Your approach using PGSERVICEFILE also make sense!
>
> I am not sure that's actually needed here, as it is possible to pass
> down a service name within a connection string.  I think that you'd
> better leave libpq do all the work related to a service file, if
> specified.  pgbench does not need to know any of that.

Yes, this is an inconvenience with this approach: part of the libpq machinery
is more or less replicated in pgbench, which is quite annoying, and less
powerful.

Attached my work-in-progress version, with a few open issues (eg probably
not thread safe), but comments about the provided feature are welcome.

I borrowed the "strategy" option, renamed policy, from the initial patch.
Pgbench just accepts several connection strings as parameters, eg:

  pgbench ... "service=db1" "service=db2" "service=db3"

The next stage is to map scripts to connection types and connections
to connection types, so that pgbench could run W transactions against a
primary and R transactions against a hot standby, for instance. I have
some design for that, but nothing is implemented.

There is also the combination with the error handling patch to consider:
if a connection fails, a connection to a replica could be issued instead.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..7b99344c90 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -29,7 +29,7 @@ PostgreSQL documentation
   
pgbench
option
-   dbname
+   dbname or conninfo
   
  
 
@@ -160,6 +160,9 @@ pgbench  options  d
 not specified, the environment variable
 PGDATABASE is used. If that is not set, the
 user name specified for the connection is used.
+Alternatively, the dbname can be
+a standard connection information string.
+Several connections can be provided.

   
  
@@ -840,6 +843,21 @@ pgbench  options  d
 
 
 
+ 
+  --connection-policy=policy
+  
+   
+Set the connection policy when multiple connections are available.
+Default is round-robin (ro).
+Possible values are:
+first (f), 
+random (ra), 
+round-robin (ro),
+working (w).
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b0e20c46ae..95e58f0573 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -277,13 +277,39 @@ bool		is_connect;			/* establish connection for each transaction */
 bool		report_per_command; /* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
 
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
-const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;
 
+/* multi connections */
+typedef enum mc_policy_t
+{
+	MC_UNKNOWN = 0,
+	MC_FIRST,
+	MC_RANDOM,
+	MC_ROUND_ROBIN,
+	MC_WORKING
+} mc_policy_t;
+
+/* connection info list */
+typedef struct connection_t
+{
+	const char *connection;		/* conninfo or dbname */
+	int			errors;			/* number of connection errors */
+} connection_t;
+
+static int	n_connections = 0;
+static connection_t	   *connections = NULL;
+static mc_policy_t	mc_policy = MC_ROUND_ROBIN;
+
+/* last used connection */
+// FIXME per thread?
+static int current_connection = 0;
+
 #define WSEP '@'/* weight separator */
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -701,7 +727,7 @@ usage(void)
 {
 	printf("%s is a benchmarking tool for PostgreSQL.\n\n"
 		   "Usage:\n"
-		   "  %s [OPTION]... [DBNAME]\n"
+		   "  %s [OPTION]... [DBNAME or CONNINFO ...]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize invokes initialization mode\n"
 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"

Re: [PATCH] pgbench: add multiconnect option

2021-08-27 Thread Michael Paquier
On Thu, Jul 01, 2021 at 12:22:45PM +0200, Fabien COELHO wrote:
> Good. I was thinking of adding such capability, possibly for handling
> connection errors and reconnecting…

round-robin and random make sense.  I am wondering how round-robin
would work with -C, though?  Would you just reuse the same connection
string as the one chosen at the starting point?

> I was thinking of allowing a list of conninfo strings with
> repeated options, eg --conninfo "foo" --conninfo "bla"…

That was my first thought when reading the subject of this thread:
create a list of connection strings and pass one of them to
doConnect() to grab the properties looked for.  That's a bit confusing
though as pgbench does not directly support connection strings, and we
should be careful to keep fallback_application_name intact.

> Your approach using PGSERVICEFILE also makes sense!

I am not sure that's actually needed here, as it is possible to pass
down a service name within a connection string.  I think that you'd
better leave libpq do all the work related to a service file, if
specified.  pgbench does not need to know any of that.
--
Michael



Re: [PATCH] pgbench: add multiconnect option

2021-07-01 Thread Fabien COELHO


Hello David,


This patch adds the concept of "multiconnect" to pgbench (better
terminology welcome).


Good. I was thinking of adding such capability, possibly for handling 
connection errors and reconnecting…


The basic idea here is to allow connections made with pgbench to use 
different auth values or connect to multiple databases. We implement 
this using a user-provided PGSERVICEFILE and choosing a PGSERVICE from 
this based on a number of strategies. (Currently the only supported 
strategies are round robin or random.)


I was thinking of allowing a list of conninfo strings with 
repeated options, eg --conninfo "foo" --conninfo "bla"…


Your approach using PGSERVICEFILE also makes sense!

Maybe mixing both ideas could simplify it, reduce the code base, and 
provide more benefits.


In particular, pgbench parses the file but then it will be read also by 
libpq, yuk yuk.


Also, I do not like that PGSERVICE is overridden by pgbench, while other 
options are passed with the parameters approach in doConnect. It would 
make more sense to add a "service" field to the parameters for 
consistency, if this approach was to be pursued.


On reflection, I'd suggest using the --conninfo (or some other name) 
approach, eg "pgbench --conninfo='service=s1' --conninfo='service=s2'", and 
users just have to set the PGSERVICEFILE env variable themselves, which I 
think is better than pgbench overriding env variables behind their back.


This allows having a service file with more connections and just telling 
pgbench which ones to use, which is the expected way to use this feature. 
This drops file parsing.
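
A minimal sketch of collecting a repeated --conninfo option, assuming getopt_long is available. The names conninfo_list, conninfo_list_push and collect_conninfos are hypothetical, and a growable array is used here instead of a fixed-size one; this is not the code of the attached patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Growable list of conninfo strings (hypothetical type for this sketch) */
typedef struct
{
	const char **items;
	int			count;
	int			capacity;
} conninfo_list;

/* Append one conninfo string, doubling the array when it is full */
static void
conninfo_list_push(conninfo_list *list, const char *ci)
{
	assert(list != NULL && ci != NULL);

	if (list->count == list->capacity)
	{
		int			newcap = list->capacity ? list->capacity * 2 : 4;
		const char **p = realloc(list->items, newcap * sizeof(*p));

		if (p == NULL)
			abort();			/* out of memory: give up in this sketch */
		list->items = p;
		list->capacity = newcap;
	}
	list->items[list->count++] = ci;
}

/*
 * Scan argv for every occurrence of --conninfo and store the values.
 * Returns the number of conninfo strings found, or -1 on an unknown option.
 */
static int
collect_conninfos(int argc, char **argv, conninfo_list *list)
{
	static const struct option long_options[] = {
		{"conninfo", required_argument, NULL, 1},
		{NULL, 0, NULL, 0}
	};
	int			c;

	optind = 1;					/* start scanning at the first argument */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
	{
		if (c == 1)
			conninfo_list_push(list, optarg);
		else
			return -1;
	}
	return list->count;
}
```

The stored pointers refer to the argv strings themselves, so nothing is copied, and the service file is left entirely to libpq through PGSERVICEFILE.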


I can only see benefit to this simplified approach.
What do you think?

About the patch:

There are warnings about trailing whitespace when applying the patch, and 
there are some tabbing issues in the file.


I would not consume the "-g" option unless there is some logical link with 
the feature. I'd be okay with "-m" if it is still needed. Perhaps it could 
be used for choosing the strategy?


stringinfo: We already have PQExpBuffer imported, could we use that 
instead? Having two sets of structs/functions which do the same thing in 
the same source file does not look like a good idea. If we do not parse 
the file, nothing is needed, which is a relief.


Attached is my work-in-progress start at adding conninfo, that would need 
to be improved with strategies.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..d1390e83e5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -840,6 +840,16 @@ pgbench  options  d
 
 
 
+ 
+  --conninfo=conninfo
+  
+   
+Add a conninfo connection information string
+to the pool of possible connection strings.
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4aeccd93af..ad71a568db 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -276,13 +276,36 @@ bool		is_connect;			/* establish connection for each transaction */
 bool		report_per_command; /* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
 
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
 const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;
 
+/* connection info list */
+typedef struct conninfo_t
+{
+	const char *conninfo;
+	int			errors;
+} conninfo_t;
+
+#define		MAX_CONNINFO	8
+int			n_conninfo = 0;
+conninfo_t	conninfos[MAX_CONNINFO];
+
+static void
+push_conninfo(const char *ci)
+{
+	// FIXME nicer error
+	Assert(n_conninfo < MAX_CONNINFO);
+	conninfos[n_conninfo].conninfo = ci;
+	conninfos[n_conninfo].errors = 0;
+	n_conninfo++;
+}
+
 #define WSEP '@'/* weight separator */
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -751,6 +774,7 @@ usage(void)
 		   "  -U, --username=USERNAME  connect as specified database user\n"
 		   "  -V, --versionoutput version information, then exit\n"
 		   "  -?, --help   show this help, then exit\n"
+		   "  --conninfo=CONNINFO  add a database server conninfo\n"
 		   "\n"
 		   "Report bugs to <%s>.\n"
 		   "%s home page: <%s>\n",
@@ -1343,9 +1367,44 @@ tryExecuteStatement(PGconn *con, const char *sql)
 	PQclear(res);
 }
 
+/* set up a connection to the backend with a provided conninfo */
+static PGconn *
+doConnectCI(bool change)
+{
+	static int	nci = 0;
+	PGconn	   *conn;
+	conninfo_t *ci = &conninfos[nci];
+
+	conn = PQconnectdb(ci->conninfo);
+
+	if (!conn)
+	{
+		ci->errors++;
+		pg_log_error("connection to database \"%s\" failed", ci->conninfo);
+		/* try another one next time */
+		nci = (nci + 1) % n_conninfo;
+		return NULL;
+	}
+
+	if (PQstatus(conn) == CONNECTION_BAD)
+	{
+		ci->errors++;
+		pg_log_error("%s", 

[PATCH] pgbench: add multiconnect option

2021-06-30 Thread David Christensen
-hackers,

This patch adds the concept of "multiconnect" to pgbench (better
terminology welcome).  The basic idea here is to allow connections made
with pgbench to use different auth values or connect to multiple
databases. We implement this using a user-provided PGSERVICEFILE and
choosing a PGSERVICE from this based on a number of strategies.
(Currently the only supported strategies are round robin or random.)
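
To illustrate what choosing a PGSERVICE from a service file involves, here is a minimal sketch that extracts the section names from pg_service.conf-style text, where each service starts with a "[name]" line. It is not the code from the patch below: collect_service_names and SKETCH_NAME_LEN are hypothetical, and comments, whitespace handling and file I/O are deliberately left out.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NAME_LEN 64		/* arbitrary limit chosen for this sketch */

/*
 * Collect the names of "[name]" sections found in service-file text.
 * Returns the number of names stored, at most max_names.
 */
static int
collect_service_names(const char *text, char names[][SKETCH_NAME_LEN],
					  int max_names)
{
	int			count = 0;
	const char *line = text;

	assert(text != NULL && max_names > 0);

	while (*line != '\0' && count < max_names)
	{
		const char *eol = strchr(line, '\n');
		size_t		len = eol ? (size_t) (eol - line) : strlen(line);

		if (len >= 2 && line[0] == '[')
		{
			const char *close = memchr(line, ']', len);

			/* keep the name only if it fits, including the terminator */
			if (close != NULL && close - line - 1 < SKETCH_NAME_LEN)
			{
				size_t		nlen = (size_t) (close - line - 1);

				memcpy(names[count], line + 1, nlen);
				names[count][nlen] = '\0';
				count++;
			}
		}
		line = eol ? eol + 1 : line + len;	/* move to the next line */
	}
	return count;
}
```

A round-robin strategy would then cycle through the collected names, setting the service for each new connection.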

There is definite room for improvement here; at the very least, teaching
`pgbench -i` about all of the distinct DBs referenced in this service
file would ensure that initialization works as expected in all places.
For now, we are punting initialization to the user in this version of
the patch if using more than one database in the given service file.

Best,

David
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..94616c13c2 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -161,6 +161,11 @@ pgbench  options  d
 PGDATABASE is used. If that is not set, the
 user name specified for the connection is used.

+   
+If multiconnect mode is enabled, a defined
+dbname in the chosen service will override this
+value.
+   
   
  
 
@@ -840,6 +845,39 @@ pgbench  options  d
 
 
 
+ 
+  -m servicefile
+  --multiconnect-file=servicefile
+  
+   
+Turns on multiconnect mode and uses the given
+pg_service-style file to derive connection
+information from.  Any/all connection parameters in this file will
+overwrite any that were provided in the command-line.
+   
+   
+Since this behavior will make a connection using
+the PGSERVICEFILE mechanism, it is possible to
+connect to other databases than the one provided in the original
+command invocation.  This option assumes that the user has previously
+run the necessary initialization steps against all databases that
+would be accessed via this service file.
+   
+  
+ 
+
+ 
+  -g roundrobin|random
+  --multiconnect-strategy=roundrobin|random
+  
+   
+Selects the strategy by which multiconnect mode
+uses the connections defined in the indicated service file.  The
+default value is roundrobin.
+   
+  
+ 
+
  
   -h hostname
   --host=hostname
@@ -847,6 +885,11 @@ pgbench  options  d

 The database server's host name

+   
+If multiconnect mode is enabled, a defined
+host in the chosen service will override this
+value.
+   
   
  
 
@@ -857,6 +900,11 @@ pgbench  options  d

 The database server's port number

+   
+If multiconnect mode is enabled, a defined
+port in the chosen service will override this
+value.
+   
   
  
 
@@ -867,6 +915,11 @@ pgbench  options  d

 The user name to connect as

+   
+If multiconnect mode is enabled, a defined
+user in the chosen service will override this
+value.
+   
   
  
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4aeccd93af..2834c9ef3c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -69,6 +69,7 @@
 #include "pgbench.h"
 #include "port/pg_bitutils.h"
 #include "portability/instr_time.h"
+#include "lib/stringinfo.h"
 
 #ifndef M_PI
 #define M_PI 3.14159265358979323846
@@ -275,6 +276,8 @@ int			nthreads = 1;		/* number of threads */
 bool		is_connect;			/* establish connection for each transaction */
 bool		report_per_command; /* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
+int num_service_names = 0; /* how many service file names are in the indicated service file */
+int cur_service_index = 0; /* the index of the next service file; used for round-robin */
 
 const char *pghost = NULL;
 const char *pgport = NULL;
@@ -282,6 +285,7 @@ const char *username = NULL;
 const char *dbName = NULL;
 char	   *logfile_prefix = NULL;
 const char *progname;
+const char **service_names = NULL;
 
 #define WSEP '@'/* weight separator */
 
@@ -549,6 +553,14 @@ typedef enum QueryMode
 static QueryMode querymode = QUERY_SIMPLE;
 static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
+typedef enum MultiConnectStrategy
+{
+	MC_ROUND_ROBIN,
+	MC_RANDOM
+} MultiConnectStrategy;
+
+static MultiConnectStrategy multiconnect_strategy = MC_ROUND_ROBIN;
+
 /*
  * struct Command represents one command in a script.
  *
@@ -663,7 +675,7 @@ static void clear_socket_set(socket_set *sa);
 static void add_socket_to_set(socket_set *sa, int fd, int idx);
 static int	wait_on_socket_set(socket_set *sa, int64 usecs);
 static bool socket_has_input(socket_set *sa, int fd, int idx);
-
+static const char