Re: pgbench bug candidate: negative "initial connection time"

2021-11-03 Thread Yugo NAGATA
On Tue, 2 Nov 2021 23:11:39 +0900
Fujii Masao  wrote:

> 
> 
> On 2021/11/01 23:01, Fujii Masao wrote:
> >> The remainings are the changes of handling of initial connection or
> >> logfile open failures. I agree to push them at least for the master.
> >> But I'm not sure if they should be back-patched. Without these changes,
> >> even when those failures happen, pgbench proceeds the benchmark and
> >> reports the result. But with the changes, pgbench exits immediately in
> >> that case. I'm not sure if there are people who expect this behavior,
> >> but if there are, maybe we should not change it at least at stable 
> >> branches.
> >> Thought?
> > 
> > The current behavior should be improved, but not a bug.
> > So I don't think that the patch needs to be back-patched.
> > Barring any objection, I will push the attached patch to the master.
> 
> Pushed. Thanks!

Thanks!

> 
> I also pushed the typo-fix patch that I proposed upthread.
> 
> Regards,
> 
> -- 
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
> 
> 


-- 
Yugo NAGATA 




Re: pgbench bug candidate: negative "initial connection time"

2021-11-02 Thread Fujii Masao




On 2021/11/01 23:01, Fujii Masao wrote:

The remainings are the changes of handling of initial connection or
logfile open failures. I agree to push them at least for the master.
But I'm not sure if they should be back-patched. Without these changes,
even when those failures happen, pgbench proceeds the benchmark and
reports the result. But with the changes, pgbench exits immediately in
that case. I'm not sure if there are people who expect this behavior,
but if there are, maybe we should not change it at least at stable branches.
Thought?


The current behavior should be improved, but not a bug.
So I don't think that the patch needs to be back-patched.
Barring any objection, I will push the attached patch to the master.


Pushed. Thanks!

I also pushed the typo-fix patch that I proposed upthread.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench bug candidate: negative "initial connection time"

2021-11-01 Thread Fujii Masao




On 2021/10/09 0:41, Fujii Masao wrote:



On 2021/10/01 15:27, Michael Paquier wrote:

On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote:

BTW, when logfile fails to be opened, pgbench gets stuck due to commit
aeb57af8e6. So even if we decided not to back-patch those changes,
we should improve the handling of logfile open failure, to fix the issue.


There is an entry in the CF for this thread:
https://commitfest.postgresql.org/34/3219/

I have moved that to the next one as some pieces are missing.  If you
are planning to handle the rest, could you register your name as a
committer?


Thanks for letting me know that!
I registered myself as a committer of the patch again.


 pg_time_usec_t conn_duration;    /* cumulated connection and deconnection
  * delays */

BTW, while reading the patch, I found the above comment in pgbench.c.
"deconnection" seems a valid word in French (?), but isn't it better to
replace it with "disconnection"? Patch attached.


Barring any objection, I will push this patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench bug candidate: negative "initial connection time"

2021-11-01 Thread Fujii Masao



On 2021/09/29 22:11, Fujii Masao wrote:



On 2021/09/24 11:26, Fujii Masao wrote:



On 2021/09/24 7:26, Yugo NAGATA wrote:

That makes sense. Failures of setup connection or initial connection doesn't
seem 'static problems'. I rewrote this description to explain exit status 1
indicates also interal errors and early errors.

   Exit status 1 indicates static problems such as invalid command-line options
   or internal errors which are supposed to never occur.  Early errors that 
occur
   when starting benchmark such as initial connection failures also exit with
   status 1.


LGTM. Barring any objection, I will commit the patch.


I extracted two changes from the patch and pushed (also back-patched) them.

The remainings are the changes of handling of initial connection or
logfile open failures. I agree to push them at least for the master.
But I'm not sure if they should be back-patched. Without these changes,
even when those failures happen, pgbench proceeds the benchmark and
reports the result. But with the changes, pgbench exits immediately in
that case. I'm not sure if there are people who expect this behavior,
but if there are, maybe we should not change it at least at stable branches.
Thought?


The current behavior should be improved, but not a bug.
So I don't think that the patch needs to be back-patched.
Barring any objection, I will push the attached patch to the master.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0f432767c2..c71dab644c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -904,10 +904,12 @@ pgbench  options 
 d
 
   
A successful run will exit with status 0.  Exit status 1 indicates static
-   problems such as invalid command-line options.  Errors during the run such
-   as database errors or problems in the script will result in exit status 2.
-   In the latter case, pgbench will print partial
-   results.
+   problems such as invalid command-line options or internal errors which
+   are supposed to never occur.  Early errors that occur when starting
+   benchmark such as initial connection failures also exit with status 1.
+   Errors during the run such as database errors or problems in the script
+   will result in exit status 2. In the latter case,
+   pgbench will print partial results.
   
  
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d17f69333f..f8331bbb60 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3181,6 +3181,7 @@ advanceConnectionState(TState *thread, CState *st, 
StatsData *agg)
 
if ((st->con = doConnect()) == NULL)
{
+   /* as the bench is already 
running, we do not abort the process */
pg_log_error("client %d aborted 
while establishing connection", st->id);
st->state = CSTATE_ABORTED;
break;
@@ -4456,7 +4457,10 @@ runInitSteps(const char *initialize_steps)
initPQExpBuffer();
 
if ((con = doConnect()) == NULL)
+   {
+   pg_log_fatal("could not create connection for initialization");
exit(1);
+   }
 
setup_cancel_handler(NULL);
SetCancelConn(con);
@@ -6399,7 +6403,10 @@ main(int argc, char **argv)
/* opening connection... */
con = doConnect();
if (con == NULL)
+   {
+   pg_log_fatal("could not create connection for setup");
exit(1);
+   }
 
/* report pgbench and server versions */
printVersion(con);
@@ -6625,7 +6632,7 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
pg_log_fatal("could not open logfile \"%s\": %m", 
logpath);
-   goto done;
+   exit(1);
}
}
 
@@ -6650,16 +6657,10 @@ threadRun(void *arg)
{
if ((state[i].con = doConnect()) == NULL)
{
-   /*
-* On connection failure, we meet the barrier 
here in place of
-* GO before proceeding to the "done" path 
which will cleanup,
-* so as to avoid locking the process.
-*
-* It is unclear whether it is worth doing 
anything rather
-* than coldly exiting with an error message.
-*/
-   THREAD_BARRIER_WAIT();
-   goto done;
+   /* coldly 

Re: pgbench bug candidate: negative "initial connection time"

2021-10-08 Thread Fujii Masao



On 2021/10/01 15:27, Michael Paquier wrote:

On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote:

BTW, when logfile fails to be opened, pgbench gets stuck due to commit
aeb57af8e6. So even if we decided not to back-patch those changes,
we should improve the handling of logfile open failure, to fix the issue.


There is an entry in the CF for this thread:
https://commitfest.postgresql.org/34/3219/

I have moved that to the next one as some pieces are missing.  If you
are planning to handle the rest, could you register your name as a
committer?


Thanks for letting me know that!
I registered myself as a committer of the patch again.


pg_time_usec_t conn_duration;   /* cumulated connection and deconnection
 * 
delays */

BTW, while reading the patch, I found the above comment in pgbench.c.
"deconnection" seems a valid word in French (?), but isn't it better to
replace it with "disconnection"? Patch attached.


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d6b31bbf53..81693e1765 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -509,7 +509,7 @@ typedef struct
pg_time_usec_t create_time; /* thread creation time */
pg_time_usec_t started_time;/* thread is running */
pg_time_usec_t bench_start; /* thread is benchmarking */
-   pg_time_usec_t conn_duration;   /* cumulated connection and deconnection
+   pg_time_usec_t conn_duration;   /* cumulated connection and 
disconnection
 * 
delays */
 
StatsData   stats;


Re: pgbench bug candidate: negative "initial connection time"

2021-10-01 Thread Michael Paquier
On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote:
> BTW, when logfile fails to be opened, pgbench gets stuck due to commit
> aeb57af8e6. So even if we decided not to back-patch those changes,
> we should improve the handling of logfile open failure, to fix the issue.

There is an entry in the CF for this thread:
https://commitfest.postgresql.org/34/3219/

I have moved that to the next one as some pieces are missing.  If you
are planning to handle the rest, could you register your name as a
committer?
--
Michael


signature.asc
Description: PGP signature


Re: pgbench bug candidate: negative "initial connection time"

2021-09-29 Thread Fujii Masao




On 2021/09/24 11:26, Fujii Masao wrote:



On 2021/09/24 7:26, Yugo NAGATA wrote:

That makes sense. Failures of setup connection or initial connection doesn't
seem 'static problems'. I rewrote this description to explain exit status 1
indicates also interal errors and early errors.

   Exit status 1 indicates static problems such as invalid command-line options
   or internal errors which are supposed to never occur.  Early errors that 
occur
   when starting benchmark such as initial connection failures also exit with
   status 1.


LGTM. Barring any objection, I will commit the patch.


I extracted two changes from the patch and pushed (also back-patched) them.

The remainings are the changes of handling of initial connection or
logfile open failures. I agree to push them at least for the master.
But I'm not sure if they should be back-patched. Without these changes,
even when those failures happen, pgbench proceeds the benchmark and
reports the result. But with the changes, pgbench exits immediately in
that case. I'm not sure if there are people who expect this behavior,
but if there are, maybe we should not change it at least at stable branches.
Thought?

BTW, when logfile fails to be opened, pgbench gets stuck due to commit
aeb57af8e6. So even if we decided not to back-patch those changes,
we should improve the handling of logfile open failure, to fix the issue.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench bug candidate: negative "initial connection time"

2021-09-23 Thread Fujii Masao




On 2021/09/24 7:26, Yugo NAGATA wrote:

That makes sense. Failures of setup connection or initial connection doesn't
seem 'static problems'. I rewrote this description to explain exit status 1
indicates also interal errors and early errors.

   Exit status 1 indicates static problems such as invalid command-line options
   or internal errors which are supposed to never occur.  Early errors that 
occur
   when starting benchmark such as initial connection failures also exit with
   status 1.


LGTM. Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench bug candidate: negative "initial connection time"

2021-09-23 Thread Yugo NAGATA
Hello Fujii-san,

On Tue, 7 Sep 2021 02:34:17 +0900
Fujii Masao  wrote:

> On 2021/07/29 13:23, Yugo NAGATA wrote:
> > Hello,
> > 
> > On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
> > Fabien COELHO  wrote:
> > 
> >> Attached Yugo-san patch with some updates discussed in the previous mails,
> >> so as to move things along.
> > 
> > I attached the patch rebased to a change due to 856de3b39cf.
> 
> + pg_log_fatal("connection for initialization failed");
> + pg_log_fatal("setup connection failed");
> + pg_log_fatal("cannot create connection for 
> client %d",
> 
> These fatal messages output when doConnect() fails should be a bit more 
> consistent each other? For example,
> 
>  could not create connection for initialization
>  could not create connection for setup
>  could not create connection for client %d

Ok. I fixed as your suggestion.

> > Exit status 1 indicates static problems such as invalid command-line 
> > options.
> > Errors during the run such as database errors or problems in the script will
> > result in exit status 2.
> 
> While reading the code and docs related to the patch, I found
> these descriptions in pgbench docs. The first description needs to be
> updated? Because even database error (e.g., failure of connection for setup)
> can result in exit status 1 if it happens before the benchmark actually runs.

That makes sense. Failures of setup connection or initial connection doesn't
seem 'static problems'. I rewrote this description to explain exit status 1
indicates also interal errors and early errors.

  Exit status 1 indicates static problems such as invalid command-line options
  or internal errors which are supposed to never occur.  Early errors that occur
  when starting benchmark such as initial connection failures also exit with
  status 1.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0f432767c2..c71dab644c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -904,10 +904,12 @@ pgbench  options  d
 
   
A successful run will exit with status 0.  Exit status 1 indicates static
-   problems such as invalid command-line options.  Errors during the run such
-   as database errors or problems in the script will result in exit status 2.
-   In the latter case, pgbench will print partial
-   results.
+   problems such as invalid command-line options or internal errors which
+   are supposed to never occur.  Early errors that occur when starting
+   benchmark such as initial connection failures also exit with status 1.
+   Errors during the run such as database errors or problems in the script
+   will result in exit status 2. In the latter case,
+   pgbench will print partial results.
   
  
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 433abd954b..8d44399c16 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3181,6 +3181,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort the process */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4456,7 +4457,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("could not create connection for initialization");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6399,7 +6403,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("could not create connection for setup");
 		exit(1);
+	}
 
 	/* report pgbench and server versions */
 	printVersion(con);
@@ -6553,7 +6560,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6625,7 +6632,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6650,16 +6657,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT();
-goto done;
+/* coldly abort on initial connection failure */
+pg_log_fatal("could not create connection for client %d",
+			 state[i].id);
+exit(1);
 			}
 		}
 	}
@@ 

Re: pgbench bug candidate: negative "initial connection time"

2021-09-06 Thread Fujii Masao




On 2021/07/29 13:23, Yugo NAGATA wrote:

Hello,

On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
Fabien COELHO  wrote:


Attached Yugo-san patch with some updates discussed in the previous mails,
so as to move things along.


I attached the patch rebased to a change due to 856de3b39cf.


+   pg_log_fatal("connection for initialization failed");
+   pg_log_fatal("setup connection failed");
+   pg_log_fatal("cannot create connection for client 
%d",

These fatal messages output when doConnect() fails should be a bit more 
consistent each other? For example,

could not create connection for initialization
could not create connection for setup
could not create connection for client %d

I'm not sure, but *if* "xxx failed" is more proper for pgbench, what about

connection for initialization failed
connection for setup failed
connection for client %d failed



Exit status 1 indicates static problems such as invalid command-line options.
Errors during the run such as database errors or problems in the script will
result in exit status 2.


While reading the code and docs related to the patch, I found
these descriptions in pgbench docs. The first description needs to be
updated? Because even database error (e.g., failure of connection for setup)
can result in exit status 1 if it happens before the benchmark actually runs.

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench bug candidate: negative "initial connection time"

2021-07-28 Thread Yugo NAGATA
Hello,

On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
Fabien COELHO  wrote:

> Attached Yugo-san patch with some updates discussed in the previous mails, 
> so as to move things along.

I attached the patch rebased to a change due to 856de3b39cf.

Regards,
Yugo Nagata



-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 55d14604c0..e460dc7e5b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3181,6 +3181,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort the process */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4418,7 +4419,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6361,7 +6365,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	/* report pgbench and server versions */
 	printVersion(con);
@@ -6515,7 +6522,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6583,7 +6590,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6607,16 +6614,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT();
-goto done;
+/* coldly abort on initial connection failure */
+pg_log_fatal("cannot create connection for client %d",
+			 state[i].id);
+exit(1);
 			}
 		}
 
@@ -6749,7 +6750,7 @@ threadRun(void *arg)
 	continue;
 }
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 goto done;
 			}
 		}


Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Yugo NAGATA
Hello Horiguchi-san, Fabien,

On Fri, 18 Jun 2021 15:58:48 +0200 (CEST)
Fabien COELHO  wrote:

> 
> >>>/* must be something wrong */
> >>>pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> >>>goto done;
> >> 
> >> Should say such like "thread %d aborted: %s() failed: ...".
> 
> After having a lookg, there are already plenty such cases. I'd say not to 
> change anything for beta, and think of it for the next round.

Agreed. Basically, I think the existing message should be left as is.

> >> 
> >>   errno = THREAD_BARRIER_INIT(, nthreads);
> >>   if (errno != 0)
> >> +  {
> >> pg_log_fatal("could not initialize barrier: %m");
> >> +exit(1);
> >> 
> >> This is a run-time error. Maybe we should return 2 in that case.
> 
> I think that you are right, but there are plenty such places where exit 
> should be 2 instead of 1 if the doc is followed:
> 
> """Errors during the run such as database errors or problems in the script 
> will result in exit status 2."""
> 
> My beta take is to let these as they are, i.e. pretty inconsistent all 
> over pgbench, and schedule a cleanup on the next round.

As same as the below Fabian's comment about thread->logfile, 

> >> ===
> >> if (thread->logfile == NULL)
> >> {
> >>   pg_log_fatal("could not open logfile \"%s\": %m", logpath);
> >> -  goto done;
> >> +  exit(1);
> >> 
> >> Maybe we should exit with 2 this case.
> >
> > Yep.
> 
> The bench is not even started, this is not really run time yet, 1 seems 
> ok. The failure may be due to a typo in the path which comes from the 
> user.

the bench is not started at THREAD_BARRIER_INIT, so I think exit(1) is ok.  

> 
> >> /* must be something wrong */
> >> -pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
> >> +pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> >> goto done;
> >> 
> >> Why doesn't a fatal error cause an immediate exit?
> >
> > Good point. I do not know, but I would expect it to be the case, and AFAICR 
> > it does not.
> >
> >> (And if we change this to fatal, we also need to change similar errors in 
> >> the same function to fatal.)
> >
> > Possibly.
> 
> On second look, I think that error is fine, indeed we do not stop the 
> process, so "fatal" it is not;

I replaced this 'fatal' with 'error' because we are aborting the client
instead of exit(1). When pgbench was rewritten to use common logging API
by the commit 30a3e772b40, somehow pg_log_fatal was used, but I am
wondering it should have be pg_log_error.

> Attached Yugo-san patch with some updates discussed in the previous mails, 
> so as to move things along.

Thank you for update. I agree with this fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Fabien COELHO



   /* must be something wrong */
   pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
   goto done;


Should say such like "thread %d aborted: %s() failed: ...".


After having a lookg, there are already plenty such cases. I'd say not to 
change anything for beta, and think of it for the next round.




  errno = THREAD_BARRIER_INIT(, nthreads);
  if (errno != 0)
+  {
pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


I think that you are right, but there are plenty such places where exit 
should be 2 instead of 1 if the doc is followed:


"""Errors during the run such as database errors or problems in the script 
will result in exit status 2."""


My beta take is to let these as they are, i.e. pretty inconsistent all 
over pgbench, and schedule a cleanup on the next round.



===
if (thread->logfile == NULL)
{
  pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 this case.


Yep.


The bench is not even started, this is not really run time yet, 1 seems 
ok. The failure may be due to a typo in the path which comes from the 
user.


If we exit this case, we might also want to exit when fclose() fails. 
(Currently the error of fclose() is ignored.)


Not sure. I'd let it at that for now.


I stand by this.


+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that the "thread %d client %d(not clinent id but the
client index within the thread)" doesn't make sense to users.  Even if
we showed a message like that, it should show only the global clinent
id (cstate->id).


This is not obvious to me. I think that we should be homogeneous with what is 
already done around.


ok for only giving the global client id.


I think that we should return with 2 here but we return with 1
in another place for the same reason..


Possibly.


Again for this one, the bench has not really started, so 1 seems fine.


/* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
goto done;

Why doesn't a fatal error cause an immediate exit?


Good point. I do not know, but I would expect it to be the case, and AFAICR 
it does not.


(And if we change this to fatal, we also need to change similar errors in 
the same function to fatal.)


Possibly.


On second look, I think that error is fine, indeed we do not stop the 
process, so "fatal" it is not;


Attached Yugo-san patch with some updates discussed in the previous mails, 
so as to move things along.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..202507f5d5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort the process */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth 

Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Fabien COELHO



Hello,


Doing this means we regard any state other than CSTATE_FINISHED as
aborted. So, the current goto's to done in threadRun are effectively
aborting a part or the all clients running on the thread. So for
example the following place:

pgbench.c:6713

   /* must be something wrong */
   pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
   goto done;


Should say such like "thread %d aborted: %s() failed: ...".


Yep, possibly.


I'm not sure what is the consensus here about the case where aborted
client can recoonect to the same server.  This patch doesn't that.


Non trivial future work.

However, I think causing reconnection needs more work than accepted as 
a fix while beta.


It is an entire project which requires some thinking about.


+/* as the bench is already running, we do not abort */
pg_log_error("client %d aborted while establishing connection", 
st->id);
st->state = CSTATE_ABORTED;

The comment looks strange that it is saying "we don't abort" while
setting the state to CSTATE_ABORT then showing "client %d aborted".


Indeed. There is abort from the client, which just means that it stops 
sending transaction, and abort for the process, which is basically 
"exit(1)".




  if ((con = doConnect()) == NULL)
+  {
+pg_log_fatal("connection for initialization failed");
exit(1);

doConnect() prints an error emssage given from libpq.  So The
additional messaget is redundant.


This is not the same for me: doConnect may fail but we may decide to go 
retry the connection later, or just one client may be disconnected but 
others are going on, which is different from deciding to stop the whole 
program, which deserves a message on its own.




  errno = THREAD_BARRIER_INIT(, nthreads);
  if (errno != 0)
+  {
pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


Hmmm. Yep.


===
if (thread->logfile == NULL)
{
  pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 this case.


Yep.

If we exit this case, we might also want to exit when fclose() fails. 
(Currently the error of fclose() is ignored.)


Not sure. I'd let it at that for now.


===
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that the "thread %d client %d(not clinent id but the
client index within the thread)" doesn't make sense to users.  Even if
we showed a message like that, it should show only the global clinent
id (cstate->id).


This is not obvious to me. I think that we should be homogeneous with what 
is already done around.



I think that we should return with 2 here but we return with 1
in another place for the same reason..


Possibly.


/* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
goto done;

Why doesn't a fatal error cause an immediate exit?


Good point. I do not know, but I would expect it to be the case, and 
AFAICR it does not.


(And if we change this to fatal, we also need to change similar errors 
in the same function to fatal.)


Possibly.

I'll look into it over the week-end.

Thanks for the feedback!

--
Fabien.




Re: pgbench bug candidate: negative "initial connection time"

2021-06-18 Thread Kyotaro Horiguchi
At Thu, 17 Jun 2021 11:52:04 +0200 (CEST), Fabien COELHO  
wrote in 
> > Ok. I gave up to change the state in threadRun. Instead, I changed the
> > condition at the end of bench, which enables to report abortion due to
> > socket errors.
> >
> > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
> > + #endif/* 
> > ENABLE_THREAD_SAFETY */
> > +
> > +   for (int j = 0; j < thread->nstate; j++)
> > +-  if (thread->state[j].state == CSTATE_ABORTED)
> > ++  if (thread->state[j].state != CSTATE_FINISHED)
> > +   exit_code = 2;
> > +
> > +   /* aggregate thread level stats */
> >
> > Does this make sense?
> 
> Yes, definitely.

I sought for a simple way to enforce all client finishes with the
states abort or finished but I didn't find.  So +1 for the
change. However, as a matter of style. if we touch the code maybe we
want to enclose the if statement.

Doing this means we regard any state other than CSTATE_FINISHED as
aborted. So, the current goto's to done in threadRun are effectively
aborting a part or the all clients running on the thread. So for
example the following place:

pgbench.c:6713
>/* must be something wrong */
>pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
>goto done;

Should say such like "thread %d aborted: %s() failed: ...".


I'm not sure what is the consensus here about the case where aborted
client can recoonect to the same server.  This patch doesn't that.  However, I 
think causing reconnection needs more work than accepted as a fix while beta.




+/* as the bench is already running, we do not abort */
 pg_log_error("client %d aborted while establishing connection", 
st->id);
 st->state = CSTATE_ABORTED;

The comment looks strange that it is saying "we don't abort" while
setting the state to CSTATE_ABORT then showing "client %d aborted".



   if ((con = doConnect()) == NULL)
+  {
+pg_log_fatal("connection for initialization failed");
 exit(1);

doConnect() prints an error emssage given from libpq.  So The
additional messaget is redundant.



   errno = THREAD_BARRIER_INIT(, nthreads);
   if (errno != 0)
+  {
 pg_log_fatal("could not initialize barrier: %m");
+exit(1);

This is a run-time error. Maybe we should return 2 in that case.


===
 if (thread->logfile == NULL)
 {
   pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-  goto done;
+  exit(1);

Maybe we should exit with 2 this case.  If we exit this case, we might
also want to exit when fclose() fails.  (Currently the error of
fclose() is ignored.)



===
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+   thread->tid, i);
+exit(1);

It seems to me that the "thread %d client %d(not clinent id but the
client index within the thread)" doesn't make sense to users.  Even if
we showed a message like that, it should show only the global clinent
id (cstate->id).

I think that we should return with 2 here but we return with 1
in another place for the same reason..


===
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 goto done;

Why doesn't a fatal error cause an immediate exit?  (And if we change
this to fatal, we also need to change similar errors in the same
function to fatal.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pgbench bug candidate: negative "initial connection time"

2021-06-17 Thread Fabien COELHO




Second, currently the *only* function to change the client state is
advanceConnectionState, so it can be checked there and any bug is only
there. We had issues before when several functions where doing updates,
and it was a mess to understand what was going on. I really want that it
stays that way, so I disagree with setting the state to ABORTED from
threadRun. Moreover I do not see that it brings a feature, so ISTM that it
is not an actual issue not to do it?


Ok. I gave up to change the state in threadRun. Instead, I changed the
condition at the end of bench, which enables to report abortion due to
socket errors.

+@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
+ #endif/* 
ENABLE_THREAD_SAFETY */
+
+   for (int j = 0; j < thread->nstate; j++)
+-  if (thread->state[j].state == CSTATE_ABORTED)
++  if (thread->state[j].state != CSTATE_FINISHED)
+   exit_code = 2;
+
+   /* aggregate thread level stats */

Does this make sense?


Yes, definitely.

--
Fabien.




Re: pgbench bug candidate: negative "initial connection time"

2021-06-17 Thread Yugo NAGATA
Hello Fabien,

On Thu, 17 Jun 2021 10:37:05 +0200 (CEST)
Fabien COELHO  wrote:

> > ).  Is this acceptable for you?
> 
> I disagree on two counts:
> 
> First, thread[0] should not appear.
> 
> Second, currently the *only* function to change the client state is 
> advanceConnectionState, so it can be checked there and any bug is only 
> there. We had issues before when several functions where doing updates, 
> and it was a mess to understand what was going on. I really want that it 
> stays that way, so I disagree with setting the state to ABORTED from 
> threadRun. Moreover I do not see that it brings a feature, so ISTM that it 
> is not an actual issue not to do it?

Ok. I gave up to change the state in threadRun. Instead, I changed the
condition at the end of bench, which enables to report abortion due to
socket errors.

+@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
+ #endif/* 
ENABLE_THREAD_SAFETY */
+ 
+   for (int j = 0; j < thread->nstate; j++)
+-  if (thread->state[j].state == CSTATE_ABORTED)
++  if (thread->state[j].state != CSTATE_FINISHED)
+   exit_code = 2;
+ 
+   /* aggregate thread level stats */

Does this make sense?

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..901f25aab7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT();
-goto done;
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+			 thread->tid, i);
+exit(1);
 			}
 		}
 
@@ -6707,7 +6711,7 @@ threadRun(void *arg)
 	continue;
 }
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 goto done;
 			}
 		}


Re: pgbench bug candidate: negative "initial connection time"

2021-06-17 Thread Fabien COELHO



Hello Yugo-san,


By the way, the issue of inital connection erros reported in this thread
will be fixed by the patch attached in my previous post (a major part are
written by you :-)


That does not, on its own, ensure that it is bug free:-)


).  Is this acceptable for you?


I disagree on two counts:

First, thread[0] should not appear.

Second, currently the *only* function to change the client state is 
advanceConnectionState, so it can be checked there and any bug is only 
there. We had issues before when several functions where doing updates, 
and it was a mess to understand what was going on. I really want that it 
stays that way, so I disagree with setting the state to ABORTED from 
threadRun. Moreover I do not see that it brings a feature, so ISTM that it 
is not an actual issue not to do it?


--
Fabien.




Re: pgbench bug candidate: negative "initial connection time"

2021-06-16 Thread Yugo NAGATA
On Wed, 16 Jun 2021 20:25:31 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Hello Yugo-san,
> 
> >> When connection break while the bench has already started,
> >> maybe it makes more sense to proceed,
> >
> > The result would be incomplete also in this case. However, the reason why
> > it is worth to proceed is that  such information is still useful for users,
> > or we don't want to waste the bench that has already started?
> 
> Hmmm. It depends on what the user is testing. If one is interested in 
> client resilience under errors, the bench should probably attempt a 
> reconnect. If one is interested in best performance when all is well,
> then clearly something is amiss and there is no point to go on.

Agreed. After --max-tries options is implemented on pgbench, we would be
able to add a new feature to allow users to choose if we try to reconnect
or not. However, we don't have it yet for now,  so we should just abort
the client and report the abortion at the end of the bench when a connection
or socket error occurs during the bench, as same the existing behaviour.

By the way, the issue of inital connection erros reported in this thread
will be fixed by the patch attached in my previous post (a major part are
written by you :-) ).  Is this acceptable for you?


Regards,
Yugo Nagata
 
-- 
Yugo NAGATA 




Re: pgbench bug candidate: negative "initial connection time"

2021-06-16 Thread Fabien COELHO



Hello Yugo-san,


When connection break while the bench has already started,
maybe it makes more sense to proceed,


The result would be incomplete also in this case. However, the reason why
it is worth to proceed is that  such information is still useful for users,
or we don't want to waste the bench that has already started?


Hmmm. It depends on what the user is testing. If one is interested in 
client resilience under errors, the bench should probably attempt a 
reconnect. If one is interested in best performance when all is well,

then clearly something is amiss and there is no point to go on.

although I guess that maybe reattempting connections would make also 
sense in such case.


This might become possible after pgbench gets the feature to retry in deadlock
or serialization errors.


Yes, I agree that part of the needed infrastructure would be in place for 
that. As reconnecting is already in place under -c, so possibly it is just 
a matter of switching between states with some care.


I am working on rebase of the patch [2] and I will submit this in a few 
days.


Ok. Very good, I look forward to your submission! I'll be sure to look at 
it.


--
Fabien.




Re: pgbench bug candidate: negative "initial connection time"

2021-06-16 Thread Yugo NAGATA
Hello Fabien,

On Mon, 14 Jun 2021 11:30:14 +0200 (CEST)
Fabien COELHO  wrote:

> >>> Hmmm. Possibly. Another option could be not to report anything after some
> >>> errors. I'm not sure, because it would depend on the use case. I guess the
> >>> command returned an error status as well.
> >>
> >> I did not know any use cases and decisions , but I vote to report nothing 
> >> when error occurs.
> >
> > I would prefer to abort the thread whose connection got an error and report
> > results for other threads, as handled when doConnect fails in 
> > CSTATE_START_TX
> > state.
> 
> It is unclear to me whether it makes much sense to report performance when 
> things go wrong. At least when a one connection per client bench is run 
> ISTM that it should not proceed, because the bench could not even start 
> as prescribe. 

I agreed that when an initial connections fails we cannot start a bench
in the condition that the user wants and that we should stop early to let
the user know it and check the conf. 

I attached a patch, which is a fusion of my previous patch that changes the
state to CSTATE_ABORT when the socket get failure during the bench, and a
part of your patch attached in [1] that exits for initial failures.

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2106141011100.1338009%40pseudo

> When connection break while the bench has already started, 
> maybe it makes more sense to proceed, 

The result would be incomplete also in this case. However, the reason why
it is worth to proceed is that  such information is still useful for users,
or we don't want to waste the bench that has already started?

> although I guess that maybe 
> reattempting connections would make also sense in such case.

This might become possible after pgbench gets the feature to retry in deadlock
or serialization errors. I am working on rebase of the patch  [2] and I will
submit this in a few days.

[2] 
https://www.postgresql.org/message-id/20210524112910.444fbfdfbff747bd3b972...@sraoss.co.jp

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..2d9080c444 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT();
-goto done;
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+			 thread->tid, i);
+exit(1);
 			}
 		}
 
@@ -6647,6 +6651,7 @@ threadRun(void *arg)
 if (sock < 0)
 {
 	pg_log_error("invalid socket: %s", PQerrorMessage(st->con));
+	st->state = CSTATE_ABORTED;
 	goto done;
 }
 
@@ -6707,7 +6712,8 @@ threadRun(void *arg)
 	continue;
 }
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
+state[0].state = CSTATE_ABORTED;
 goto done;
 			}
 		}
@@ -6733,6 +6739,7 @@ threadRun(void *arg)
 if (sock < 0)
 {
 	pg_log_error("invalid socket: %s", PQerrorMessage(st->con));
+	st->state = CSTATE_ABORTED;
 	goto done;
 }
 


Re: pgbench bug candidate: negative "initial connection time"

2021-06-14 Thread Fabien COELHO




Hmmm. Possibly. Another option could be not to report anything after some
errors. I'm not sure, because it would depend on the use case. I guess the
command returned an error status as well.


I did not know any use cases and decisions , but I vote to report nothing when 
error occurs.


I would prefer to abort the thread whose connection got an error and report
results for other threads, as handled when doConnect fails in CSTATE_START_TX
state.


It is unclear to me whether it makes much sense to report performance when 
things go wrong. At least when a one connection per client bench is run 
ISTM that it should not proceed, because the bench could not even start 
as prescribe. When connection break while the bench has already started, 
maybe it makes more sense to proceed, although I guess that maybe 
reattempting connections would make also sense in such case.



In this case, we have to set the state to CSTATE_ABORT before going to 'done'
as fixed in the attached patch, in order to ensure that exit status is 2 and the
result reports "pgbench: fatal: Run was aborted; the above results are 
incomplete."


Hmmm. I agree that at least reporting that there was an issue is a good 
idea.



Otherwise, if we want pgbench to exit immediately when a connection error 
occurs,
we have tocall exit(1) to ensure the exit code is 1, of course. Anyway, it is 
wrong
that thecurrent pgbench exit successfully with exit code 0 when doConnnect 
fails.


Indeed, I can only agree on this one.

--
Fabien.




Re: pgbench bug candidate: negative "initial connection time"

2021-06-14 Thread Yugo NAGATA
On Mon, 14 Jun 2021 00:42:12 +
"kuroda.hay...@fujitsu.com"  wrote:

> Dear Fabien,
> 
> Thank you for replying!
> 
> > Hmmm. Possibly. Another option could be not to report anything after some 
> > errors. I'm not sure, because it would depend on the use case. I guess the 
> > command returned an error status as well.
> 
> I did not know any use cases and decisions , but I vote to report nothing 
> when error occurs.

I would prefer to abort the thread whose connection got an error and report
results for other threads, as handled when doConnect fails in CSTATE_START_TX
state. 

In this case, we have to set the state to CSTATE_ABORT before going to 'done'
as fixed in the attached patch, in order to ensure that exit status is 2 and the
result reports "pgbench: fatal: Run was aborted; the above results are 
incomplete." 

Otherwise, if we want pgbench to exit immediately when a connection error 
occurs, 
we have tocall exit(1) to ensure the exit code is 1, of course. Anyway, it is 
wrong
that thecurrent pgbench exit successfully with exit code 0 when doConnnect 
fails.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..35b67265c7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -6581,6 +6581,7 @@ threadRun(void *arg)
  * than coldly exiting with an error message.
  */
 THREAD_BARRIER_WAIT();
+state[i].state = CSTATE_ABORTED;
 goto done;
 			}
 		}
@@ -6647,6 +6648,7 @@ threadRun(void *arg)
 if (sock < 0)
 {
 	pg_log_error("invalid socket: %s", PQerrorMessage(st->con));
+	st->state = CSTATE_ABORTED;
 	goto done;
 }
 
@@ -6708,6 +6710,7 @@ threadRun(void *arg)
 }
 /* must be something wrong */
 pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+state[0].state = CSTATE_ABORTED;
 goto done;
 			}
 		}
@@ -6733,6 +6736,7 @@ threadRun(void *arg)
 if (sock < 0)
 {
 	pg_log_error("invalid socket: %s", PQerrorMessage(st->con));
+	st->state = CSTATE_ABORTED;
 	goto done;
 }
 


Re: pgbench bug candidate: negative "initial connection time"

2021-06-14 Thread Yugo NAGATA
Hello Kuroda-san,

On Fri, 11 Jun 2021 08:58:45 +
"kuroda.hay...@fujitsu.com"  wrote:

> Hi Hackers,
> 
> I played pgbench with wrong parameters, and I found bug-candidate.
> The latest commit in my source is 3a09d75.
> 
> 1. Do initdb and start.
> 2. Initialize schema and data with "scale factor" = 1.
> 3. execute following command many times:
> 
> $ pgbench -c 101 -j 10 postgres
> 
> Then, sometimes the negative " initial connection time" was returned.
> Lateyncy average is also strange.
> 
> ```
> $ pgbench -c 101 -j 10 postgres
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
> FATAL:  sorry, too many clients already
> pgbench (PostgreSQL) 14.0
> transaction type: 
> scaling factor: 1
> query mode: simple
> number of clients: 101
> number of threads: 10
> number of transactions per client: 10
> number of transactions actually processed: 910/1010
> latency average = 41387686.662 ms
> initial connection time = -372896921.586 ms
> tps = 0.002440 (without initial connection time)
> ```
> 
> I sought pgbench.c and found a reason.
> When a thread failed to get some connections, they do not fill any values to 
> thread->bench_start in threadRun().
> And if the failure is caused in the final thread (this means threads[nthreads 
> - 1]->bench_start is zero),
> the following if-statement sets bench_start to zero.
> 
> ```
>6494 /* first recorded benchmarking start time */
>6495 if (bench_start == 0 || thread->bench_start < 
> bench_start)
>6496 bench_start = thread->bench_start;
> ```
> 
> The wrong bench_start propagates to printResult() and then some invalid 
> values are appered.
> 
> ```
>6509 printResults(, pg_time_now() - bench_start, 
> conn_total_duration,
>6510  bench_start - start_time, 
> latency_late);
> ```
> 
> I cannot distinguish whether we have to fix it, but I attache the patch.
> This simply ignores a result when therad->bench_start is zero.


 +  /* skip if the thread faild to get connection */
+   if (thread->bench_start == 0)
+   continue;

It detects if a thread failed to get the initial connection by 
thread->bench_start == 0, but this assumes the initial value is zero. For 
ensuring this, I think it is better to initialize it in an early state, for 
example like this.

@@ -6419,6 +6419,7 @@ main(int argc, char **argv)
initRandomState(>ts_throttle_rs);
initRandomState(>ts_sample_rs);
thread->logfile = NULL; /* filled in later */
+   thread->bench_start = 0;
thread->latency_late = 0;
initStats(>stats, 0)

typo: faild -> failed

Regards,
Yugo Nagata
-- 
Yugo NAGATA 




RE: pgbench bug candidate: negative "initial connection time"

2021-06-13 Thread kuroda.hay...@fujitsu.com
Dear Fabien,

Thank you for replying!

> Hmmm. Possibly. Another option could be not to report anything after some 
> errors. I'm not sure, because it would depend on the use case. I guess the 
> command returned an error status as well.

I did not know any use cases and decisions , but I vote to report nothing when 
error occurs.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: pgbench bug candidate: negative "initial connection time"

2021-06-11 Thread Fabien COELHO




Hello Hayato-san,


I played pgbench with wrong parameters,


That's good:-)


and I found bug-candidate.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. execute following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes the negative " initial connection time" was returned.
Lateyncy average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
FATAL:  sorry, too many clients already


Hmmm.

AFAICR there was a decision to generate a report even if something went 
very wrong, in this case some client could not connect, so some values 
are not initialized, hence the absurd figures, as you show below.


Maybe we should revisit this decision.


initial connection time = -372896921.586 ms



I sought pgbench.c and found a reason.



When a thread failed to get some connections, they do not fill any values to 
thread->bench_start in threadRun().
And if the failure is caused in the final thread (this means threads[nthreads - 
1]->bench_start is zero),
the following if-statement sets bench_start to zero.



I cannot distinguish whether we have to fix it, but I attache the patch.
This simply ignores a result when therad->bench_start is zero.



How do you think?


Hmmm. Possibly. Another option could be not to report anything after some 
errors. I'm not sure, because it would depend on the use case. I guess the 
command returned an error status as well.


I'm going to give it some thoughts.

--
Fabien.




pgbench bug candidate: negative "initial connection time"

2021-06-11 Thread kuroda.hay...@fujitsu.com
Hi Hackers,

I played pgbench with wrong parameters, and I found bug-candidate.
The latest commit in my source is 3a09d75.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. execute following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes the negative " initial connection time" was returned.
Lateyncy average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
FATAL:  sorry, too many clients already
pgbench (PostgreSQL) 14.0
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 101
number of threads: 10
number of transactions per client: 10
number of transactions actually processed: 910/1010
latency average = 41387686.662 ms
initial connection time = -372896921.586 ms
tps = 0.002440 (without initial connection time)
```

I sought pgbench.c and found a reason.
When a thread failed to get some connections, they do not fill any values to 
thread->bench_start in threadRun().
And if the failure is caused in the final thread (this means threads[nthreads - 
1]->bench_start is zero),
the following if-statement sets bench_start to zero.

```
   6494 /* first recorded benchmarking start time */
   6495 if (bench_start == 0 || thread->bench_start < 
bench_start)
   6496 bench_start = thread->bench_start;
```

The wrong bench_start propagates to printResult() and then some invalid values 
are appered.

```
   6509 printResults(, pg_time_now() - bench_start, 
conn_total_duration,
   6510  bench_start - start_time, 
latency_late);
```

I cannot distinguish whether we have to fix it, but I attache the patch.
This simply ignores a result when therad->bench_start is zero.

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



ignore_failed_thread.patch
Description: ignore_failed_thread.patch