Re: Avoid stuck of pbgench due to skipped transactions

2021-09-09 Thread Fujii Masao




On 2021/09/08 23:40, Fujii Masao wrote:

Agreed. Since it's hard to imagine the issue happens in practice,
we don't need to bother back-patch to the stable branches.
So I'm thinking to commit the patch to 15dev and 14.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Avoid stuck of pbgench due to skipped transactions

2021-09-08 Thread Fujii Masao



On 2021/09/07 18:24, Fabien COELHO wrote:


Hello Fujii-san,


Stop counting skipped transactions under -T as soon as the timer is exceeded. 
Because otherwise it can take a very long time to count all of them especially 
when quite a lot of them happen with unrealistically high rate setting in -R, 
which would prevent pgbench from ending immediately. Because of this behavior, 
note that there is no guarantee that all skipped transactions are counted under 
-T though there is under -t. This is OK in practice because it's very unlikely 
to happen with realistic setting.


Ok, I find this text quite clear.


Thanks for the check! So attached is the updated version of the patch.



One question is; which version do we want to back-patch to?


If we consider it a "very minor bug fix" which is triggered by somehow unrealistic 
options, so I'd say 14 & dev, or possibly only dev.


Agreed. Since it's hard to imagine the issue happens in practice,
we don't need to bother back-patch to the stable branches.
So I'm thinking to commit the patch to 15dev and 14.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4c9952a85a..433abd954b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3233,31 +3233,36 @@ advanceConnectionState(TState *thread, CState *st, 
StatsData *agg)
/*
 * If --latency-limit is used, and this slot is 
already late
 * so that the transaction will miss the 
latency limit even if
-* it completed immediately, skip this time 
slot and schedule
-* to continue running on the next slot that 
isn't late yet.
-* But don't iterate beyond the -t limit, if 
one is given.
+* it completed immediately, skip this time 
slot and loop to
+* reschedule.
 */
if (latency_limit)
{
pg_time_now_lazy();
 
-   while (thread->throttle_trigger < now - 
latency_limit &&
-  (nxacts <= 0 || st->cnt < 
nxacts))
+   if (thread->throttle_trigger < now - 
latency_limit)
{
processXactStats(thread, st, 
, true, agg);
-   /* next rendez-vous */
-   thread->throttle_trigger +=
-   
getPoissonRand(>ts_throttle_rs, throttle_delay);
-   st->txn_scheduled = 
thread->throttle_trigger;
-   }
 
-   /*
-* stop client if -t was exceeded in 
the previous skip
-* loop
-*/
-   if (nxacts > 0 && st->cnt >= nxacts)
-   {
-   st->state = CSTATE_FINISHED;
+   /*
+* Finish client if -T or -t 
was exceeded.
+*
+* Stop counting skipped 
transactions under -T as soon
+* as the timer is exceeded. 
Because otherwise it can
+* take a very long time to 
count all of them
+* especially when quite a lot 
of them happen with
+* unrealistically high rate 
setting in -R, which
+* would prevent pgbench from 
ending immediately.
+* Because of this behavior, 
note that there is no
+* guarantee that all skipped 
transactions are counted
+* under -T though there is 
under -t. This is OK in
+* practice because it's very 
unlikely to happen with
+* realistic setting.
+*/
+   if (timer_exceeded || (nxacts > 
0 && st->cnt >= nxacts))
+   

Re: Avoid stuck of pbgench due to skipped transactions

2021-09-07 Thread Fabien COELHO



Hello Fujii-san,

Stop counting skipped transactions under -T as soon as the timer is 
exceeded. Because otherwise it can take a very long time to count all of 
them especially when quite a lot of them happen with unrealistically 
high rate setting in -R, which would prevent pgbench from ending 
immediately. Because of this behavior, note that there is no guarantee 
that all skipped transactions are counted under -T though there is under 
-t. This is OK in practice because it's very unlikely to happen with 
realistic setting.


Ok, I find this text quite clear.


One question is; which version do we want to back-patch to?


If we consider it a "very minor bug fix" which is triggered by somehow 
unrealistic options, so I'd say 14 & dev, or possibly only dev.


--
Fabien.




Re: Avoid stuck of pbgench due to skipped transactions

2021-09-06 Thread Fujii Masao




On 2021/09/04 15:27, Fabien COELHO wrote:


Hello Fujii-san,


ISTM that the patch changes pgbench so that it can skip counting
some skipped transactions here even for realistic rates under -T.
Of course, which would happen very rarely. Is this understanding right?


Yes. The point is to get out of the scheduling loop when time has expired, as 
soon it is known, instead of looping there for some possibly long time.


Thanks for checking my understanding!

+* For very unrealistic 
rates under -T, some skipped
+* transactions are not 
counted because the catchup
+* loop is not fast 
enough just to do the scheduling
+* and counting at the 
expected speed.
+*
+* We do not bother 
with such a degenerate case.

So this comment is a bit misleading? What about updating this as follows?

--
Stop counting skipped transactions under -T as soon as the timer is exceeded.
Because otherwise it can take a very long time to count all of them especially
when quite a lot of them happen with unrealistically high rate setting in -R,
which would prevent pgbench from ending immediately. Because of this behavior,
note that there is no guarantee that all skipped transactions are counted
under -T though there is under -t. This is OK in practice because it's very
unlikely to happen with realistic setting.
--



So that behavior change by the patch would be acceptable. Is this understanding 
right?


I think so.


+1

One question is; which version do we want to back-patch to?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Avoid stuck of pbgench due to skipped transactions

2021-09-04 Thread Fabien COELHO



Hello Fujii-san,


ISTM that the patch changes pgbench so that it can skip counting
some skipped transactions here even for realistic rates under -T.
Of course, which would happen very rarely. Is this understanding right?


Yes. The point is to get out of the scheduling loop when time has expired, 
as soon it is known, instead of looping there for some possibly long time.



On the other hand, even without the patch, in the first place, there seems
no guarantee that all the skipped transactions are counted under -T.
When the timer is exceeded in CSTATE_END_TX, a client ends without
checking outstanding skipped transactions.


Indeed. But that should be very few transactions under latency limit.

Therefore the "issue" that some skipped transactions are not counted is 
not one the patch newly introdues.


Yep. The patch counts less of them though, because of the early exit 
introduced in the patch in the scheduling state. Before it could be stuck 
in the "while (late) { count; schedule; }" loop.


So that behavior change by the patch would be acceptable. Is this 
understanding right?


I think so.

--
Fabien.




Re: Avoid stuck of pbgench due to skipped transactions

2021-09-03 Thread Fujii Masao




On 2021/06/17 1:23, Yugo NAGATA wrote:

I attached the v2 patch to clarify that I withdrew the v3 patch.


Thanks for the patch!

+* For very unrealistic 
rates under -T, some skipped
+* transactions are not 
counted because the catchup
+* loop is not fast 
enough just to do the scheduling
+* and counting at the 
expected speed.
+*
+* We do not bother 
with such a degenerate case.
+*/

ISTM that the patch changes pgbench so that it can skip counting
some skipped transactions here even for realistic rates under -T.
Of course, which would happen very rarely. Is this understanding right?

On the other hand, even without the patch, in the first place, there seems
no guarantee that all the skipped transactions are counted under -T.
When the timer is exceeded in CSTATE_END_TX, a client ends without
checking outstanding skipped transactions. Therefore the "issue" that
some skipped transactions are not counted is not one the patch newly introdues.
So that behavior change by the patch would be acceptable.
Is this understanding right?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Avoid stuck of pbgench due to skipped transactions

2021-08-12 Thread Yugo NAGATA
On Tue, 10 Aug 2021 10:50:20 -0400
Greg Sabino Mullane  wrote:

> Apologies, just saw this. I found no problems, those "failures" were just
> me missing checkboxes on the commitfest interface. +1 on the patch.

Thank you!


-- 
Yugo NAGATA 




Re: Avoid stuck of pbgench due to skipped transactions

2021-08-10 Thread Greg Sabino Mullane
Apologies, just saw this. I found no problems, those "failures" were just
me missing checkboxes on the commitfest interface. +1 on the patch.

Cheers,
Greg


Re: Avoid stuck of pbgench due to skipped transactions

2021-06-22 Thread Yugo NAGATA
Hello Greg,

On Tue, 22 Jun 2021 19:22:38 +
Greg Sabino Mullane  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   tested, failed
> Spec compliant:   not tested
> Documentation:not tested
> 
> Looks fine to me, as a way of catching this edge case.

Thank you for looking into this!

'make installcheck-world' and 'Implements feature' are marked "failed",
but did you find any problem on this patch?

-- 
Yugo NAGATA 




Re: Avoid stuck of pbgench due to skipped transactions

2021-06-22 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

Looks fine to me, as a way of catching this edge case.

Re: Avoid stuck of pbgench due to skipped transactions

2021-06-16 Thread Yugo NAGATA
On Mon, 14 Jun 2021 16:06:10 +0900
Yugo NAGATA  wrote:

> On Mon, 14 Jun 2021 08:47:40 +0200 (CEST)
> Fabien COELHO  wrote:

> > > I attached the fixed patch that uses return instead of break, and I 
> > > confirmed
> > > that this made the progress reporting work property.
> > 
> > I'm hesitating to do such a strictural change for a degenerate case linked 
> > to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.
> > It seems to me enough that the command is not blocked in such cases.
> 
> Sure. The change from "break" to "return" is just for making the progress
> reporting work in the loop, as you mentioned. However,  my original intention
> is avoiding stuck in a corner-case where a unrealistic parameter was used, and
> I agree with you that this change is not so necessary for handling such a
> special situation. 

I attached the v2 patch to clarify that I withdrew the v3 patch.

Regards
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..fe75533e3e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3223,31 +3223,30 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 /*
  * If --latency-limit is used, and this slot is already late
  * so that the transaction will miss the latency limit even if
- * it completed immediately, skip this time slot and schedule
- * to continue running on the next slot that isn't late yet.
- * But don't iterate beyond the -t limit, if one is given.
+ * it completed immediately, skip this time slot and loop to
+ * reschedule.
  */
 if (latency_limit)
 {
 	pg_time_now_lazy();
 
-	while (thread->throttle_trigger < now - latency_limit &&
-		   (nxacts <= 0 || st->cnt < nxacts))
+	if (thread->throttle_trigger < now - latency_limit)
 	{
 		processXactStats(thread, st, , true, agg);
-		/* next rendez-vous */
-		thread->throttle_trigger +=
-			getPoissonRand(>ts_throttle_rs, throttle_delay);
-		st->txn_scheduled = thread->throttle_trigger;
-	}
 
-	/*
-	 * stop client if -t was exceeded in the previous skip
-	 * loop
-	 */
-	if (nxacts > 0 && st->cnt >= nxacts)
-	{
-		st->state = CSTATE_FINISHED;
+		/* stop client if -T/-t was exceeded. */
+		if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts))
+			/*
+			 * For very unrealistic rates under -T, some skipped
+			 * transactions are not counted because the catchup
+			 * loop is not fast enough just to do the scheduling
+			 * and counting at the expected speed.
+			 *
+			 * We do not bother with such a degenerate case.
+			 */
+			st->state = CSTATE_FINISHED;
+
+		/* otherwise loop over PREPARE_THROTTLE */
 		break;
 	}
 }


Re: Avoid stuck of pbgench due to skipped transactions

2021-06-14 Thread Yugo NAGATA
On Mon, 14 Jun 2021 08:47:40 +0200 (CEST)
Fabien COELHO  wrote:

> 
> >>> I attached a patch for this fix.
> >>
> >> The patch mostly works for me, and I agree that the bench should not be in
> >> a loop on any parameters, even when "crazy" parameters are given…
> >>
> >> However I'm not sure this is the right way to handle this issue.
> >>
> >> The catch-up loop can be dropped and the automaton can loop over itself to
> >> reschedule. Doing that as the attached fixes this issue and also makes
> >> progress reporting work proprely in more cases, and reduces the number of
> >> lines of code. I did not add a test case because time sensitive tests have
> >> been removed (which is too bad, IMHO).
> >
> > I agree with your way to fix. However, the progress reporting didn't work
> > because we cannot return from advanceConnectionState to threadRun and just
> > break the loop.
> >
> > +   /* otherwise loop over 
> > PREPARE_THROTTLE */
> > break;
> >
> > I attached the fixed patch that uses return instead of break, and I 
> > confirmed
> > that this made the progress reporting work property.
> 
> I'm hesitating to do such a strictural change for a degenerate case linked 
> to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.
> It seems to me enough that the command is not blocked in such cases.

Sure. The change from "break" to "return" is just for making the progress
reporting work in the loop, as you mentioned. However,  my original intention
is avoiding stuck in a corner-case where a unrealistic parameter was used, and
I agree with you that this change is not so necessary for handling such a
special situation. 

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Avoid stuck of pbgench due to skipped transactions

2021-06-14 Thread Fabien COELHO



I attached a patch for this fix.


The patch mostly works for me, and I agree that the bench should not be in
a loop on any parameters, even when "crazy" parameters are given…

However I'm not sure this is the right way to handle this issue.

The catch-up loop can be dropped and the automaton can loop over itself to
reschedule. Doing that as the attached fixes this issue and also makes
progress reporting work proprely in more cases, and reduces the number of
lines of code. I did not add a test case because time sensitive tests have
been removed (which is too bad, IMHO).


I agree with your way to fix. However, the progress reporting didn't work
because we cannot return from advanceConnectionState to threadRun and just
break the loop.

+   /* otherwise loop over 
PREPARE_THROTTLE */
break;

I attached the fixed patch that uses return instead of break, and I confirmed
that this made the progress reporting work property.


I'm hesitating to do such a strictural change for a degenerate case linked 
to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.

It seems to me enough that the command is not blocked in such cases.

--
Fabien.

Re: Avoid stuck of pbgench due to skipped transactions

2021-06-13 Thread Yugo NAGATA
Hello Fabien,

On Sun, 13 Jun 2021 08:56:59 +0200 (CEST)
Fabien COELHO  wrote:

> > I attached a patch for this fix.
> 
> The patch mostly works for me, and I agree that the bench should not be in 
> a loop on any parameters, even when "crazy" parameters are given…
> 
> However I'm not sure this is the right way to handle this issue.
> 
> The catch-up loop can be dropped and the automaton can loop over itself to 
> reschedule. Doing that as the attached fixes this issue and also makes 
> progress reporting work proprely in more cases, and reduces the number of 
> lines of code. I did not add a test case because time sensitive tests have 
> been removed (which is too bad, IMHO).

I agree with your way to fix. However, the progress reporting didn't work
because we cannot return from advanceConnectionState to threadRun and just
break the loop.

+   /* otherwise loop over 
PREPARE_THROTTLE */
break;

I attached the fixed patch that uses return instead of break, and I confirmed
that this made the progress reporting work property.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index dc84b7b9b7..8f5d000938 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3223,32 +3223,31 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 /*
  * If --latency-limit is used, and this slot is already late
  * so that the transaction will miss the latency limit even if
- * it completed immediately, skip this time slot and schedule
- * to continue running on the next slot that isn't late yet.
- * But don't iterate beyond the -t limit, if one is given.
+ * it completed immediately, skip this time slot and loop to
+ * reschedule.
  */
 if (latency_limit)
 {
 	pg_time_now_lazy();
 
-	while (thread->throttle_trigger < now - latency_limit &&
-		   (nxacts <= 0 || st->cnt < nxacts))
+	if (thread->throttle_trigger < now - latency_limit)
 	{
 		processXactStats(thread, st, , true, agg);
-		/* next rendez-vous */
-		thread->throttle_trigger +=
-			getPoissonRand(>ts_throttle_rs, throttle_delay);
-		st->txn_scheduled = thread->throttle_trigger;
-	}
 
-	/*
-	 * stop client if -t was exceeded in the previous skip
-	 * loop
-	 */
-	if (nxacts > 0 && st->cnt >= nxacts)
-	{
-		st->state = CSTATE_FINISHED;
-		break;
+		/* stop client if -T/-t was exceeded. */
+		if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts))
+			/*
+			 * For very unrealistic rates under -T, some skipped
+			 * transactions are not counted because the catchup
+			 * loop is not fast enough just to do the scheduling
+			 * and counting at the expected speed.
+			 *
+			 * We do not bother with such a degenerate case.
+			 */
+			st->state = CSTATE_FINISHED;
+
+		/*otherwise loop over PREPARE_THROTTLE */
+		return;
 	}
 }
 


Re: Avoid stuck of pbgench due to skipped transactions

2021-06-13 Thread Fabien COELHO


Hello Yugo-san,


For example, when I usee a large rate (-R) for throttling and a
small latency limit (-L) values with a duration (-T), pbbench
got stuck.

$ pgbench -T 5 -R 1 -L 1;


Indeed, it does not get out of the catchup loop for a long time because 
even scheduling takes more time than the expected transaction time!



I think it is better to check the timer expiration even in the loop
of transaction skips and to finish pgbnech successfully because we
should correcly repport how many transactions are proccessed and
skipped also in this case, and getting stuck would not be good
anyway.

I attached a patch for this fix.


The patch mostly works for me, and I agree that the bench should not be in 
a loop on any parameters, even when "crazy" parameters are given…


However I'm not sure this is the right way to handle this issue.

The catch-up loop can be dropped and the automaton can loop over itself to 
reschedule. Doing that as the attached fixes this issue and also makes 
progress reporting work proprely in more cases, and reduces the number of 
lines of code. I did not add a test case because time sensitive tests have 
been removed (which is too bad, IMHO).


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..fe75533e3e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3223,31 +3223,30 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 /*
  * If --latency-limit is used, and this slot is already late
  * so that the transaction will miss the latency limit even if
- * it completed immediately, skip this time slot and schedule
- * to continue running on the next slot that isn't late yet.
- * But don't iterate beyond the -t limit, if one is given.
+ * it completed immediately, skip this time slot and loop to
+ * reschedule.
  */
 if (latency_limit)
 {
 	pg_time_now_lazy();
 
-	while (thread->throttle_trigger < now - latency_limit &&
-		   (nxacts <= 0 || st->cnt < nxacts))
+	if (thread->throttle_trigger < now - latency_limit)
 	{
 		processXactStats(thread, st, , true, agg);
-		/* next rendez-vous */
-		thread->throttle_trigger +=
-			getPoissonRand(>ts_throttle_rs, throttle_delay);
-		st->txn_scheduled = thread->throttle_trigger;
-	}
 
-	/*
-	 * stop client if -t was exceeded in the previous skip
-	 * loop
-	 */
-	if (nxacts > 0 && st->cnt >= nxacts)
-	{
-		st->state = CSTATE_FINISHED;
+		/* stop client if -T/-t was exceeded. */
+		if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts))
+			/*
+			 * For very unrealistic rates under -T, some skipped
+			 * transactions are not counted because the catchup
+			 * loop is not fast enough just to do the scheduling
+			 * and counting at the expected speed.
+			 *
+			 * We do not bother with such a degenerate case.
+			 */
+			st->state = CSTATE_FINISHED;
+
+		/* otherwise loop over PREPARE_THROTTLE */
 		break;
 	}
 }


Avoid stuck of pbgench due to skipped transactions

2021-06-12 Thread Yugo NAGATA
Hi,

I found that pgbench could get stuck when every transaction
come to be skipped and the number of transaction is not limitted
by -t option.

For example, when I usee a large rate (-R) for throttling and a
small latency limit (-L) values with a duration (-T), pbbench
got stuck.

 $ pgbench -T 5 -R 1 -L 1;

When we specify the number of transactions by -t, it doesn't get
stuck because the number of skipped transactions are counted and
checked during the loop. However, the timer expiration is not
checked in the loop although it is checked before and after a
sleep for throttling. 

I think it is better to check the timer expiration even in the loop
of transaction skips and to finish pgbnech successfully because we
should correcly repport how many transactions are proccessed and
skipped also in this case, and getting stuck would not be good
anyway.

I attached a patch for this fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index dc84b7b9b7..1aa3e6b7be 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3232,7 +3232,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	pg_time_now_lazy();
 
 	while (thread->throttle_trigger < now - latency_limit &&
-		   (nxacts <= 0 || st->cnt < nxacts))
+		   (nxacts <= 0 || st->cnt < nxacts) &&
+		   !timer_exceeded)
 	{
 		processXactStats(thread, st, , true, agg);
 		/* next rendez-vous */