Re: Avoid stuck of pbgench due to skipped transactions
On 2021/09/08 23:40, Fujii Masao wrote: Agreed. Since it's hard to imagine the issue happens in practice, we don't need to bother back-patch to the stable branches. So I'm thinking to commit the patch to 15dev and 14. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Avoid stuck of pbgench due to skipped transactions
On 2021/09/07 18:24, Fabien COELHO wrote: Hello Fujii-san, Stop counting skipped transactions under -T as soon as the timer is exceeded. Because otherwise it can take a very long time to count all of them especially when quite a lot of them happen with unrealistically high rate setting in -R, which would prevent pgbench from ending immediately. Because of this behavior, note that there is no guarantee that all skipped transactions are counted under -T though there is under -t. This is OK in practice because it's very unlikely to happen with realistic setting. Ok, I find this text quite clear. Thanks for the check! So attached is the updated version of the patch. One question is; which version do we want to back-patch to? If we consider it a "very minor bug fix" which is triggered by somehow unrealistic options, so I'd say 14 & dev, or possibly only dev. Agreed. Since it's hard to imagine the issue happens in practice, we don't need to bother back-patch to the stable branches. So I'm thinking to commit the patch to 15dev and 14. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 4c9952a85a..433abd954b 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3233,31 +3233,36 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) /* * If --latency-limit is used, and this slot is already late * so that the transaction will miss the latency limit even if -* it completed immediately, skip this time slot and schedule -* to continue running on the next slot that isn't late yet. -* But don't iterate beyond the -t limit, if one is given. +* it completed immediately, skip this time slot and loop to +* reschedule. */ if (latency_limit) { pg_time_now_lazy(); - while (thread->throttle_trigger < now - latency_limit && - (nxacts <= 0 || st->cnt < nxacts)) + if (thread->throttle_trigger < now - latency_limit) { processXactStats(thread, st, , true, agg); - /* next rendez-vous */ - thread->throttle_trigger += - getPoissonRand(>ts_throttle_rs, throttle_delay); - st->txn_scheduled = thread->throttle_trigger; - } - /* -* stop client if -t was exceeded in the previous skip -* loop -*/ - if (nxacts > 0 && st->cnt >= nxacts) - { - st->state = CSTATE_FINISHED; + /* +* Finish client if -T or -t was exceeded. +* +* Stop counting skipped transactions under -T as soon +* as the timer is exceeded. Because otherwise it can +* take a very long time to count all of them +* especially when quite a lot of them happen with +* unrealistically high rate setting in -R, which +* would prevent pgbench from ending immediately. +* Because of this behavior, note that there is no +* guarantee that all skipped transactions are counted +* under -T though there is under -t. This is OK in +* practice because it's very unlikely to happen with +* realistic setting. +*/ + if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts)) +
Re: Avoid stuck of pbgench due to skipped transactions
Hello Fujii-san, Stop counting skipped transactions under -T as soon as the timer is exceeded. Because otherwise it can take a very long time to count all of them especially when quite a lot of them happen with unrealistically high rate setting in -R, which would prevent pgbench from ending immediately. Because of this behavior, note that there is no guarantee that all skipped transactions are counted under -T though there is under -t. This is OK in practice because it's very unlikely to happen with realistic setting. Ok, I find this text quite clear. One question is; which version do we want to back-patch to? If we consider it a "very minor bug fix" which is triggered by somehow unrealistic options, so I'd say 14 & dev, or possibly only dev. -- Fabien.
Re: Avoid stuck of pbgench due to skipped transactions
On 2021/09/04 15:27, Fabien COELHO wrote: Hello Fujii-san, ISTM that the patch changes pgbench so that it can skip counting some skipped transactions here even for realistic rates under -T. Of course, which would happen very rarely. Is this understanding right? Yes. The point is to get out of the scheduling loop when time has expired, as soon it is known, instead of looping there for some possibly long time. Thanks for checking my understanding! +* For very unrealistic rates under -T, some skipped +* transactions are not counted because the catchup +* loop is not fast enough just to do the scheduling +* and counting at the expected speed. +* +* We do not bother with such a degenerate case. So this comment is a bit misleading? What about updating this as follows? -- Stop counting skipped transactions under -T as soon as the timer is exceeded. Because otherwise it can take a very long time to count all of them especially when quite a lot of them happen with unrealistically high rate setting in -R, which would prevent pgbench from ending immediately. Because of this behavior, note that there is no guarantee that all skipped transactions are counted under -T though there is under -t. This is OK in practice because it's very unlikely to happen with realistic setting. -- So that behavior change by the patch would be acceptable. Is this understanding right? I think so. +1 One question is; which version do we want to back-patch to? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Avoid stuck of pbgench due to skipped transactions
Hello Fujii-san, ISTM that the patch changes pgbench so that it can skip counting some skipped transactions here even for realistic rates under -T. Of course, which would happen very rarely. Is this understanding right? Yes. The point is to get out of the scheduling loop when time has expired, as soon it is known, instead of looping there for some possibly long time. On the other hand, even without the patch, in the first place, there seems no guarantee that all the skipped transactions are counted under -T. When the timer is exceeded in CSTATE_END_TX, a client ends without checking outstanding skipped transactions. Indeed. But that should be very few transactions under latency limit. Therefore the "issue" that some skipped transactions are not counted is not one the patch newly introdues. Yep. The patch counts less of them though, because of the early exit introduced in the patch in the scheduling state. Before it could be stuck in the "while (late) { count; schedule; }" loop. So that behavior change by the patch would be acceptable. Is this understanding right? I think so. -- Fabien.
Re: Avoid stuck of pbgench due to skipped transactions
On 2021/06/17 1:23, Yugo NAGATA wrote: I attached the v2 patch to clarify that I withdrew the v3 patch. Thanks for the patch! +* For very unrealistic rates under -T, some skipped +* transactions are not counted because the catchup +* loop is not fast enough just to do the scheduling +* and counting at the expected speed. +* +* We do not bother with such a degenerate case. +*/ ISTM that the patch changes pgbench so that it can skip counting some skipped transactions here even for realistic rates under -T. Of course, which would happen very rarely. Is this understanding right? On the other hand, even without the patch, in the first place, there seems no guarantee that all the skipped transactions are counted under -T. When the timer is exceeded in CSTATE_END_TX, a client ends without checking outstanding skipped transactions. Therefore the "issue" that some skipped transactions are not counted is not one the patch newly introdues. So that behavior change by the patch would be acceptable. Is this understanding right? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Avoid stuck of pbgench due to skipped transactions
On Tue, 10 Aug 2021 10:50:20 -0400 Greg Sabino Mullane wrote: > Apologies, just saw this. I found no problems, those "failures" were just > me missing checkboxes on the commitfest interface. +1 on the patch. Thank you! -- Yugo NAGATA
Re: Avoid stuck of pbgench due to skipped transactions
Apologies, just saw this. I found no problems, those "failures" were just me missing checkboxes on the commitfest interface. +1 on the patch. Cheers, Greg
Re: Avoid stuck of pbgench due to skipped transactions
Hello Greg, On Tue, 22 Jun 2021 19:22:38 + Greg Sabino Mullane wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, failed > Implements feature: tested, failed > Spec compliant: not tested > Documentation:not tested > > Looks fine to me, as a way of catching this edge case. Thank you for looking into this! 'make installcheck-world' and 'Implements feature' are marked "failed", but did you find any problem on this patch? -- Yugo NAGATA
Re: Avoid stuck of pbgench due to skipped transactions
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: not tested Documentation:not tested Looks fine to me, as a way of catching this edge case.
Re: Avoid stuck of pbgench due to skipped transactions
On Mon, 14 Jun 2021 16:06:10 +0900 Yugo NAGATA wrote: > On Mon, 14 Jun 2021 08:47:40 +0200 (CEST) > Fabien COELHO wrote: > > > I attached the fixed patch that uses return instead of break, and I > > > confirmed > > > that this made the progress reporting work property. > > > > I'm hesitating to do such a strictural change for a degenerate case linked > > to "insane" parameters, as pg is unlikely to reach 100 million tps, ever. > > It seems to me enough that the command is not blocked in such cases. > > Sure. The change from "break" to "return" is just for making the progress > reporting work in the loop, as you mentioned. However, my original intention > is avoiding stuck in a corner-case where a unrealistic parameter was used, and > I agree with you that this change is not so necessary for handling such a > special situation. I attached the v2 patch to clarify that I withdrew the v3 patch. Regards Yugo Nagata -- Yugo NAGATA diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index d7479925cb..fe75533e3e 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3223,31 +3223,30 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) /* * If --latency-limit is used, and this slot is already late * so that the transaction will miss the latency limit even if - * it completed immediately, skip this time slot and schedule - * to continue running on the next slot that isn't late yet. - * But don't iterate beyond the -t limit, if one is given. + * it completed immediately, skip this time slot and loop to + * reschedule. */ if (latency_limit) { pg_time_now_lazy(); - while (thread->throttle_trigger < now - latency_limit && - (nxacts <= 0 || st->cnt < nxacts)) + if (thread->throttle_trigger < now - latency_limit) { processXactStats(thread, st, , true, agg); - /* next rendez-vous */ - thread->throttle_trigger += - getPoissonRand(>ts_throttle_rs, throttle_delay); - st->txn_scheduled = thread->throttle_trigger; - } - /* - * stop client if -t was exceeded in the previous skip - * loop - */ - if (nxacts > 0 && st->cnt >= nxacts) - { - st->state = CSTATE_FINISHED; + /* stop client if -T/-t was exceeded. */ + if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts)) + /* + * For very unrealistic rates under -T, some skipped + * transactions are not counted because the catchup + * loop is not fast enough just to do the scheduling + * and counting at the expected speed. + * + * We do not bother with such a degenerate case. + */ + st->state = CSTATE_FINISHED; + + /* otherwise loop over PREPARE_THROTTLE */ break; } }
Re: Avoid stuck of pbgench due to skipped transactions
On Mon, 14 Jun 2021 08:47:40 +0200 (CEST) Fabien COELHO wrote: > > >>> I attached a patch for this fix. > >> > >> The patch mostly works for me, and I agree that the bench should not be in > >> a loop on any parameters, even when "crazy" parameters are given… > >> > >> However I'm not sure this is the right way to handle this issue. > >> > >> The catch-up loop can be dropped and the automaton can loop over itself to > >> reschedule. Doing that as the attached fixes this issue and also makes > >> progress reporting work proprely in more cases, and reduces the number of > >> lines of code. I did not add a test case because time sensitive tests have > >> been removed (which is too bad, IMHO). > > > > I agree with your way to fix. However, the progress reporting didn't work > > because we cannot return from advanceConnectionState to threadRun and just > > break the loop. > > > > + /* otherwise loop over > > PREPARE_THROTTLE */ > > break; > > > > I attached the fixed patch that uses return instead of break, and I > > confirmed > > that this made the progress reporting work property. > > I'm hesitating to do such a strictural change for a degenerate case linked > to "insane" parameters, as pg is unlikely to reach 100 million tps, ever. > It seems to me enough that the command is not blocked in such cases. Sure. The change from "break" to "return" is just for making the progress reporting work in the loop, as you mentioned. However, my original intention is avoiding stuck in a corner-case where a unrealistic parameter was used, and I agree with you that this change is not so necessary for handling such a special situation. Regards, Yugo Nagata -- Yugo NAGATA
Re: Avoid stuck of pbgench due to skipped transactions
I attached a patch for this fix. The patch mostly works for me, and I agree that the bench should not be in a loop on any parameters, even when "crazy" parameters are given… However I'm not sure this is the right way to handle this issue. The catch-up loop can be dropped and the automaton can loop over itself to reschedule. Doing that as the attached fixes this issue and also makes progress reporting work proprely in more cases, and reduces the number of lines of code. I did not add a test case because time sensitive tests have been removed (which is too bad, IMHO). I agree with your way to fix. However, the progress reporting didn't work because we cannot return from advanceConnectionState to threadRun and just break the loop. + /* otherwise loop over PREPARE_THROTTLE */ break; I attached the fixed patch that uses return instead of break, and I confirmed that this made the progress reporting work property. I'm hesitating to do such a strictural change for a degenerate case linked to "insane" parameters, as pg is unlikely to reach 100 million tps, ever. It seems to me enough that the command is not blocked in such cases. -- Fabien.
Re: Avoid stuck of pbgench due to skipped transactions
Hello Fabien, On Sun, 13 Jun 2021 08:56:59 +0200 (CEST) Fabien COELHO wrote: > > I attached a patch for this fix. > > The patch mostly works for me, and I agree that the bench should not be in > a loop on any parameters, even when "crazy" parameters are given… > > However I'm not sure this is the right way to handle this issue. > > The catch-up loop can be dropped and the automaton can loop over itself to > reschedule. Doing that as the attached fixes this issue and also makes > progress reporting work proprely in more cases, and reduces the number of > lines of code. I did not add a test case because time sensitive tests have > been removed (which is too bad, IMHO). I agree with your way to fix. However, the progress reporting didn't work because we cannot return from advanceConnectionState to threadRun and just break the loop. + /* otherwise loop over PREPARE_THROTTLE */ break; I attached the fixed patch that uses return instead of break, and I confirmed that this made the progress reporting work property. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index dc84b7b9b7..8f5d000938 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3223,32 +3223,31 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) /* * If --latency-limit is used, and this slot is already late * so that the transaction will miss the latency limit even if - * it completed immediately, skip this time slot and schedule - * to continue running on the next slot that isn't late yet. - * But don't iterate beyond the -t limit, if one is given. + * it completed immediately, skip this time slot and loop to + * reschedule. */ if (latency_limit) { pg_time_now_lazy(); - while (thread->throttle_trigger < now - latency_limit && - (nxacts <= 0 || st->cnt < nxacts)) + if (thread->throttle_trigger < now - latency_limit) { processXactStats(thread, st, , true, agg); - /* next rendez-vous */ - thread->throttle_trigger += - getPoissonRand(>ts_throttle_rs, throttle_delay); - st->txn_scheduled = thread->throttle_trigger; - } - /* - * stop client if -t was exceeded in the previous skip - * loop - */ - if (nxacts > 0 && st->cnt >= nxacts) - { - st->state = CSTATE_FINISHED; - break; + /* stop client if -T/-t was exceeded. */ + if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts)) + /* + * For very unrealistic rates under -T, some skipped + * transactions are not counted because the catchup + * loop is not fast enough just to do the scheduling + * and counting at the expected speed. + * + * We do not bother with such a degenerate case. + */ + st->state = CSTATE_FINISHED; + + /*otherwise loop over PREPARE_THROTTLE */ + return; } }
Re: Avoid stuck of pbgench due to skipped transactions
Hello Yugo-san, For example, when I usee a large rate (-R) for throttling and a small latency limit (-L) values with a duration (-T), pbbench got stuck. $ pgbench -T 5 -R 1 -L 1; Indeed, it does not get out of the catchup loop for a long time because even scheduling takes more time than the expected transaction time! I think it is better to check the timer expiration even in the loop of transaction skips and to finish pgbnech successfully because we should correcly repport how many transactions are proccessed and skipped also in this case, and getting stuck would not be good anyway. I attached a patch for this fix. The patch mostly works for me, and I agree that the bench should not be in a loop on any parameters, even when "crazy" parameters are given… However I'm not sure this is the right way to handle this issue. The catch-up loop can be dropped and the automaton can loop over itself to reschedule. Doing that as the attached fixes this issue and also makes progress reporting work proprely in more cases, and reduces the number of lines of code. I did not add a test case because time sensitive tests have been removed (which is too bad, IMHO). -- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index d7479925cb..fe75533e3e 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3223,31 +3223,30 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) /* * If --latency-limit is used, and this slot is already late * so that the transaction will miss the latency limit even if - * it completed immediately, skip this time slot and schedule - * to continue running on the next slot that isn't late yet. - * But don't iterate beyond the -t limit, if one is given. + * it completed immediately, skip this time slot and loop to + * reschedule. */ if (latency_limit) { pg_time_now_lazy(); - while (thread->throttle_trigger < now - latency_limit && - (nxacts <= 0 || st->cnt < nxacts)) + if (thread->throttle_trigger < now - latency_limit) { processXactStats(thread, st, , true, agg); - /* next rendez-vous */ - thread->throttle_trigger += - getPoissonRand(>ts_throttle_rs, throttle_delay); - st->txn_scheduled = thread->throttle_trigger; - } - /* - * stop client if -t was exceeded in the previous skip - * loop - */ - if (nxacts > 0 && st->cnt >= nxacts) - { - st->state = CSTATE_FINISHED; + /* stop client if -T/-t was exceeded. */ + if (timer_exceeded || (nxacts > 0 && st->cnt >= nxacts)) + /* + * For very unrealistic rates under -T, some skipped + * transactions are not counted because the catchup + * loop is not fast enough just to do the scheduling + * and counting at the expected speed. + * + * We do not bother with such a degenerate case. + */ + st->state = CSTATE_FINISHED; + + /* otherwise loop over PREPARE_THROTTLE */ break; } }
Avoid stuck of pbgench due to skipped transactions
Hi, I found that pgbench could get stuck when every transaction come to be skipped and the number of transaction is not limitted by -t option. For example, when I usee a large rate (-R) for throttling and a small latency limit (-L) values with a duration (-T), pbbench got stuck. $ pgbench -T 5 -R 1 -L 1; When we specify the number of transactions by -t, it doesn't get stuck because the number of skipped transactions are counted and checked during the loop. However, the timer expiration is not checked in the loop although it is checked before and after a sleep for throttling. I think it is better to check the timer expiration even in the loop of transaction skips and to finish pgbnech successfully because we should correcly repport how many transactions are proccessed and skipped also in this case, and getting stuck would not be good anyway. I attached a patch for this fix. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index dc84b7b9b7..1aa3e6b7be 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -3232,7 +3232,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) pg_time_now_lazy(); while (thread->throttle_trigger < now - latency_limit && - (nxacts <= 0 || st->cnt < nxacts)) + (nxacts <= 0 || st->cnt < nxacts) && + !timer_exceeded) { processXactStats(thread, st, , true, agg); /* next rendez-vous */