Hello Greg,

Thanks for this very detailed review and the suggestions!

I'll submit a new patch

Question 1: should it report the maximum lag encountered?

I haven't found the lag measurement to be very useful yet, outside of debugging the feature itself. Accordingly, I don't see a reason to add even more statistics about that number outside of testing the code. That said, I'm seeing some weird lag problems right now that this measure will be useful for; more on that in a few places below.

I'll explain below why it is really interesting to get this figure, and why it is not available as precisely anywhere else.

Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.

This is getting too far ahead.


Let's get the throttle part nailed down before introducing even more moving parts into this. I've attached an updated patch that changes a few things around already. I'm not done with this yet and it needs some more review before commit, but it's not too far away from being ready.

Ok. I'll submit a new version by the end of the week.

This feature works quite well. On a system that will run at 25K TPS without any limit, I did a run with 25 clients and a rate of 400/second, aiming at 10,000 TPS, and that's what I got:

number of clients: 25
number of threads: 1
duration: 60 s
number of transactions actually processed: 599620
average transaction lag: 0.307 ms
tps = 9954.779317 (including connections establishing)
tps = 9964.947522 (excluding connections establishing)

I never thought of implementing the throttle like this before,

Stochastic processes are a little bit magic:-)

but it seems to work out well so far. Check out tps.png to see the smoothness of the TPS curve (the graphs came out of pgbench-tools). There's a little more play outside of the target than ideal for this case. Maybe it's worth tightening the Poisson curve a bit around its center?

The point of a Poisson distribution is to model random events that are somewhat irregular, such as web requests or clients queuing at a taxi stand. I cannot really change the formula, but if you want to argue with Siméon Denis Poisson, his current address is the 19th section of the "Père Lachaise" cemetery in Paris:-)

More seriously, the only parameter that can be changed is the "1000000.0", which drives the granularity of the Poisson process. A smaller value would mean a smaller potential multiplier, that is, a tighter bound on how far from the average time the schedule can stray. This may count as "tightening", although it would depart from a "perfect" process and possibly be a little less "smooth"... for a given definition of "tight", "perfect" and "smooth":-)

[...] What I did instead was think of this as a transaction rate target, which makes the help a whole lot simpler:

 -R SPEC, --rate SPEC
              target rate per client in transactions per second

Ok, I'm fine with this name.

Made the documentation easier to write too. I'm not quite done with that yet, the docs wording in this updated patch could still be better.

I'm not an English native speaker, any help is welcome here. I'll do my best.

I personally would like this better if --rate specified a *total* rate across all clients.

Ok, I can do that, with some reworking so that the stochastic process is shared by all threads instead of living within each client. This means that threads need a lock to access some shared variables, which should not impact the test much. Another option is to have a per-thread stochastic process.

However, there are examples of both types of settings in the program already, so there's no one precedent for which is right here. -t is per-client and now -R is too; I'd prefer it to be like -T instead. It's not that important though, and the code is cleaner as it's written right now. Maybe this is better; I'm not sure.

I like the idea of just one process instead of a per-client one. I did not try at the beginning because the implementation is less straightforward.
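The shared-process option could be sketched as follows (hypothetical names, not the patch's): all threads claim their next scheduled start time from a single cursor protected by a mutex, so the aggregate rate tracks the total target regardless of client count, and only the cheap schedule advance is serialized, not the transactions themselves:

```c
#include <pthread.h>

/* Sketch of a single shared stochastic schedule: each thread claims
 * the next scheduled transaction start time under a mutex, then
 * advances the cursor by a freshly drawn Poisson delay. */
typedef struct
{
    pthread_mutex_t lock;
    long long next_start_us;  /* next scheduled transaction start */
} SharedSchedule;

static long long claim_next_start(SharedSchedule *s, long long delay_us)
{
    long long mine;

    pthread_mutex_lock(&s->lock);
    mine = s->next_start_us;
    s->next_start_us += delay_us;  /* delay_us from the Poisson draw */
    pthread_mutex_unlock(&s->lock);
    return mine;
}
```

The per-thread alternative avoids the lock entirely but needs the target rate divided by the number of threads, which reintroduces the per-client flavor of the setting.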

On the topic of this weird latency spike issue, I did see that show up in some of the results too.

Your example illustrates *exactly* why the lag measure was added.

The Poisson process generates an ideal event line (that is, irregularly scheduled transaction start times targeting the expected tps) which induces a varying load that the database tries to handle.

If a transaction cannot start right away, it is deferred with respect to its scheduled start time. The measured lag reports exactly that: the clients are not keeping up with the load. There may be some catch-up later, that is, the clients come back in line with the scheduled transactions.

I need to compute this measure here because the "scheduled time" is only known to pgbench and not available elsewhere. The max would really be more interesting than the mean, so as to catch that something was temporarily amiss, even if things went back to nominal later.
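The lag accounting described above could be sketched like this (illustrative names, not the patch's): lag is how far behind its scheduled start a transaction actually began, and tracking the max alongside the sum catches transient stalls that the mean would smooth away:

```c
/* Sketch: per-transaction lag against the Poisson schedule. */
typedef struct
{
    long long total_lag_us;  /* sum of per-transaction lag */
    long long max_lag_us;    /* worst single lag seen */
    long long count;
} LagStats;

static void record_lag(LagStats *st, long long scheduled_us, long long actual_us)
{
    long long lag = actual_us - scheduled_us;

    if (lag < 0)
        lag = 0;             /* started on time: no lag */
    st->total_lag_us += lag;
    if (lag > st->max_lag_us)
        st->max_lag_us = lag;
    st->count++;
}
```

At the end of the run, total_lag_us / count gives the "average transaction lag" line shown in the output above, while max_lag_us would expose a temporary stall even if the average looks nominal.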

Here's one where I tried to specify a rate higher than the system can actually handle, 80000 TPS total, on a SELECT-only test:

$ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 761779
average transaction lag: 10298.380 ms

The interpretation is the following: as the database cannot handle the load, transactions were processed on average 10 seconds behind their scheduled start time. You had, on average, a 10 second latency to answer "incoming" requests. Also, some transactions were implicitly not even scheduled, so the situation is worse than that...

tps = 25392.312544 (including connections establishing)
tps = 25397.294583 (excluding connections establishing)

It was actually limited by the capabilities of the hardware, 25K TPS. 10298 ms of lag per transaction can't be right though.

Some general patch submission suggestions for you as a new contributor:

Hmmm, I did a few things such as "pgxs" back in 2004, so maybe "not very active" is a better description than "new":-)

-When re-submitting something with improvements, it's a good idea to add a version number to the patch so reviewers can tell them apart easily. But there is no reason to change the subject line of the e-mail each time. I followed that standard here. If you updated this again I would name the file pgbench-throttle-v9.patch but keep the same e-mail subject.


-There were some extra carriage return characters in your last submission. They weren't a problem this time, but if you can get rid of those it makes for a better patch.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)