Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

Fabien COELHO Mon, 10 Jun 2013 03:41:30 -0700


Hello Greg,


Thanks for this very detailed review and the suggestions!

I'll submit a new patch

Question 1: should it report the maximum lag encountered?
I haven't found the lag measurement to be very useful yet, outside of debugging the feature itself. Accordingly I don't see a reason to add even more statistics about the number outside of testing the code. I'm seeing some weird lag problems that this will be useful for though right now, more on that a few places below.

I'll explain below why it is really interesting to get this figure, and that it is not really available as precisely elsewhere.

Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.


This is getting too far ahead.

Ok!

Let's get the throttle part nailed down before introducing even more moving parts into this. I've attached an updated patch that changes a few things around already. I'm not done with this yet and it needs some more review before commit, but it's not too far away from being ready.


Ok. I'll submit a new version by the end of the week.

This feature works quite well. On a system that will run at 25K TPS without any limit, I did a run with 25 clients and a rate of 400/second, aiming at 10,000 TPS, and that's what I got:
number of clients: 25
number of threads: 1
duration: 60 s
number of transactions actually processed: 599620
average transaction lag: 0.307 ms
tps = 9954.779317 (including connections establishing)
tps = 9964.947522 (excluding connections establishing)

I never thought of implementing the throttle like this before,


Stochastic processes are a little bit magic:-)

but it seems to work out well so far. Check out tps.png to see the smoothness of the TPS curve (the graphs came out of pgbench-tools). There's a little more play outside of the target than ideal for this case. Maybe it's worth tightening the Poisson curve a bit around its center?

The point of a Poisson distribution is to model random events of a somewhat irregular kind, such as web requests or clients queuing at a taxi stop. I cannot really change the formula, but if you want to argue with Siméon Denis Poisson, his current address is the 19th section of the "Père Lachaise" graveyard in Paris:-)

More seriously, the only parameter that can be changed is the "1000000.0" which drives the granularity of the Poisson process. A smaller value would mean a smaller potential multiplier, that is, a smaller bound on how far from the average time the schedule can go. This may come under "tightening", although it would depart from a "perfect" process and may possibly be a little less "smooth"... for a given definition of "tight", "perfect" and "smooth":-)
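
To make the granularity remark concrete, here is a minimal Python sketch of how such a delay could be drawn. It is not the patch's C code: the function names and the 2500 microsecond mean delay (400 transactions per second) are assumptions for illustration. It draws a uniform integer in [1, granularity], maps it into (0, 1], and takes -log() to get an exponentially distributed multiplier of the mean delay, so the largest possible multiplier is log(granularity).

```python
import math
import random

def poisson_delay_us(mean_delay_us, granularity=1_000_000, rng=random):
    """Draw one inter-transaction delay in microseconds.

    Hypothetical helper: a uniform integer in [1, granularity] is scaled
    into (0, 1], and -log() of it is an exponential multiplier whose mean
    is close to 1.
    """
    u = rng.randint(1, granularity) / granularity
    return mean_delay_us * -math.log(u)

def max_multiplier(granularity):
    """Largest multiple of the mean delay that can be drawn: -log(1/granularity)."""
    return math.log(granularity)

rng = random.Random(42)
sample = [poisson_delay_us(2500.0, rng=rng) for _ in range(5)]
print("some delays (us):", [round(d) for d in sample])
for g in (1_000, 1_000_000):
    print(g, "-> max multiplier", round(max_multiplier(g), 2))
```

With a granularity of 1000000 the schedule can stretch to about 13.8 times the mean delay; with 1000 the bound drops to about 6.9 times, which is the "tightening" trade-off described above.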

[...] What I did instead was think of this as a transaction rate target, which makes the help a whole lot simpler:
 -R SPEC, --rate SPEC
              target rate per client in transactions per second


Ok, I'm fine with this name.

Made the documentation easier to write too. I'm not quite done with that yet, the docs wording in this updated patch could still be better.

I'm not a native English speaker, any help is welcome here. I'll do my best.

I personally would like this better if --rate specified a *total* rate across all clients.

Ok, I can do that, with some reworking so that the stochastic process is shared by all threads instead of being within each client. This means that a lock between threads is needed to access some variables, which should not impact the test much. Another option is to have a per-thread stochastic process.

However, there are examples of both types of settings in the program already, so there's no one precedent for which is right here. -t is per-client and now -R is too; I'd prefer it to be like -T instead. It's not that important though, and the code is cleaner as it's written right now. Maybe this is better; I'm not sure.

I like the idea of just one process instead of a per-client one. I did not try at the beginning because the implementation is less straightforward.
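
As a rough illustration of the lock-free alternative, the Python sketch below relies on the fact that independent Poisson processes add up to a Poisson process: if each of N schedules runs at total_rate / N, the merged stream runs at total_rate. The function name and parameters are hypothetical and this is a simulation, not pgbench code.

```python
import random

def merged_rate(total_rate, nstreams, duration_s, seed=0):
    """Simulate nstreams independent Poisson schedules, each at
    total_rate / nstreams, and return the aggregate events per second."""
    rng = random.Random(seed)
    per_stream_rate = total_rate / nstreams
    events = 0
    for _ in range(nstreams):
        # exponential gaps between events define one Poisson schedule
        t = rng.expovariate(per_stream_rate)
        while t < duration_s:
            events += 1
            t += rng.expovariate(per_stream_rate)
    return events / duration_s

# 8 schedules sharing a total target of 1000 events per second for 100 s
print(merged_rate(1000.0, 8, 100.0))
```

The printed aggregate rate should come out close to the 1000 per second target, without the schedules sharing any state.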

On the topic of this weird latency spike issue, I did see that show up in some of the results too.


Your example illustrates *exactly* why the lag measure was added.

The Poisson processes generate an ideal event line (that is, irregularly scheduled transaction start times targeting the expected tps) which induces a varying load that the database is trying to handle.

If a transaction cannot start right away, this means that some transactions are deferred with respect to their scheduled start time. The lag measure reports exactly that: the clients do not handle the load. There may be some catch-up later, that is, the clients come back in line with the scheduled transactions.

I need to put this measure here because the "scheduled time" is only known to pgbench and not available elsewhere. The max would really be more interesting than the mean, so as to catch that some things were temporarily amiss, even if things went back to nominal later.

Here's one where I tried to specify a rate higher than the system can actually handle, 80000 TPS total on a SELECT-only test
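
The difference between the mean and the max can be sketched with a toy single-client model in Python. Everything here is an assumption for illustration (the function name, the 10 ms schedule, the 2 ms service time and the single 45 ms stall); it is not how pgbench itself is written. A transaction starts at its scheduled time if the client is free, otherwise as soon as the previous transaction ends, and its lag is the difference between the two.

```python
def transaction_lags(scheduled, service_times):
    """Return the lag (actual start minus scheduled start) of each transaction."""
    lags = []
    free_at = 0.0  # time at which the client finishes its current transaction
    for sched, service in zip(scheduled, service_times):
        start = max(sched, free_at)
        lags.append(start - sched)
        free_at = start + service
    return lags

# 100 transactions scheduled every 10 ms; each takes 2 ms except one 45 ms stall
scheduled = [10.0 * i for i in range(100)]
service = [2.0] * 100
service[3] = 45.0

lags = transaction_lags(scheduled, service)
print("mean lag (ms):", sum(lags) / len(lags))  # 0.95
print("max lag (ms):", max(lags))               # 35.0
```

The client catches up within a few transactions after the stall, so the mean stays under 1 ms while the max still records that one transaction started 35 ms late.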


$ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 761779
average transaction lag: 10298.380 ms

The interpretation is the following: as the database cannot handle the load, transactions were processed on average 10 seconds behind their scheduled transaction time. You had on average a 10 second latency to answer "incoming" requests. Also some transactions were implicitly not even scheduled, so the situation is worse than that...
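
As a back-of-the-envelope cross-check of that figure, here is a small Python sketch under a deliberately simple model, which is an assumption and not something pgbench computes: the n-th transaction is scheduled at n / target_tps but only starts at n / achieved_tps, so the lag grows linearly during the run and its mean is half of the final lag. The function name is hypothetical; the inputs are the 8 clients at 10000 tps each, the 30 s duration, and the roughly 25.4K tps reported for this run.

```python
def expected_mean_lag_s(target_tps, achieved_tps, duration_s):
    """Mean lag in seconds under sustained overload (linear-growth model).

    A transaction starting at time t lags by t * (1 - achieved/target);
    averaged over the whole run this is half of the lag reached at the end.
    """
    if achieved_tps >= target_tps:
        return 0.0  # the system keeps up, no systematic lag in this model
    return (duration_s / 2.0) * (1.0 - achieved_tps / target_tps)

target = 8 * 10000   # 8 clients, -R10000tps each
achieved = 25392.3   # tps reported for the run
print(round(expected_mean_lag_s(target, achieved, 30), 2))  # 10.24
```

The model ignores startup and the irregularity of the schedule, yet it lands within a few percent of the reported 10298 ms, which suggests the number reflects sustained overload rather than a measurement bug.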

tps = 25392.312544 (including connections establishing)
tps = 25397.294583 (excluding connections establishing)
It was actually limited by the capabilities of the hardware, 25K TPS. 10298 ms of lag per transaction can't be right though.

Some general patch submission suggestions for you as a new contributor:

Hmmm, I did a few things such as "pgxs" back in 2004, so maybe "not very active" is a better description than "new":-)

-When re-submitting something with improvements, it's a good idea to add a version number to the patch so reviewers can tell them apart easily. But there is no reason to change the subject line of the e-mail each time. I followed that standard here. If you updated this again I would name the file pgbench-throttle-v9.patch but keep the same e-mail subject.

Ok.

-There were some extra carriage return characters in your last submission. Wasn't a problem this time, but if you can get rid of those that makes for a better patch.


Ok.

--
Fabien.