I don't have this resolved yet, but I think I've identified the cause.
Updating here mainly so Fabien doesn't duplicate my work trying to track
this down. I'm going to keep banging at this until it's resolved, now
that I've gotten this far.
Here's a slow transaction:
1371226017.568515 client 1 executing \set naccounts 100000 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts WHERE aid = 268721;
1371226017.789816 client 1 receiving
That confirms it is getting stuck at the "throttling" step: the client
asked to pause for 6191 us (about 6 ms), but its next step wasn't logged
until roughly 179 ms later. It looks like the code stalls there because
the throttle overloads the "sleeping" state that pgbench already had,
but handles that state in a special way inside doCustom(), and that
doesn't always work.
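The size of the stall can be read straight off the log excerpt: the "throttling 6191 us" line is stamped 1371226017.568537 and the next "executing" line 1371226017.747858. A minimal C sketch of that arithmetic follows. The log_ts type and the helper names are hypothetical, and each timestamp is split into whole seconds and microseconds so the subtraction stays exact in integer math instead of losing precision in a double.

```c
#include <stdint.h>

/* Hypothetical: a log timestamp such as 1371226017.568537, split into
 * whole seconds and the six-digit microsecond fraction. */
typedef struct
{
    int64_t sec;
    int64_t usec;
} log_ts;

/* Convert a split timestamp to a single count of microseconds. */
static int64_t ts_to_usec(log_ts ts)
{
    return ts.sec * 1000000 + ts.usec;
}

/* Microseconds that actually elapsed between two log lines. */
static int64_t elapsed_usec(log_ts from, log_ts to)
{
    return ts_to_usec(to) - ts_to_usec(from);
}

/* How much longer than requested the throttle pause really lasted. */
static int64_t throttle_overshoot_usec(log_ts throttled_at, log_ts resumed_at,
                                       int64_t requested_usec)
{
    return elapsed_usec(throttled_at, resumed_at) - requested_usec;
}
```

With {1371226017, 568537} for the throttling line and {1371226017, 747858} for the next executing line, elapsed_usec() gives 179321 us and throttle_overshoot_usec() with a 6191 us request gives 173130 us, so the pause ran roughly 29 times longer than asked.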
The problem is that pgbench doesn't always stay inside doCustom() when a
client sleeps. It exits from there to poll for incoming messages from
the other clients, via a select() call that waits on all of the thread's
client sockets. It's not safe to assume doCustom() will be running
regularly; that's only true if clients keep generating socket activity.
So as long as other clients keep banging on their sockets, doCustom() is
called regularly, and everything works as expected. But at the end of
the test run that happens less often, and that's when the problem shows
up.
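To make the failure mode concrete, here is a toy model in C, not pgbench's actual code. It assumes a throttled client's pause is checked only when doCustom() runs, and that doCustom() runs only when select() returns because of socket activity from other clients. The function name, NO_WAKEUP, and the wakeups[] array (times, in microseconds after the throttle started, at which select() returns) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical sentinel: nothing ever wakes the thread again. */
#define NO_WAKEUP INT64_MAX

/* Toy model of the current behavior: the throttle deadline is checked
 * only inside doCustom(), and doCustom() only runs when select()
 * returns because of socket traffic. wakeups[] must be sorted
 * ascending. Returns the first wakeup at or after the deadline, which
 * is when the throttled client actually resumes. */
static int64_t resume_time_doCustom_only(int64_t throttle_until,
                                         const int64_t *wakeups, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (wakeups[i] >= throttle_until)
            return wakeups[i];
    }
    return NO_WAKEUP;
}
```

With wakeups every 1000 us, as in a busy run, a 6191 us throttle ends at 7000 us, close to on time. In a quiet end of run where the only later wakeup is at 180000 us, the same throttle ends at 180000 us, and if nothing wakes the thread again the model returns NO_WAKEUP: the client stays stuck until something else happens.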
pgbench already has a "\sleep" command, and that delay is handled inside
threadRun() instead. The pause for the rate limit throttle needs to be
handled in the same place. I have to redo a few things to confirm this
actually fixes the issue, as well as look at the updates Fabien has made
since I wandered off to debug this. I'm sure the problem is in the area
of code I'm poking at now, though.
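One way to handle the throttle in the same place as "\sleep", sketched under assumptions: the thread's wait loop computes its select() timeout from every waiting client, whether it paused for "\sleep" or for the rate limit throttle. The client_wait struct and next_select_timeout() are hypothetical simplifications, not pgbench's CState or its threadRun() code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical per-client wait state (not pgbench's CState). */
typedef struct
{
    bool    waiting;    /* paused for \sleep or for the rate limit throttle */
    int64_t until_usec; /* absolute time at which the pause ends */
} client_wait;

/* Returns false if no client is waiting, meaning select() may block on
 * the sockets alone (NULL timeout). Otherwise fills *timeout with the
 * time until the earliest deadline, clamped to zero for deadlines that
 * have already passed so select() just polls. */
static bool next_select_timeout(const client_wait *clients, int n,
                                int64_t now_usec, struct timeval *timeout)
{
    int64_t min_usec = INT64_MAX;

    for (int i = 0; i < n; i++)
    {
        if (!clients[i].waiting)
            continue;

        int64_t remaining = clients[i].until_usec - now_usec;
        if (remaining < 0)
            remaining = 0;
        if (remaining < min_usec)
            min_usec = remaining;
    }

    if (min_usec == INT64_MAX)
        return false;

    timeout->tv_sec = min_usec / 1000000;
    timeout->tv_usec = min_usec % 1000000;
    return true;
}
```

The caller would pass &timeout to select() when next_select_timeout() returns true and NULL otherwise. With a client throttled until 6191 us from now, select() stops waiting on the sockets after about 6.2 ms even if no other client produces any traffic, which is what the end of the test run needs.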
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Sent via pgsql-hackers mailing list