I don't have this resolved yet, but I think I've identified the cause. Updating here mainly so Fabien doesn't duplicate my work trying to track this down. Now that I've gotten this far, I'm going to keep at it until it's resolved.

Here's a slow transaction:

1371226017.568515 client 1 executing \set naccounts 100000 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts WHERE aid = 268721;
1371226017.789816 client 1 receiving

That confirms it is getting stuck at the "throttling" step: the requested delay was 6191 us, but the next command didn't execute until about 179 ms later. It looks like the code stalls there because it overloads the "sleeping" state that was already in pgbench but handles it in a special way inside of doCustom(), and that doesn't always work.
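For anyone who wants to check the gap between the "throttling" line and the next "executing" line without doing the subtraction by hand, here is a minimal sketch. It assumes the timestamps always carry exactly six fractional digits, as in the log above; parse_log_ts_us() is a name made up for this example, not something in pgbench.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a "seconds.microseconds" log timestamp into microseconds.
 * Assumes exactly six fractional digits, as in the log excerpt.
 * Returns -1 on malformed input. Hypothetical helper, not pgbench code.
 */
int64_t
parse_log_ts_us(const char *s)
{
    long long sec;
    long long usec;

    if (sscanf(s, "%lld.%6lld", &sec, &usec) != 2)
        return -1;

    return (int64_t) sec * 1000000 + (int64_t) usec;
}
```

Subtracting the parsed value for 1371226017.568537 from the one for 1371226017.747858 gives 179321 us, against the 6191 us the throttle asked for.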

The problem is that pgbench doesn't always stay inside doCustom() when a client sleeps. It exits there to poll for incoming messages from the other clients, via select() on a shared socket. It's not safe to assume doCustom() will be running regularly; that's only true if clients keep returning messages.

So as long as other clients keep banging on the shared socket, doCustom() is called regularly, and everything works as expected. But at the end of the test run that happens less often, and that's when the problem shows up.

pgbench already has a "\sleep" command, and that delay is handled inside threadRun() instead. The rate limit throttle's pause needs to be handled in the same place. I have to redo a few things to confirm this actually fixes the issue, as well as look at Fabien's later updates to this since I wandered off debugging. I'm sure it's in the area of code I'm poking at now though.
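To make the idea concrete, here is a rough sketch of the general technique: before blocking in select(), find the earliest wake-up time among the sleeping clients and use it as the timeout, so the loop wakes up even when no socket has activity. This is not the actual pgbench code; client_t and next_wakeup_timeout() are simplified names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical, simplified client state; not pgbench's real struct. */
typedef struct
{
    bool    sleeping;   /* waiting out a \sleep or throttle delay */
    int64_t until_us;   /* wake-up time, in microseconds */
} client_t;

/*
 * Compute a select() timeout from the earliest pending wake-up.
 * Returns true and fills *tv if at least one client is sleeping.
 * Returns false if none are, in which case the caller can pass a
 * NULL timeout and block until a socket becomes readable.
 */
bool
next_wakeup_timeout(const client_t *clients, size_t n,
                    int64_t now_us, struct timeval *tv)
{
    int64_t min_us = INT64_MAX;
    size_t  i;

    for (i = 0; i < n; i++)
    {
        int64_t remaining;

        if (!clients[i].sleeping)
            continue;

        remaining = clients[i].until_us - now_us;
        if (remaining < 0)
            remaining = 0;      /* already overdue: don't block at all */
        if (remaining < min_us)
            min_us = remaining;
    }

    if (min_us == INT64_MAX)
        return false;

    tv->tv_sec = (time_t) (min_us / 1000000);
    tv->tv_usec = (suseconds_t) (min_us % 1000000);
    return true;
}
```

The caller would pass tv to select() when this returns true, and NULL otherwise. With the timeout set this way, a throttled client gets serviced on schedule whether or not any other client is generating traffic.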

Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
