On 7/17/13 9:16 PM, Tatsuo Ishii wrote:
> Now suppose we have 3 transactions and each has the following values:
>
> d(0) = 10
> d(1) = 20
> d(2) = 30
>
> t(0) = 100
> t(1) = 110
> t(2) = 120
>
> That means pgbench expects a duration of 10 for each
> transaction. Actually, the first transaction runs slowly for some
> reason and the lag = 100 - 10 = 90. However, tx(1) and tx(2)
> finish on schedule because they spend only 10 (110-100 = 10, 120-110
> = 10). So the expected average lag would be 90/3 = 30.

The clients are not serialized here in any significant way, even when they share a single process/thread. I did many rounds of tracing through this code with timestamps on each line, and the sequence of events looks like this:

client 0:  send "SELECT..." to server.  yield to next client.
client 1:  send "SELECT..." to server.  yield to next client.
client 2:  send "SELECT..." to server.  yield to next client.
select():  wait for the first response from any client.
client 0:  receive response.  complete transaction, compute lag.
client 1:  receive response.  complete transaction, compute lag.
client 2:  receive response.  complete transaction, compute lag.

There is nothing here that queues the clients one after the other. If client 0 waits 100ms for its reply, clients 1 and 2 can receive their replies and continue at any time. They are not waiting for client 0, which has yielded control while it waits for a response. All three timings are independent once you reach the select() point where all three clients are active.
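The event sequence above can be modeled in a few lines. This is only a sketch: it uses Python's asyncio as a stand-in for pgbench's C select() loop, and the per-client delays in DELAYS are made-up values. It shows that a slow reply for client 0 does not hold back clients 1 and 2, even though everything runs in one thread.

```python
import asyncio

# Hypothetical server response delays in seconds; client 0 gets the slow reply.
# These are illustrative values, not measurements from pgbench.
DELAYS = {0: 0.10, 1: 0.01, 2: 0.02}


async def client(cid, delay, start, finished):
    loop = asyncio.get_running_loop()
    # The query is "sent" instantly in this model. Awaiting the sleep yields
    # control to the event loop, much as a pgbench client yields before select().
    await asyncio.sleep(delay)
    # Record when this client's reply arrived, relative to the common start.
    finished.append((cid, loop.time() - start))


async def run_all():
    start = asyncio.get_running_loop().time()
    finished = []
    # All three clients are active at once; none of them waits on another.
    await asyncio.gather(*(client(c, d, start, finished) for c, d in DELAYS.items()))
    return finished


def run_clients():
    return asyncio.run(run_all())


if __name__ == "__main__":
    for cid, elapsed in run_clients():
        print(f"client {cid}: reply after {elapsed * 1000:.0f} ms")
```

Clients 1 and 2 finish first; the 100ms wait is charged only to the client whose reply was actually slow.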

In this situation, if the server gets stuck doing something such that it takes 100ms before any client receives a response, it is correct to penalize every client for that latency. All three clients could have received the information earlier if the server had any to send them. If they did not, they all were suffering from some sort of lag.
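Applied to the numbers in the quoted example, the two ways of accounting for the delay can be compared directly. In this sketch, reading d as the scheduled times and t as the observed times is an interpretation of the example. lag_per_transaction charges every transaction the full gap between schedule and reality, as argued above; lag_incremental is a hypothetical formalization of the quoted reasoning, charging later transactions only for time spent beyond the scheduled interval.

```python
from statistics import mean

# Values from the quoted example (d = scheduled times, t = observed times;
# that reading of d and t is an interpretation, not pgbench's own naming).
d = [10, 20, 30]
t = [100, 110, 120]


def lag_per_transaction(d, t):
    """Charge each transaction the full gap between its schedule and its actual time."""
    return [ti - di for di, ti in zip(d, t)]


def lag_incremental(d, t):
    """Hypothetical alternative: charge the first transaction its full gap, and each
    later one only for time spent beyond the scheduled interval since the previous one."""
    lags = [t[0] - d[0]]
    for i in range(1, len(t)):
        spent = t[i] - t[i - 1]
        expected = d[i] - d[i - 1]
        lags.append(max(0, spent - expected))
    return lags


print(lag_per_transaction(d, t), mean(lag_per_transaction(d, t)))  # [90, 90, 90], average 90
print(lag_incremental(d, t), mean(lag_incremental(d, t)))          # [90, 0, 0], average 30
```

The first rule gives an average lag of 90 and the second gives 30, which is exactly the disagreement in this thread: whether a delay that hits every active client should count against each of them.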

I'm not even sure why you spaced the start times out at intervals of 10. If I were constructing an example like this, I'd have them start at times of 0, 1, and 2 (basically as fast as the CPU can fire off statements) and then start waiting from that point. If client 1 takes 10 units of time to send its query out before client 2 runs, and the rate goal requires 10 units of time, the rate you're asking for is impossible.

For sorting out what's going on with your two systems, I would recommend turning on debugging output with "-d" and looking at the new per-transaction latency numbers that the feature reports. If your theory that the lag is going up as the test proceeds is true, that should show up in the individual latency numbers too.

Based on what I saw during weeks of testing here, I would sooner suspect a system-level difference between your two servers than blame the latency calculation. I saw a *lot* of weird system issues myself when I started looking that carefully at sustained throughput. The latency reports from the perspective of Fabien's code were always reasonable, though. When something delays every client, it counts that against every active client's lag, and that's the right thing to do.

Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)