I'm still getting the same sort of pauses waiting for input with your v11.
This is a pretty frustrating problem; I've spent about two days so far trying
to narrow down how it happens. I've attached the test program I'm using. It
seems related to my picking a throttled rate that's close to (but below) the
maximum possible on your system. I'm using 10,000 on a system that can do
about 16,000 TPS when running pgbench in debug mode.
This problem is 100% reproducible here; happens every time. This is a laptop
running Mac OS X. It's possible the problem is specific to that platform.
I'm doing all my tests with the database itself set up for development, with
debug and assertions on. The lag spikes seem smaller without assertions on,
but they are still there.
Here's a sample:
transaction type: SELECT only
What is this test script? I'm doing pgbench for tests.
scaling factor: 10
query mode: simple
number of clients: 25
number of threads: 1
duration: 30 s
number of transactions actually processed: 301921
average transaction lag: 1.133 ms (max 137.683 ms)
tps = 10011.527543 (including connections establishing)
tps = 10027.834189 (excluding connections establishing)
And those slow ones are all at the end of the latency log file, as shown in
column 3 here:
22 11953 3369 0 1371578126 954881
23 11926 3370 0 1371578126 954918
3 12238 30310 0 1371578126 984634
7 12205 30350 0 1371578126 984742
8 12207 30359 0 1371578126 984792
11 12176 30325 0 1371578126 984837
13 12074 30292 0 1371578126 984882
0 12288 175452 0 1371578127 126340
9 12194 171948 0 1371578127 126421
12 12139 171915 0 1371578127 126466
24 11876 175657 0 1371578127 126507
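A small C sketch for picking the slow entries out of such a log. It assumes the six-column layout shown above with the latency in microseconds in column 3; the helper names parse_latency_us and count_slow are hypothetical and are not part of pgbench.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse one log line and store column 3 (assumed: latency in microseconds)
 * in *latency_us. Returns 1 on success, 0 if the line does not match.
 */
int parse_latency_us(const char *line, long *latency_us)
{
    int  client_id;
    long txn_no;

    assert(line != NULL && latency_us != NULL);
    return sscanf(line, "%d %ld %ld", &client_id, &txn_no, latency_us) == 3;
}

/* Count the lines whose latency is strictly above threshold_us. */
size_t count_slow(const char *const *lines, size_t n, long threshold_us)
{
    size_t slow = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        long latency_us;

        if (parse_latency_us(lines[i], &latency_us) && latency_us > threshold_us)
            slow++;
    }
    return slow;
}
```

Run over the sample above with a threshold of 100000 us, this would count only the last four lines, which is the second spike.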
Indeed, there are two spikes, but not all clients are affected.
As I have not seen that myself, debugging is hard. I'll give it a try on
When no one is sleeping, the timeout becomes infinite, so only returning data
will break it. This is intended behavior though.
This is not coherent. Under --throttle there should basically always be
someone asleep, unless the server cannot cope with the load and *all*
transactions are late. A no-timeout state looks pretty unrealistic,
because it means that there is no throttling.
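To make the timeout argument concrete, here is a minimal C sketch of how a wait timeout could be derived from the sleeping clients. It is an illustration under assumptions, not pgbench's actual code: the Client struct, NO_TIMEOUT and next_timeout_us are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified client state; not pgbench's real CState. */
typedef struct {
    bool    sleeping;  /* client is waiting for its scheduled start time */
    int64_t wake_us;   /* absolute wake-up time in us, valid if sleeping */
} Client;

#define NO_TIMEOUT (-1)

/*
 * Return how long the main loop may block, in microseconds:
 *   0           some sleeper is already due (or late),
 *   > 0         time until the earliest sleeper must wake up,
 *   NO_TIMEOUT  nobody is sleeping, so only incoming data ends the wait.
 */
int64_t next_timeout_us(const Client *clients, size_t n, int64_t now_us)
{
    int64_t best = NO_TIMEOUT;

    assert(clients != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t remaining;

        if (!clients[i].sleeping)
            continue;
        remaining = clients[i].wake_us - now_us;
        if (remaining < 0)
            remaining = 0;  /* late client: do not block at all */
        if (best == NO_TIMEOUT || remaining < best)
            best = remaining;
    }
    return best;
}
```

With throttling active, at least one client should normally be sleeping, so this function would only return NO_TIMEOUT when every client has a query in flight.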
I don't think the st->listen related code has anything to do with this
either. That flag is only used to track when clients have completed sending
their first query over to the server. Once they have reached that point,
they should be "listening" for results each time they exit the
This assumption seems false if you can have a "sleep" at the beginning of
the sequence, which is what throttle is doing, but can be done by any
custom script, so that the client is expected to wait before sending any
command, thus there can be no select underway in that case.
So listen should be set to 1 when a select has been sent, and set back to 0
when the result data have all been received.
"doCustom" makes implicit assumptions about what is going on, whereas it
should focus on looking at the incoming state, performing operations, and
leaving with a state which corresponds to the actual status, without
assumptions about what is going to happen next.
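A rough C sketch of what I mean, with the flag tracking the actual status on every transition. The ClientState struct, the Event enum and apply_event are hypothetical names for illustration only; this is not the current doCustom logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events seen by one client. */
typedef enum {
    EV_SLEEP_START,      /* client starts waiting before sending anything */
    EV_QUERY_SENT,       /* a query has been sent to the server */
    EV_RESULT_COMPLETE   /* all result data for that query was received */
} Event;

/* Hypothetical, simplified per-client state. */
typedef struct {
    bool listen;    /* true only while a query is in flight */
    bool sleeping;  /* true only while waiting for the scheduled start */
} ClientState;

/* Apply one event and return a state that reflects the actual status. */
ClientState apply_event(ClientState st, Event ev)
{
    switch (ev) {
    case EV_SLEEP_START:
        assert(!st.listen);   /* cannot sleep with a query in flight */
        st.sleeping = true;
        break;
    case EV_QUERY_SENT:
        assert(!st.listen);   /* one query at a time */
        st.sleeping = false;
        st.listen = true;
        break;
    case EV_RESULT_COMPLETE:
        assert(st.listen);    /* a result implies a query was sent */
        st.listen = false;
        break;
    }
    return st;
}

/* Only clients with a query in flight should be waited on for input. */
bool wants_socket_read(ClientState st)
{
    return st.listen;
}
```

With this shape, a client that begins with a sleep is never reported as listening, so the caller can tell sleepers apart from clients that are really waiting for data.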
st->listen goes to 1 very soon after startup and then it stays there,
and that logic seems fine too.