I'm still getting the same sort of pauses waiting for input with your v11.


This is a pretty frustrating problem; I've spent about two days so far trying to narrow down how it happens. I've attached the test program I'm using. It seems related to my picking a throttled rate that's close to (but below) the maximum possible on your system. I'm using 10,000 on a system that can do about 16,000 TPS when running pgbench in debug mode.

This problem is 100% reproducible here; it happens every time. This is a laptop running Mac OS X. It's possible the problem is specific to that platform. I'm doing all my tests with the database itself set up for development, with debug and assertions on. The lag spikes seem smaller without assertions on, but they are still there.

Here's a sample:

transaction type: SELECT only

What is this test script? I'm using pgbench for my tests.

scaling factor: 10
query mode: simple
number of clients: 25
number of threads: 1
duration: 30 s
number of transactions actually processed: 301921
average transaction lag: 1.133 ms (max 137.683 ms)
tps = 10011.527543 (including connections establishing)
tps = 10027.834189 (excluding connections establishing)

And those slow ones are all at the end of the latency log file, as shown in column 3 here:

22 11953 3369 0 1371578126 954881
23 11926 3370 0 1371578126 954918
3 12238 30310 0 1371578126 984634
7 12205 30350 0 1371578126 984742
8 12207 30359 0 1371578126 984792
11 12176 30325 0 1371578126 984837
13 12074 30292 0 1371578126 984882
0 12288 175452 0 1371578127 126340
9 12194 171948 0 1371578127 126421
12 12139 171915 0 1371578127 126466
24 11876 175657 0 1371578127 126507
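
For picking such lines out of a larger log, here is a minimal C sketch. It assumes the first three fields of a line are the client id, the transaction number, and the latency in microseconds, as in the sample above; the function name and the threshold are hypothetical and not part of pgbench.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of pgbench: parse one latency log line and
 * report whether its third column exceeds threshold_us. The parsed latency
 * is stored in *latency_us whenever the line parses. Assumes the first three
 * fields are client id, transaction number, and latency in microseconds.
 */
static bool
is_slow_line(const char *line, long threshold_us, long *latency_us)
{
    int  client;
    long xact;
    long latency;

    /* A line that does not start with three integers is never "slow". */
    if (sscanf(line, "%d %ld %ld", &client, &xact, &latency) != 3)
        return false;

    if (latency_us != NULL)
        *latency_us = latency;

    return latency > threshold_us;
}
```

With a threshold of 100000 (100 ms), only the last four lines of the sample above would be flagged.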

Indeed, there are two spikes, but not all clients are concerned.

As I have not seen that myself, debugging is hard. I'll give it a try tomorrow.

When no one is sleeping, the timeout becomes infinite, so only returning data will break it. This is intended behavior though.

This is not coherent. Under --throttle there should basically always be someone asleep, unless the server cannot cope with the load and *all* transactions are late. A state with no timeout looks pretty unrealistic, because it means that there is no throttling.
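
To make the timeout rule under discussion concrete, here is a minimal C sketch of how the wait time could be derived from the sleeping clients. This is not pgbench's actual code: the struct, its fields, and the function name are hypothetical, and -1 is used here to stand for "no timeout".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified client state; not pgbench's real client struct. */
typedef struct
{
    bool    sleeping;   /* client is waiting for its scheduled start time */
    int64_t until_us;   /* wake-up time in microseconds, valid if sleeping */
} SketchClient;

/*
 * Return how many microseconds the main loop may block waiting for input.
 * Returns -1 when no client is sleeping: the wait then has no timeout and
 * only incoming data can end it. Returns 0 when some sleeper is already due.
 */
static int64_t
next_timeout_us(const SketchClient *clients, size_t n, int64_t now_us)
{
    int64_t min_us = -1;

    for (size_t i = 0; i < n; i++)
    {
        int64_t remaining;

        if (!clients[i].sleeping)
            continue;

        remaining = clients[i].until_us - now_us;
        if (remaining < 0)
            remaining = 0;      /* this client is already late */

        if (min_us < 0 || remaining < min_us)
            min_us = remaining;
    }
    return min_us;
}
```

Under throttling, at least one client should normally be sleeping, so the -1 case should be rare in this sketch unless every transaction is late.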

I don't think the st->listen related code has anything to do with this either. That flag is only used to track when clients have completed sending their first query over to the server. Once reaching that point once, afterward they should be "listening" for results each time they exit the doCustom() code.

This assumption seems false if there can be a "sleep" at the beginning of the sequence. That is what throttle does, but any custom script can do it too. In that case the client is expected to wait before sending any command, so there can be no select under way.

So listen should be set to 1 when a select has been sent, and set back to 0 when the result data has all been received.

"doCustom" makes implicit assumptions about what is going on, whereas it should focus on looking at the incoming state, performing operations, and leaving with a state which corresponds to the actual status, without assumptions about what is going to happen next.
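
A minimal C sketch of the state handling proposed above, where the listen flag reflects only whether a query is actually in flight. The event and state types and the function name are hypothetical and much simpler than pgbench's real doCustom().

```c
#include <stdbool.h>

/* Hypothetical events a client can go through; not pgbench's real types. */
typedef enum
{
    EV_SLEEP_STARTED,   /* client begins a sleep (throttle or script sleep) */
    EV_QUERY_SENT,      /* a query has been sent to the server */
    EV_RESULT_COMPLETE  /* all result data for that query has been received */
} SketchEvent;

typedef struct
{
    bool sleeping;      /* client is waiting before its next command */
    bool listen;        /* a query is in flight and results are expected */
} SketchState;

/*
 * Move from the incoming state to the state that matches what actually
 * happened, without assuming anything about what comes next.
 */
static SketchState
sketch_step(SketchState st, SketchEvent ev)
{
    switch (ev)
    {
        case EV_SLEEP_STARTED:
            st.sleeping = true;     /* nothing was sent: listen is untouched */
            break;
        case EV_QUERY_SENT:
            st.sleeping = false;
            st.listen = true;       /* now there is something to wait for */
            break;
        case EV_RESULT_COMPLETE:
            st.listen = false;      /* nothing in flight any more */
            break;
    }
    return st;
}
```

In this sketch a client that starts with a sleep has listen equal to 0 until its first query is sent, so the main loop would not wait for input from it.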

st->listen goes to 1 very soon after startup and then it stays there, and that logic seems fine too.

