Re: [HACKERS] checkpointer continuous flushing

Fabien COELHO Sat, 19 Mar 2016 04:47:52 -0700

Is it possible to run tests with distinct table spaces on those many disks?
Nope, that'd require reconfiguring the system (and then back), and I don't have access to that system (just SSH).

Ok.

Also, I don't quite see what that would tell us?

Currently the flushing context is shared between table spaces, but I think that it should be per table space. My tests did not manage to convince Andres, so getting some more figures would be great. That will be another time!

I would have suggested using the --latency-limit option to filter out
very slow queries, otherwise if the system is stuck it may catch up
later, but then this is not representative of "sustainable" performance.

When pgbench is running under a target rate, in both runs the
transaction distribution is expected to be the same, around 5000 tps,
and the green run looks pretty ok with respect to that. The magenta one
shows that about 25% of the time, things are not good at all, and the
higher figures just show the catching up, which is not really
interesting if you asked for a web page and it is finally delivered 1
minute later.

Maybe. But that'd only increase the stress on the system, possibly causing more issues, no? And the magenta line is the old code, thus it would only increase the improvement of the new code.

Yes and no. I agree that it stresses the system a little more, but the fact that you have 5000 tps in the end does not show that you can really sustain 5000 tps with reasonable latency. I find this latter information more interesting than knowing that you can get 5000 tps on average, thanks to some catching up. Moreover, the non-throttled runs already showed that the system could do 8000 tps, so the bandwidth is already there.

Notice the max latency is in microseconds (as logged by pgbench), so according to the "max latency" charts the latencies are below 10 seconds (old) and 1 second (new) about 99% of the time.

AFAICS, the max latency is aggregated by second, but then it does not say much about the distribution of individual latencies in the interval, that is, whether they were all close to the max or not. Having the same chart with the median or average might help. Also, with the stddev chart, the percentages do not correspond with those of the latency chart, so it may be that the latency is high but the stddev is low, i.e. all transactions are equally bad on the interval, or not.

So I must admit that I'm not clear at all on how to interpret the max latency & stddev charts you provided.
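To illustrate why a per-second max alone is ambiguous, here is a minimal Python sketch that aggregates latencies per second into max, median, mean and stddev. The input format (pairs of completion time in seconds and latency in microseconds), the function name and the sample values are assumptions made for illustration; this does not parse the real pgbench log and does not use data from the runs discussed here.

```python
import statistics
from collections import defaultdict


def summarize_per_second(samples):
    """Aggregate (completion_time_s, latency_us) pairs per one-second interval.

    Returns {second: {"max", "median", "mean", "stddev"}}.
    The input shape is a hypothetical one chosen for this sketch.
    """
    buckets = defaultdict(list)
    for completion_time_s, latency_us in samples:
        buckets[int(completion_time_s)].append(latency_us)

    summary = {}
    for second, lats in sorted(buckets.items()):
        summary[second] = {
            "max": max(lats),
            "median": statistics.median(lats),
            "mean": statistics.fmean(lats),
            "stddev": statistics.pstdev(lats),  # population stddev of the interval
        }
    return summary


# Two made-up intervals sharing the same max latency (2 s = 2_000_000 us):
samples = (
    [(0.1 * i, 2_000_000) for i in range(10)]      # second 0: every transaction slow
    + [(1.0 + 0.1 * i, 1_000) for i in range(9)]   # second 1: nine fast transactions
    + [(1.9, 2_000_000)]                           # second 1: plus one slow outlier
)
summary = summarize_per_second(samples)

for second, stats in summary.items():
    print(second, stats)
```

In this made-up data both seconds have the same max, but second 0 has a high median with zero stddev (everything is equally bad), while second 1 has a low median with a large stddev (one outlier). A max chart cannot tell these two cases apart, which is why a median or average chart alongside it would help.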

So I don't think this would make any measurable difference in practice.

I think that it may show that 25% of the time the system could not match the target tps, even if it can handle much more on average, so the tps achieved when discarding late transactions would be under 4000 tps.
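The arithmetic behind that estimate can be sketched in Python as follows. Everything here is hypothetical: the 10 second duration, the 100 ms limit, the latency values and the function names are made up, and the sketch simplifies "25% of the time" to "25% of the transactions are late".

```python
def average_tps(latencies_us, duration_s):
    """All completed transactions per second, late ones included."""
    return len(latencies_us) / duration_s


def on_time_tps(latencies_us, duration_s, limit_us):
    """Transactions per second, counting only those within the latency limit."""
    on_time = sum(1 for latency in latencies_us if latency <= limit_us)
    return on_time / duration_s


# Hypothetical 10 s run throttled at 5000 tps, so 50_000 transactions in total.
duration_s = 10
limit_us = 100_000  # assumed latency limit of 100 ms

# Assumption: 75% of transactions are fast (2 ms), 25% are very late (3 s)
# because the system was stuck and caught up afterwards.
latencies_us = [2_000] * 37_500 + [3_000_000] * 12_500

avg = average_tps(latencies_us, duration_s)
sustained = on_time_tps(latencies_us, duration_s, limit_us)

print(f"average tps: {avg:.0f}")
print(f"on-time tps: {sustained:.0f}")
```

Under these assumptions the average still matches the 5000 tps target, while only 5000 * 0.75 = 3750 tps were delivered within the limit, i.e. under 4000 tps.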


--
Fabien.


