Hi,

On 03/17/2016 06:36 PM, Fabien COELHO wrote:

> Hello Tomas,
>
> Thanks for these great measures.
>
>> * 4 x CPU E5-4620 (2.2GHz)
>
> 4*8 = 32 cores / 64 threads.

Yep. I only used 32 clients though, to keep some of the CPU available for the rest of the system (also, HT does not really double the number of cores).


>> * 256GB of RAM
>
> Wow!
>
>> * 24x SSD on LSI 2208 controller (with 1GB BBWC)
>
> Wow! RAID configuration? The patch is designed to fix very big issues
> on HDD, but it is good to see that the impact is good on SSD as well.

Yep, RAID-10. I agree that doing the test on an HDD-based system would be useful; however, (a) I don't have a comparable system at hand at the moment, and (b) I was a bit worried the patch might hurt performance on SSDs, but thankfully that's not the case.

I will do the test on a much smaller system with HDDs in a few days.


> Is it possible to run tests with distinct tablespaces on all those disks?

Nope, that'd require reconfiguring the system (and then back), and I don't have that level of access to the system (just SSH). Also, I don't quite see what that would tell us.

>> * shared_buffers=64GB
>
> 1/4 of the available memory.
>
>> The pgbench was scale 60000, so ~750GB of data on disk,
>
> 3x the available memory, mostly on disk.
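
FWIW scale 60000 means 60000 x 100,000 = 6 billion rows in pgbench_accounts, so only a small fraction of the data set fits into shared_buffers (or even into RAM).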

>> or like this ("throttled"):
>>
>> pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench
>>
>> The reason for the throttling is that people generally don't run
>> production databases 100% saturated, so it'd be sad to improve the
>> 100% saturated case and hurt the common case by increasing latency.
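
For reference, the same command annotated (flag meanings as per the pgbench documentation):

    # 32 clients, 8 pgbench worker threads, 24-hour run (-T is in seconds),
    # throttled to a target rate of 5000 tps, with the per-transaction log
    # aggregated into 1-second intervals
    pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench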

> Sure.

>> The machine does ~8000 tps, so 5000 tps is ~60% of that.

> Ok.

> I would have suggested using the --latency-limit option to filter out
> very slow queries, otherwise if the system is stuck it may catch up
> later, but then this is not representative of "sustainable" performance.
>
> When pgbench is running under a target rate, in both runs the
> transaction distribution is expected to be the same, around 5000 tps,
> and the green run looks pretty ok with respect to that. The magenta one
> shows that about 25% of the time things are not good at all, and the
> higher figures just show the catching up, which is not really
> interesting if you asked for a web page and it is finally delivered a
> minute later.

Maybe. But that'd only increase the stress on the system, possibly causing more issues, no? And the magenta line is the old code, so if anything it would make the improvement from the new code look even bigger.

Note that the max latency is in microseconds (that's how pgbench logs it), so according to the "max latency" charts the latencies are below 10 seconds (old) and 1 second (new) about 99% of the time. I don't think this would make any measurable difference in practice.
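
Just to make the suggested run concrete, it would look roughly like this (the 100ms value is only illustrative, I haven't actually run this):

    pgbench -c 32 -j 8 -T 86400 -R 5000 --latency-limit=100 \
            -l --aggregate-interval=1 pgbench

With -R and --latency-limit combined, transactions that fall more than 100ms behind schedule are skipped and reported separately instead of being executed late, so the catching-up effect would not show up in the throughput numbers.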


regards


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


