I did a fair dive into double-checking the decision to just leave
xloginsert_locks fixed at 8 for 9.4. My conclusion: good call, move
along. Further improvements beyond what the 8-way split gives are
certainly possible. But my guess from chasing them a little is that
additional places will pop up as things that must also be tweaked
before you'll see those gains become significant.
I'd like to see that box re-opened at some point. But if we do that, I'm
comfortable it could end with an xloginsert_locks that tunes itself
reasonably on large servers, similar to wal_buffers. There's nothing
about this that makes me feel like it needs a GUC. I barely needed an
exposed knob to do this evaluation.
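To make "tunes itself like wal_buffers" concrete, here is a hypothetical sketch of what such a rule could look like. The function name, the half-the-cores formula, and the ceiling are all invented for illustration; this is not actual PostgreSQL code, just the shape of a derive-and-clamp rule like the one wal_buffers uses.

```shell
# Hypothetical auto-tuning rule, invented for illustration:
# derive the lock count from the core count, then clamp it
# between the shipped default and an arbitrary ceiling.
autotune_xloginsert_locks() {
    ncpus=$1
    locks=$(( ncpus / 2 ))              # made-up rule: half the cores
    [ "$locks" -lt 8 ] && locks=8       # never below the fixed 9.4 value
    [ "$locks" -gt 64 ] && locks=64     # arbitrary ceiling
    echo "$locks"
}

autotune_xloginsert_locks 24            # prints 12 on a 24 core server
```

The point is only that a clamp like this removes the GUC while still scaling on big boxes; the right formula would have to come out of testing.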
= Baseline =
I rolled back a few commits to just before the GUC was removed and
tested against that point in git history. Starting with the 4 client
test case Heikki provided, the fastest runs on my 24 core server looked
like
tps = 56.691855 (including connections establishing)
Repeat runs do need to drop and rebuild the table, because eventually
autovacuum kicks in on things in a big way, and then your test is toast
until it's done. Attached is what I settled on for a test harness.
Nothing here was so subtle that I felt a more complicated harness was
needed.
Standard practice for me is to give pgbench more workers when worrying
about any scalability tests. That gives a tiny improvement, to where
this is typical with 4 clients and 4 workers:
tps = 60.942537 (including connections establishing)
Increasing to 24 clients plus 24 workers gives roughly the same numbers,
suggesting that the bottleneck here is not the client count, and that
the suggested 4 was already high enough:
tps = 56.731581 (including connections establishing)
Decreasing xloginsert_locks to 1, so back to the original problem, the
rate normally looks like this instead:
tps = 25.384708 (including connections establishing)
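Just to quantify the gap between those two tps figures:

```shell
# Gain from the default 8 locks over 1 lock, using the best
# tps numbers quoted above for each setting.
awk 'BEGIN {
    one_lock    = 25.384708
    eight_locks = 56.731581
    printf "8 locks vs 1 lock: +%.0f%%\n", (eight_locks / one_lock - 1) * 100
}'
# prints: 8 locks vs 1 lock: +123%
```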
So you get the big return just fine with the default tuning; great. I'm
happy to see this ship as-is as good enough for 9.4.
= More locks =
For the next phase, I stuck to 24 clients and 24 workers. If I then
bump up xloginsert_locks to something much larger, there is an
additional small gain to be had. With 24 locks, so basically every
client has their own, instead of 57-60 TPS, I managed to get as high as
tps = 66.790968 (including connections establishing)
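For scale, here's that extra gain worked out; which 8-lock number to compare against is a judgment call, and this uses the best 4-client figure from the baseline section:

```shell
# Extra gain from 24 locks over the best ~60 tps seen with the default 8.
awk 'BEGIN {
    best_8_locks = 60.942537
    locks_24     = 66.790968
    printf "24 locks vs 8 locks: +%.1f%%\n", (locks_24 / best_8_locks - 1) * 100
}'
# prints: 24 locks vs 8 locks: +9.6%
```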
However, the minute I get into this territory, there's an obvious
bottleneck shift going on too. The rate of creating new checkpoint
segments becomes troublesome, as one example, with messages like:
LOG: checkpoints are occurring too frequently (1 second apart)
HINT: Consider increasing the configuration parameter
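For anyone reproducing this: the parameter that hint names in 9.4 is checkpoint_segments, and the usual response is to space checkpoints out. A hedged postgresql.conf sketch with the 9.4-era knobs; the values are illustrative starting points for a benchmark box, not tuned recommendations:

```
# Space checkpoints out so the test isn't dominated by checkpoint I/O.
checkpoint_segments = 64             # 9.4 default was 3
checkpoint_timeout = 15min           # default 5min
checkpoint_completion_target = 0.9   # default 0.5; spread the writes out
```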
When 9.4 is already giving a more than 100% gain on this targeted test
case, I can't see that chasing after maybe an extra 10% is worth having
yet another GUC around. Especially when it will probably take multiple
tuning steps before you're done anyway; we don't really know the rest of
them yet, and when we do, we probably won't need a GUC to cope with them
either.
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
# Harness (drop and rebuild the table, then run the insert test):
psql postgres -c "drop table if exists foo"
psql postgres -c "create table foo (id int4)"
pgbench postgres -n -f fooinsert.sql -c $CLIENTS -j $CLIENTS -T10

# fooinsert.sql:
insert into foo select g from generate_series(1, 10000) g;
Sent via pgsql-hackers mailing list (email@example.com)