On 1/30/15 5:44 PM, Slava Mudry wrote:
Hi,
I have to deal with badly written system which regularly suffers from
transaction wraparound issue. This issue is happenning every 10-14 days
and forces me to take system offline and vacuum in single-user mode.
Main causes for this are (afaik):
1) heavy transaction traffic + 100+GB of stale tables
2) slow i/o (rotational drives)
3) autovacuum can't keep up.

Basically the database stores events data in daily partitioned table
"daily_events".
What I did, was - I ran vaccum freeze on all partitions (the tables are
never touched after they're done for a day). I have also scheduled
vacuum-freeze for a partition after it's done writing.

This essentially set xmin in each partition to "frozen" value of "2".
However, to my surprise, this was not enough!
Postgres stores relfrozenxid in pg_class and this value apparently is
getting old pretty fast (due to high volume of transactions).
And it seems that it doesn't really matter that xmin is frozen for a
table, the relfrozenxid is what causing transaction wraparound.

relfrozenxid is only part of the picture. A database-wide freeze vacuum will be controlled by pg_database.datfrozenxid.

What version is this? You may also be suffering from multixact wrap.

Why is that? and most importantly - why updating pg_class.relfrozenxid
requires huge amount of i/o by vacuum process for tables that are never
updated?

Because it has to scan the entire table to see what the oldest XID is. We don't check to see if relfrozenxid is already 2, though I suppose we could add that.

Is it safe to just update pg_class.relfrozenxid for tables where xmin=2
for all rows? Same for linked toast table?

That would be a great way to lose data...

You need to look at relations where relfrozenxid is >= 3 and see why relfrozenxid isn't advancing fast enough on them. Check your cost delay settings as well as the *freeze* settings. It's very likely that on a system this busy autovac would never keep up with default settings.

Also, keep in mind that transaction and multixact IDs are cluster-wide, so this is going to affect all databases in that instance. You should think about ways to move the heaviest transaction workload to a separate cluster; possibly putting the raw updates there and having a separate process that aggregates that data into fewer transactions for the main cluster.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to