I'm preparing a set of servers which will eventually need to handle a high volume of queries (both reads and writes, but most reads are very simple index-based queries returning a limited set of rows, when not just one), and I would like to optimize things as much as possible, so I have a few questions on the exact way PostgreSQL's MVCC works, and how transactions, updates and vacuuming interact. I hope someone will be able to point me in the right direction (feel free to give pointers if I missed the places where this is described).
From what I understand (and testing confirms it), bundling many queries in one single transaction is more efficient than having each query be a separate transaction (like with autocommit on). However, I wonder about the limits of this:
- are there any drawbacks to grouping hundreds or thousands of queries (inserts/updates) over several minutes in one single transaction? Other than the fact that the inserts/updates will not be visible until committed, of course. Essentially turning autocommit off, and doing a commit once in a while.
1. If any locks are held then they will be held for much longer, causing other processes to block.
2. PG needs to be able to roll back the changes - thousands of simple inserts are fine, millions will probably not be.
- does this apply only to inserts/selects/updates or also for selects? Another way to put this is: does a transaction with only one select actually have much transaction-related work to do? Or, does a transaction with only selects actually have any impact anywhere? Does it really leave a trace anywhere? Again, I understand that selects grouped in a transaction will not see updates done after the start of the transaction (unless done by the same process).
There are implications if a SELECT has side-effects (I can call a function in a select - that might do anything).
- if during a single transaction several UPDATEs affect the same row, will MVCC generate as many row versions as there are updates (like would be the case with autocommit) or will they be grouped into one single row version?
I believe there will be many versions. Certainly for 8.0 that must be the case to support savepoints within a transaction.
Another related issue is that many of the tables are indexed on a date field, and one process does a lot of updates on "recent" rows (which lead to many dead tuples), but after that "older" rows tend to remain pretty much unchanged for quite a while. Other than splitting the tables into "old" and "recent" tables, is there any way to make vacuum more efficient? Scanning the whole table for dead tuples when only a small portion of the table actually has any does not feel like being very efficient in this situation.
Other issue: every five minutes or so, I see a noticeable performance drop as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy hardware, I know 8.0 with decent hardware and separate disk(s) for pg_xlog will definitely help, but I really wonder if there is any way to reduce the amount of work that needs to be done at that point (I'm a strong believer of fixing software before hardware). I have already bumped checkpoint_segments to 8, but I'm not quite sure I understand how this helps (or doesn't help) things. Logs show 3 to 6 "recycled transaction log file" lines at that time, that seems quite a lot of work for a load that's still pretty low. Does grouping of more queries in transactions help with this? Are there other parameters that can affect things, or is just a matter of how much inserts/updates/deletes are done, and the amount of data that was changed?
You might be better off reducing the number of checkpoint segments, and decreasing the timeout. There is a balance between doing a lot of work in one go, and the overhead of many smaller bursts of activity.
Last point: some of the servers have expandable data (and will be replicated with slony-I) and will run with fsync off. I have read conflicting statements as to what exactly this does: some sources indicate that setting fsync off actually switches off WAL/checkpointing, others that it just prevents the fsync (or equivalent) system calls. Since I still see checkpointing in that case, I guess it's not exactly the former, but I would love to understand more about it. Really, I would love to be able to set some tables or databases to "go as fast as you can and don't worry about transactions, MVCC or anything like that", but I'm not sure that option exists...
Setting fsync=false means the sync isn't done, so data might still be cached below PG's level. I'm not sure it's ever going to be possible to mark a table as "ignore transactions" - it would be a lot of work, and means you couldn't guarantee transactions that included that table in any way.
-- Richard Huxton Archonet Ltd
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings