On Thu, Dec 22, 2016 at 7:35 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Wed, Dec 21, 2016 at 10:37 PM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> ISTM your reasoning about filesystem cache applies here as well, but just
>> without spending time on file creation.
>
> True. The more spread the checkpoints and 2PC files, the more risk to
> require access to disk. Memory's cheap anyway. What was the system
> memory? How many checkpoints did you trigger for how many 2PC files
> created? Perhaps it would be a good idea to look for the 2PC files
> from WAL records in a specific order. Did you try to use
> dlist_push_head instead of dlist_push_tail? This may make a difference
> on systems where WAL segments don't fit in system cache as the latest
> files generated would be looked at first for 2PC data.
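To illustrate the ordering point in the quoted text, here is a toy
sketch in Python. It uses collections.deque as a stand-in for
PostgreSQL's dlist, and the LSN values are made up. It only shows
which entries a head-first scan visits first, not the recovery code
itself.

```python
from collections import deque


def scan_order(lsns, push_head):
    """Return the order in which a head-first scan visits the entries.

    lsns arrive in WAL order, oldest first (hypothetical values).
    """
    entries = deque()
    for lsn in lsns:
        if push_head:
            entries.appendleft(lsn)  # analogous to dlist_push_head
        else:
            entries.append(lsn)      # analogous to dlist_push_tail
    return list(entries)


if __name__ == "__main__":
    wal_order = [100, 200, 300]
    print(scan_order(wal_order, push_head=False))  # oldest entry first
    print(scan_order(wal_order, push_head=True))   # newest entry first
```

With head insertion the most recently generated entries are visited
first, which are also the ones most likely to still be in the OS cache.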
Stas, have you also tested the impact on recovery time when WAL
segments are very likely evicted from the OS cache? This could be a
plausible scenario if a standby instance is heavily used for read-only
transactions (say pgbench -S) and the data volume is larger than the
amount of RAM available. It would not be complicated to test that:
just drop the OS caches (drop_caches on Linux) before beginning
recovery.

The maximum number of 2PC transactions that need access to past WAL
segments is linearly related to the volume of WAL between two
checkpoints, so max_wal_size does not really matter. What matters is
the time it takes to recover the same amount of WAL. Increasing
max_wal_size would give more room to reduce the overall noise between
two measurements, though.
--
Michael
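PS: a minimal sketch of the cold-cache step, assuming a Linux host and
root privileges. The function name and the control_file parameter are
hypothetical, added only so the helper can be pointed at a scratch file
for a dry run. Writing 3 to /proc/sys/vm/drop_caches asks the kernel to
drop the page cache plus dentries and inodes.

```python
import os


def drop_os_caches(control_file="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the Linux kernel to drop clean caches."""
    # drop_caches only frees clean pages, so sync dirty data to disk first.
    os.sync()
    with open(control_file, "w") as f:
        # 3 = page cache plus reclaimable dentries and inodes
        f.write("3\n")
```

The idea would be to call drop_os_caches() on the standby right before
starting recovery, and to compare the recovery time against a run with
a warm cache over the same amount of WAL.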