> On 21 Dec 2016, at 19:56, Michael Paquier <michael.paqu...@gmail.com> wrote:
>
> That's indeed way simpler than before. Have you as well looked at the
> most simple approach discussed? That would be just roughly replacing
> the pg_fsync() calls currently in RecreateTwoPhaseFile() by a save
> into a list as you are doing, then issue them all at checkpoint. Even for
> 2PC files that are created and then removed before the next
> checkpoint, those will likely be in system cache.

Yes, I tried that as well. But with that approach another bottleneck arises: creating a new file is not a cheap operation in itself. A dual-Xeon machine with 100 backends quickly hits that, and the OS file-creation routines take the top places in perf top. That probably depends on the filesystem (I used ext4), but avoiding file creation when it isn't necessary seems like the cleaner approach.

On the other hand, it is possible to skip file creation by reusing files, for example by naming them after the dummy PGPROC offset, but that would require changes in the places that currently look only at filenames.

> This removes as well
> the need to have XlogReadTwoPhaseData() work in crash recovery, which
> makes me a bit nervous.

Hm, do you have any particular bad scenario for that case in mind?

> And this saves lookups at the WAL segments
> still present in pg_xlog, making the operation at checkpoint much
> faster with many 2PC files to process.

ISTM your reasoning about the filesystem cache applies here as well, just without spending time on file creation.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company