For other approaches, such as whether to use an LRU list to manage the shared_buffer or to use a clock sweep for management, both methods have their pros and cons. But for these two issues, there is a clearly better solution. For example, using DirectIO avoids the problem of double-copying data, and the OS’s page cache LRU list is optimized for general scenarios, while the database kernel should use its own eviction algorithm. Regarding the other issue, full-page writes don’t actually reduce the number of page reads—it’s just a matter of whether those page reads come from data files or from the redo log; the amount of data read is essentially the same. However, the problem it introduces is significant write amplification on the critical write path, which severely impacts performance. As a result, PostgreSQL has to minimize the frequency of checkpoints as much as possible.
I thought someone could write a demo to show it.. On Tue, Aug 20, 2024 at 9:46 PM Heikki Linnakangas <hlinn...@iki.fi> wrote: > > On 20/08/2024 11:46, 陈宗志 wrote: > > I’ve recently started exploring PostgreSQL implementation. I used to > > be a MySQL InnoDB developer, and I find the PostgreSQL community feels > > a bit strange. > > > > There are some areas where they’ve done really well, but there are > > also some obvious issues that haven’t been improved. > > > > For example, the B-link tree implementation in PostgreSQL is > > particularly elegant, and the code is very clean. > > But there are some clear areas that could be improved but haven’t been > > addressed, like the double memory problem where the buffer pool and > > page cache store the same page, using full-page writes to deal with > > torn page writes instead of something like InnoDB’s double write > > buffer. > > > > It seems like these issues have clear solutions, such as using > > DirectIO like InnoDB instead of buffered IO, or using a double write > > buffer instead of relying on the full-page write approach. > > Can anyone replay why? > > There are pros and cons. With direct I/O, you cannot take advantage of > the kernel page cache anymore, so it becomes important to tune > shared_buffers more precisely. That's a downside: the system requires > more tuning. For many applications, squeezing the last ounce of > performance just isn't that important. There are also scaling issues > with the Postgres buffer cache, which might need to be addressed first. > > With double write buffering, there are also pros and cons. It also > requires careful tuning. And replaying WAL that contains full-page > images can be much faster, because you can write new page images > "blindly" without reading the old pages first. We have WAL prefetching > now, which alleviates that, but it's no panacea. > > In summary, those are good solutions but they're not obviously better in > all circumstances. > > > However, the PostgreSQL community’s mailing list is truly a treasure > > trove, where you can find really interesting discussions. For > > instance, this discussion on whether lock coupling is needed for > > B-link trees, etc. > > https://www.postgresql.org/message-id/flat/CALJbhHPiudj4usf6JF7wuCB81fB7SbNAeyG616k%2Bm9G0vffrYw%40mail.gmail.com > > Yep, there are old threads and patches for double write buffers and > direct IO too :-). > > -- > Heikki Linnakangas > Neon (https://neon.tech) > -- --- Blog: http://www.chenzongzhi.info Twitter: https://twitter.com/baotiao Git: https://github.com/baotiao