On Fri, Jun 9, 2023 at 4:00 AM Andres Freund <and...@anarazel.de> wrote:
> On 2023-06-08 12:15:58 +0200, Hannu Krosing wrote:
> > > This part was touched in the "AMA with a Linux Kernale Hacker"
> > > Unconference session where he mentioned that the had proposed a
> > > 'mshare' syscall for this.
>
> As-is that'd just lead to sharing page table, not the TLB. I don't think you
> currently do sharing of the TLB for parts of your address space on x86
> hardware. It's possible that something like that gets added to future
> hardware, but ...

I wasn't in Matthew Wilcox's unconference session in Ottawa, but I found
an older article on LWN:

https://lwn.net/Articles/895217/

For what it's worth, FreeBSD hackers have studied this topic too (and
it's been done in Android and no doubt other systems before):

https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf

I've shared that paper on this list before in the context of
super/huge pages and their benefits (to executable code, and to the
buffer pool), but a second topic in that paper is the idea of a shared
page table: "We find that sharing PTPs across different processes can
reduce execution cycles by as much as 6.9%. Moreover, the combined
effects of using superpages to map the main executable and sharing
PTPs for the small shared libraries can reduce execution cycles up to
18.2%." And that's just part of it, because those guys are more
interested in shared code/libraries and such, so that's probably not
even getting to the stuff like the buffer pool and DSMs that we might
tend to think of first.

I'm pretty sure PostgreSQL (along with the other fork-based RDBMSs
mentioned in this thread) must be one of the worst offenders for page
table bloat, simply because we can have a lot of processes and touch a
lot of memory.

I'm no expert in this stuff, but it seems to me that with shared page
table schemes you can avoid wasting huge amounts of RAM on duplicated
page table entries (pages * processes), and with huge/super pages you
can reduce the number of pages, but AFAIK you still can't escape the
TLB shootdown cost, which is all-or-nothing (PCID level at best). The
only way to avoid TLB shootdowns on context switches is to have
*exactly the same memory map*. Or, as Robert succinctly shouted,
"THREADS".
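
To put rough numbers on the "pages * processes" arithmetic above, here is a back-of-envelope sketch. It assumes x86-64 with 8-byte page table entries, counts only leaf-level entries, and assumes every process touches every page. The 64 GiB buffer pool and 1000 backends are hypothetical figures chosen for illustration, not measurements, and `leaf_page_table_bytes` is a made-up helper name.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PTE_BYTES = 8  # assumption: x86-64, 8 bytes per page table entry


def leaf_page_table_bytes(mapped_bytes, page_size, processes, shared=False):
    """Estimate memory spent on leaf page table entries for one mapping.

    Assumes every process touches every page, and ignores the (much
    smaller) upper levels of the page table tree.
    """
    entries = -(-mapped_bytes // page_size)  # ceiling division
    copies = 1 if shared else processes      # shared tables exist once
    return entries * PTE_BYTES * copies


if __name__ == "__main__":
    pool = 64 * GIB   # hypothetical buffer pool size
    backends = 1000   # hypothetical number of processes

    for label, page_size, shared in [
        ("4 KiB pages, private page tables", 4 * KIB, False),
        ("2 MiB pages, private page tables", 2 * MIB, False),
        ("4 KiB pages, shared page tables", 4 * KIB, True),
    ]:
        cost = leaf_page_table_bytes(pool, page_size, backends, shared)
        print(f"{label}: {cost / MIB:,.0f} MiB")
```

Under those assumptions the three cases come out at about 125 GiB, 250 MiB and 128 MiB of page table memory respectively. None of this says anything about the TLB cost, which is a separate problem.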