On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
<james.bottom...@hansenpartnership.com> wrote:
> On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> > In terms of avoiding double-buffering, here's my thought after reading
>> > what's been written so far. Suppose we read a page into our buffer
>> > pool. Until the page is clean, it would be ideal for the mapping to
>> > be shared between the buffer cache and our pool, sort of like
>> > copy-on-write. That way, if we decide to evict the page, it will
>> > still be in the OS cache if we end up needing it again (remember, the
>> > OS cache is typically much larger than our buffer pool). But if the
>> > page is dirtied, then instead of copying it, just have the buffer pool
>> > forget about it, because at that point we know we're going to write
>> > the page back out anyway before evicting it.
>> >
>> > This would be pretty similar to copy-on-write, except without the
>> > copying. It would just be forget-from-the-buffer-pool-on-write.
>>
>> But... either copy-on-write or forget-on-write needs a page fault, and
>> thus a page mapping.
>>
>> Is a page fault more expensive than copying 8k?
>>
>> (I really don't know).
>
> A page fault can be expensive, yes ... but perhaps you don't need one.
>
> What you want is a range of memory that's read from a file but treated
> as anonymous for writeout (i.e. written to swap if we need to reclaim
> it). Then at some time later, you want to designate it as written back
> to the file instead so you control the writeout order. I'm not sure we
> can do this: the separation between file backed and anonymous pages is
> pretty deeply ingrained into the OS, but if it were possible, is that
> what you want?

Doesn't sound exactly like what I had in mind. What I was suggesting
is an analogue of read() that, if it reads full pages of data to a
page-aligned address, shares the data with the buffer cache until it's
first written instead of actually copying the data. The pages are
write-protected so that an attempt to write the address range causes a
page fault. In response to such a fault, the pages become anonymous
memory and the buffer cache no longer holds a reference to the page.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
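
For comparison with the read() analogue described above: the closest
primitive that exists today is probably mmap() with MAP_PRIVATE. It is
not the proposed interface. A private mapping shares pages with the OS
cache until the first write, but on the write fault the kernel copies
the page and the cache keeps its own copy, whereas the proposal is for
the cache to drop its reference with no copy. The sketch below only
demonstrates the share-until-written half. The function name, the
scratch file under /tmp, and the 'A'/'B' fill bytes are all made up for
illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: map one page of a scratch file MAP_PRIVATE, dirty
 * it, and check that the file contents are unaffected.
 * Returns 1 if the private write stayed private, 0 if not, -1 on error.
 */
int private_write_leaves_file_untouched(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/cow_demo_XXXXXX";   /* assumes /tmp is writable */
    int fd = mkstemp(path);
    if (page <= 0 || fd < 0)
        return -1;
    unlink(path);                           /* file vanishes on close */

    /* Fill exactly one page of the file with 'A'. */
    char *fill = malloc((size_t)page);
    if (fill == NULL) {
        close(fd);
        return -1;
    }
    memset(fill, 'A', (size_t)page);
    ssize_t written = write(fd, fill, (size_t)page);
    free(fill);
    if (written != (ssize_t)page) {
        close(fd);
        return -1;
    }

    /* Private, writable mapping: reads are served from the OS cache. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    char before = p[0];     /* no private copy needed for a read */
    p[0] = 'B';             /* first write: fault, page becomes private */

    /* Read the same byte back through the file descriptor. */
    char in_file = 0;
    ssize_t got = pread(fd, &in_file, 1, 0);

    int ok = (got == 1 && before == 'A' && p[0] == 'B' && in_file == 'A');

    munmap(p, (size_t)page);
    close(fd);
    return ok;
}
```

Calling this from a small main() and printing the return value is
enough to try it. What it cannot show is the part that matters for
double buffering: after the write fault the cache still holds its copy
of the page, which is exactly the cost forget-on-write would avoid.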