On 12/5/13 9:59 AM, Tom Lane wrote:
Greg Stark <st...@mit.edu> writes:
I think the way to use mmap would be to mmap very large chunks,
possibly whole tables. We would need some way to control page flushes
that doesn't involve splitting mappings and can be efficiently
controlled without having the kernel storing arbitrarily large tags on
page tables or searching through all the page tables to mark pages

I might be missing something, but AFAICS mmap's API is just fundamentally
wrong for this.  The kernel is allowed to write-back a modified mmap'd
page to the underlying file at any time, and will do so if, say, it's under
memory pressure.  You can tell the kernel to sync now, but you can't tell
it *not* to sync.  I suppose you are thinking that some wart could be
grafted onto that API to reverse that, but I wouldn't have a lot of
confidence in it.  Any VM bug that caused the kernel to sometimes write
too soon would result in nigh unfindable data consistency hazards.
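
To make the asymmetry above concrete, here is a minimal sketch using Python's mmap module, assuming a throwaway temp file and one page of data. The helper names (modify_and_sync, demo) are made up for illustration and nothing here is PostgreSQL code. Once the page is dirtied the kernel is free to write it back on its own schedule; the only request the API offers is "sync now" (flush(), which wraps msync() on Unix). There is no call that means "hold this page back until I say so".

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def modify_and_sync(path: str, payload: bytes) -> bytes:
    """Map one page of `path` as a shared mapping, overwrite its start, force write-back."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), PAGE, access=mmap.ACCESS_WRITE) as m:
            m[:len(payload)] = payload  # dirty the page in memory
            # From this point the kernel may write the page to the file
            # whenever it chooses; the mmap API has no way to forbid that.
            m.flush()  # "sync now" is the only direction we can push
    # Read the bytes back through the ordinary file API.
    with open(path, "rb") as f:
        return f.read(len(payload))


def demo() -> bytes:
    """Create a one-page scratch file, run the sketch, and clean up."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE)
        os.close(fd)
        return modify_and_sync(path, b"dirty page")
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

A write-ahead-logging system needs the opposite guarantee (the data page must not reach disk before its log record does), which is exactly the request this interface cannot express.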

Something else to ponder on... a Seagate researcher gave a talk on upcoming hard drive 
technology at RICON East this spring. The interesting bit is that 1 or 2 generations down 
the road HDs will start using "shingling": The write head has to be bigger than 
the read head, so they're going to set it up so you cannot modify a range of tracks 
after they've been written. They'll do this by keeping a journal inside the HD. This is 
somewhat similar to how SSDs work too (you can only erase large pages of data, you can't 
update individual bytes/sectors/filesystem blocks).
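
As a rough illustration of the journal idea, here is a toy model, assuming the simplest possible scheme: written data is never modified in place, every update is appended, and a map remembers where the newest copy of each logical block lives. The class and method names are hypothetical, and this is not a description of any real drive or SSD firmware.

```python
class JournaledStore:
    """Toy append-only store: updates are appended, never written in place."""

    def __init__(self):
        self.journal = []  # append-only list of (logical_block, data)
        self.latest = {}   # logical block -> index of its newest journal entry

    def write(self, block, data):
        # An "update" does not touch the old copy; it appends a new one.
        self.journal.append((block, data))
        self.latest[block] = len(self.journal) - 1

    def read(self, block):
        # Follow the map to the newest copy of the block.
        return self.journal[self.latest[block]][1]

    def stale_entries(self):
        # Superseded copies that would have to be reclaimed later.
        return len(self.journal) - len(self.latest)


if __name__ == "__main__":
    store = JournaledStore()
    store.write(7, b"old")
    store.write(7, b"new")
    print(store.read(7), store.stale_entries())
```

Each small random update leaves a stale copy behind that has to be cleaned up eventually, which is one way to see why random in-place updates get more expensive on this kind of storage.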

So long-term, random access updates to permanent storage will be less efficient 
than today. (Of course, non-volatile memory could turn all this on its head.)
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)