Re: [HACKERS] MMAP Buffers

Radosław Smogura Sat, 16 Apr 2011 04:50:57 -0700

Greg Stark <gsst...@mit.edu> Saturday 16 April 2011 13:00:19
> On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas <robertmh...@gmail.com> wrote:
> > The OP says that this patch maintains the WAL-before-data rule without
> > any explanation of how it accomplishes that seemingly quite amazing
> > feat.  I assume I'm going to have to read this patch at some point to
> > refute this assertion, and I think that sucks. I am pretty nearly 100%
> > confident that this approach is utterly doomed, and I don't want to
> > spend a lot of time on it unless someone can provide me with a
> > compelling explanation of why my confidence is misplaced.
> 
> Fwiw he did explain how he did that. Or at least I think he did --
> it's possible I read what I expected because what he came up with is
> something I've recently been thinking about.
> 
> What he did, I gather, is treat the mmapped buffers as a read-only
> copy of the data. To actually make any modifications he copies it into
> shared buffers and treats them like normal. When the buffers get
> flushed from memory they get written and then the pointers get
> repointed back at the mmapped copy. Effectively this means the shared
> buffers get extended to include all of the filesystem cache instead of
> having to evict buffers from shared buffers just because you want to
> read another one that's already in filesystem cache.
> 
> It doesn't save the copying between filesystem cache and shared
> buffers for buffers that are actually being written to. But it does
> save some amount of other copies on read-only traffic and it can even
> save some i/o. It does require a function call before each buffer
> modification where the pattern is currently <lock buffer>, <mutate
> buffer>, <mark buffer dirty>. From what he describes he needs to add a
> <prepare buffer for mutation> between the lock and mutate.
> 
> I think it's an interesting experiment and it's good to know how to
> solve some of the subproblems. Notably, how do you extend files or
> drop them atomically across processes? And how do you deal with
> getting the mappings to be the same across all the processes or deal
> with them being different? But I don't think it's a great long-term
> direction. It just seems clunky to have to copy things from mmapped
> buffers to local buffers and back. Perhaps the performance testing
> will show that clunkiness is well worth it but we'll need to see that
> for a wide variety of workloads to judge that.


In short, I swap (exchange; there is a clash of terms here) VM pages so that
existing pointers stay valid, and only when needed. I tried to point directly
at a new memory area, but I saw that some parts of the code really depend on
the memory behind the original pointers. For example, vacuum uses hint bits
set by a previous scan (it depends on whether the bit is set or not, so for
vacuum it is not merely a hint). From this case alone I can't assume there
are no more such places, so the VM page swap handles it for me.
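
To illustrate the idea of swapping the page behind a pointer rather than
repointing, here is a minimal, process-local sketch. It is not taken from the
patch: the helper name swap_in_writable_page is hypothetical, and it assumes
a POSIX system where mmap with MAP_FIXED replaces an existing mapping. The
real patch works with shared buffers, which this sketch does not attempt.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not code from the patch).
 *
 * Replaces the read-only, file-backed page at `addr` with a private,
 * writable anonymous page that holds the same bytes. The address does not
 * change, so pointers that callers already hold remain usable.
 * Returns 0 on success, -1 on failure.
 */
static int
swap_in_writable_page(void *addr, size_t page_size)
{
    void *staging;
    void *fresh;

    /* Save the current contents: they vanish once the mapping is replaced. */
    staging = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (staging == MAP_FAILED)
        return -1;
    memcpy(staging, addr, page_size);

    /* MAP_FIXED maps a fresh anonymous page over the very same address. */
    fresh = mmap(addr, page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (fresh == MAP_FAILED)
    {
        munmap(staging, page_size);
        return -1;
    }

    /* Restore the saved bytes into the now-writable page. */
    memcpy(fresh, staging, page_size);
    munmap(staging, page_size);
    return 0;
}
```

After the call, a write through the old pointer changes only the private
page; the file content is untouched until something writes it back.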

My standalone tests show that this process (with the copy from mmap) takes
2x-3x longer than the previous one. But unless someone updates the whole
table, there is still a benefit from the pre-update scan, index scans, and
more available memory (you don't eat cache memory to keep a copy of the cache
in ShMem). Everything may be slower when the database fits in ShMem, or in
similar cases (second-level buffers may increase performance slightly).

I reserve memory for the whole segment even if the file is smaller. Extending
is done by writing one byte at the end of the block (this is where dealing
with Uniform Buffer Caches may come in, if I remember the name correctly).
With current processors and the current implementation, database size is
limited to about 260TB (no dynamic segment reservation is performed).
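
A rough sketch of that reserve-then-extend scheme follows. The names and
sizes are assumptions for illustration only (a 1 GB segment and 8 kB blocks,
helpers map_whole_segment, extend_by_one_block and file_bytes); none of it
is code from the patch. It relies on the POSIX rule that a mapping may be
longer than the file, with only the pages inside the file being accessible.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed sizes, chosen for illustration only. */
#define SEGMENT_BYTES (1024UL * 1024UL * 1024UL)
#define BLOCK_BYTES   8192UL

/*
 * Map the full segment range read-only, even if the file is shorter.
 * Touching pages entirely beyond end-of-file would raise SIGBUS, so callers
 * must only read blocks that exist. Returns MAP_FAILED on error.
 */
static void *
map_whole_segment(int fd)
{
    return mmap(NULL, SEGMENT_BYTES, PROT_READ, MAP_SHARED, fd, 0);
}

/*
 * Grow the file by one block by writing a single zero byte at the last
 * offset of the new block. `nblocks` is the current number of blocks.
 * Returns 0 on success, -1 on failure.
 */
static int
extend_by_one_block(int fd, unsigned long nblocks)
{
    const char zero = 0;
    off_t last = (off_t) ((nblocks + 1) * BLOCK_BYTES - 1);

    return pwrite(fd, &zero, 1, last) == 1 ? 0 : -1;
}

/* Current file size in bytes, or -1 on error. */
static off_t
file_bytes(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : (off_t) -1;
}
```

Because the mapping already covers the whole segment, a block added this way
becomes readable through the existing mapping without remapping.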

Truncation is not implemented.

Each buffer descriptor has a tagVersion for a simple check of whether the
buffer tag has changed. Descriptors are (partially) mirrored in local memory,
and the versions are compared. Currently each re-read goes through smgr/md,
but introducing a shared segment id, and assuming each segment has a constant
maximum number of blocks, will make it faster (this will be something like
the current buffer tag); even the version field will become unnecessary.
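
The version check can be pictured roughly as below. Only the field name
tagVersion comes from the description above; the struct layouts and function
names (DemoTag, SharedDesc, LocalMirror, shared_retag, local_is_current,
local_refresh) are hypothetical, and the sketch ignores the locking or
atomics that real shared memory would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified buffer tag. */
typedef struct
{
    uint32_t rel;
    uint32_t block;
} DemoTag;

/* Descriptor as it might live in shared memory. */
typedef struct
{
    DemoTag  tag;
    uint32_t tagVersion;    /* bumped every time the tag changes */
} SharedDesc;

/* Per-process partial mirror of the descriptor. */
typedef struct
{
    uint32_t seenVersion;   /* tagVersion when `mapped` was set up */
    void    *mapped;        /* local mapping for that tag, or NULL */
} LocalMirror;

/* Point the shared descriptor at a different page and bump the version. */
static void
shared_retag(SharedDesc *d, DemoTag t)
{
    d->tag = t;
    d->tagVersion++;
}

/* True if the local mapping still corresponds to the shared tag. */
static bool
local_is_current(const LocalMirror *m, const SharedDesc *d)
{
    return m->mapped != NULL && m->seenVersion == d->tagVersion;
}

/* Record a fresh local mapping for the descriptor's current tag. */
static void
local_refresh(LocalMirror *m, const SharedDesc *d, void *mapped)
{
    m->mapped = mapped;
    m->seenVersion = d->tagVersion;
}
```

A backend would call local_is_current before using its mapping, and fall
back to a re-read plus local_refresh when the versions differ.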

I saw problems with vacuum, as it reopens the relation and I got mappings of
the same file twice (a minor problem). Deletion will be the important case,
where pointers must be invalidated in a "good way".

Regards,
Radek.
