Robert Haas <robertmh...@gmail.com> Sunday 17 April 2011 22:01:55
> On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Radosław Smogura <rsmog...@softperience.eu> writes:
> >> Tom Lane <t...@sss.pgh.pa.us> Sunday 17 April 2011 01:35:45
> >> 
> >>> ... Huh?  Are you saying that you ask the kernel to map each individual
> >>> shared buffer separately?  I can't believe that's going to scale to
> >>> realistic applications.
> >> 
> >> No, I do
> >> mremap(mmap_buff_A, MAP_FIXED, tmp);
> >> mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A);
> >> mremap(tmp, MAP_FIXED, mmap_buff_A);
> > 
> > There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
> > nor on my quite-up-to-date OS X box.  The Linux man page for it says
> > "This call is Linux-specific, and should not be used in programs
> > intended to be portable."  So if the patch is dependent on that call,
> > it's dead on arrival from a portability standpoint.
> > 
> > But in any case, you didn't explain how use of mremap() avoids the
> > problem of the kernel having to maintain a separate page-mapping-table
> > entry for each individual buffer.  (Per process, yet.)  If that's what's
> > happening, it's going to be a significant performance penalty as well as
> > (I suspect) a serious constraint on how many buffers can be managed.
> 
> I share your suspicions, although no harm in measuring it.
> 
> But what I don't understand is how this approach avoids the problem of
> different processes seeing different buffer contents.  If backend A
> has the buffer mmap'd and backend B wants to modify it (and changes
> the mapping), backend A is still looking at the old buffer contents,
> isn't it?  And then things go boom.

Each process has a simple local "mirror" of the shared descriptors.

I "believe" that buffer contents may be modified only while holding an 
exclusive lock (with some simple exceptions, plus MVCC). I have actually seen 
only two things that can change already-loaded data and cause damage, the ones 
you have described: setting hint bits during a scan, and vacuum. The first, I 
think, can at worst cause two processes to ask for the same transaction 
statuses <except vacuum>; the second is impossible, as vacuum requires an 
exclusive pin. As for reading a buffer: when the buffer tag is changed, the 
buffer's version is bumped up and checked against the local version.
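
The version check described above can be sketched roughly as follows. This is 
only an illustration, not code from the patch: the names SharedDesc, 
LocalMirror, shared_retag, mirror_is_current and mirror_refresh are 
hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared descriptor; the version is bumped on every tag change. */
typedef struct {
    uint32_t tag;      /* stands in for the real buffer tag */
    uint32_t version;  /* incremented whenever the tag changes */
} SharedDesc;

/* Hypothetical per-process "mirror" of one shared descriptor. */
typedef struct {
    uint32_t seen_version; /* version seen when this process last mapped the buffer */
} LocalMirror;

/* Reassign the buffer to another page (assumed to run under the proper lock). */
static void shared_retag(SharedDesc *d, uint32_t new_tag)
{
    assert(d != NULL);
    d->tag = new_tag;
    d->version++;
}

/* True if the local mapping still matches the shared descriptor. */
static bool mirror_is_current(const SharedDesc *d, const LocalMirror *m)
{
    return m->seen_version == d->version;
}

/* Record that this process has remapped the buffer and is up to date again. */
static void mirror_refresh(const SharedDesc *d, LocalMirror *m)
{
    m->seen_version = d->version;
}
```

A reader would call mirror_is_current() before touching its mapping and, if it 
returns false, remap the buffer and then call mirror_refresh().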

In other cases, after obtaining the lock, a check is done on whether the 
buffer has an associated updatable buffer and whether the local "mirror" has 
it too; then the swap should take place.

The logic for updatable buffers is similar to that for "shared buffers": each 
updatable buffer has a pin count, and an updatable buffer can't be freed while 
someone is using it. But in contrast to "normal buffers", updatable buffers 
don't have any support for locking etc. Updatable buffers exist only on the 
free list or when associated with a buffer.
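
As a rough sketch of the pin-count rule for updatable buffers (again only an 
illustration under my own assumptions, not the patch itself; UpdBuf, FreeList, 
upd_acquire, upd_pin and upd_unpin are hypothetical names, and the sketch is 
single-process with no locking):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical updatable buffer: a pin count and a free-list link, no locks. */
typedef struct UpdBuf {
    int pin_count;
    struct UpdBuf *next_free;
} UpdBuf;

/* Hypothetical free list of updatable buffers. */
typedef struct {
    UpdBuf *head;
} FreeList;

/* Take a buffer off the free list and pin it; NULL if the list is empty. */
static UpdBuf *upd_acquire(FreeList *fl)
{
    UpdBuf *b = fl->head;
    if (b == NULL)
        return NULL;
    fl->head = b->next_free;
    b->next_free = NULL;
    b->pin_count = 1;
    return b;
}

/* Another user starts using the same updatable buffer. */
static void upd_pin(UpdBuf *b)
{
    b->pin_count++;
}

/* Drop one pin; the buffer returns to the free list only when nobody uses it.
 * Returns true if the buffer was freed. */
static bool upd_unpin(FreeList *fl, UpdBuf *b)
{
    assert(b->pin_count > 0);
    if (--b->pin_count > 0)
        return false;
    b->next_free = fl->head;
    fl->head = b;
    return true;
}
```

In this sketch a buffer is always in exactly one of two states: on the free 
list with a pin count of zero, or off the list and pinned by whoever has it 
associated with a buffer.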

In the future, I will change the version to a shared segment id, something 
like relation's oid + block, but the ids will be numbered consecutively 
(1, 2, 3, ...), so I will be able to bypass smgr/md during reads, as well as 
the tag version check. This looks like a faster solution.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
