Re: [HACKERS] MMAP Buffers

2011-04-18 Thread Radosław Smogura

On Sun, 17 Apr 2011 21:06:17 -0400, Robert Haas wrote:

On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
rsmog...@softperience.eu wrote:

Each process has simple mirror of shared descriptors.

I believe that modifications to buffer content may be only done 
when holding
exclusive lock (with some simple exceptions) (+ MVCC), actually I 
saw only two
things that can change already loaded data and cause damage, you 
have
described (setting hint bits during scan, and vacuum - 1st may only 
cause, I
think, that two processes will ask for same transaction statuses 
except
vacuum, 2nd one is impossible as vacumm requires exclusive pin). 
When buffer
tag is changed the version of buffer is bumped up, and checked 
against local

version - this about reading buffer.


Yes, an exclusive lock is required for substantive content changes.
But if vacuum cleaning up the buffer is an issue for your patch, then
it's probably also a problem if someone grabs an exclusive content
lock and deletes the tuple (by setting XMAX) and some other backend
later sees the old buffer contents after having in the meanwhile 
taken
a new snapshot; or if likewise someone grabs an exclusive-lock, adds 
a

tuple, and then your backend takes a new snapshot and then sees the
old buffer contents.  Basically, any time someone grabs an
exclusive-lock and releases it, it's necessary for all observers to
see the updated contents by the time the exclusive lock is released.

In other cases after obtaining lock check is done if buffer has 
associated
updatable buffer and if local mirror has it too, then swap should 
take

place.


I think this check would have to be done every time someone
share-locks the buffer, which seems rather expensive.

Logic about updatable buffers is similar to shared buffers, each 
updatable
buffer has pin count, and updatable buffer can't be free if someone 
uses it,
but in contrast to normal buffers, updatable buffers doesn't have 
any
support for locking etc. Updatable buffers exists only on free list, 
or when

associated with buffer.


I don't see how you're going to get away with removing buffer locks.
They exist for a reason, and adding mmap() to the mix is going to
require MORE locking, not less.

In future, I will change version to shared segment id, something 
like
relation's oid + block, but ids will have continuous numbering 
1,2,3..., so I
will be able to bypass smgr/md during read, and tag version check - 
this looks

like faster solution.


I don't understand this part at all.


To my previous post I want to clarify that updatable buffers are 
implemented in shared memory, so there is no way that process has own 
copy of data.


Regards,
Radek.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  No, no, no :) I wanted to do this, but from above reason I skipped it. I
  swap VM pages, I do remap, in place where the shared buffer was I put
  mmaped page, and in place where mmaped page was I put shared page (in
  certain cases, which should be optimized by e. g. read for update, for
  initial read of page in process I directly points to shared buffer), it
  can be imagined as I affects TLB. This what I call VM swap is
  remapping, so I don't change pointers, I change only where this pointers
  points in physical memory, preserving same pointer in Virtual Memory.
 
 ... Huh?  Are you saying that you ask the kernel to map each individual
 shared buffer separately?  I can't believe that's going to scale to
 realistic applications.
 
   regards, tom lane

No, I do 
mrempa(mmap_buff_A, MAP_FIXED, temp); 
mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A), 
mrempa(tmp, MAP_FIXED, mmap_buff_A). 

This is this additional overhead - and may have some disadvantages. All 
regions SysV / Posix MMAP are mapped before. 

I couldn't believe too, but as I done some work about read, I was in dead 
corner: 
1. Create Read Before Buffer (connect with XLOG) that will store each page 
before modification (page should be flushed and synced to log)
2. Rewrite whole db to repoint pointers or similar stuff (I done few steps for 
this).
3. Or find something different. 

I couldn't believe too, it's way I still work on it. I saw it gains speed for 
few simple updates. I'm not quite sure why it gets it. I only may think it  
was from pre update reads. But full checks will go after some good point of 
updates.

Regards,
Radek







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Tom Lane
=?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
 Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
 ... Huh?  Are you saying that you ask the kernel to map each individual
 shared buffer separately?  I can't believe that's going to scale to
 realistic applications.

 No, I do 
 mrempa(mmap_buff_A, MAP_FIXED, temp); 
 mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A), 
 mrempa(tmp, MAP_FIXED, mmap_buff_A). 

There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
nor on my quite-up-to-date OS X box.  The Linux man page for it says
This call is Linux-specific, and should not be used in programs
intended to be portable.  So if the patch is dependent on that call,
it's dead on arrival from a portability standpoint.

But in any case, you didn't explain how use of mremap() avoids the
problem of the kernel having to maintain a separate page-mapping-table
entry for each individual buffer.  (Per process, yet.)  If that's what's
happening, it's going to be a significant performance penalty as well as
(I suspect) a serious constraint on how many buffers can be managed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


The big picture for patch submission (was Re: [HACKERS] MMAP Buffers)

2011-04-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... But please, everyone feel free to continue bashing me for
 wanting a readable patch with some understandable submission notes.

What he said.  All this obsessing over whether the mmap patch could or
should have been run through pgindent is missing the big picture.
Namely, that no design documentation or theory-of-operation was offered,
and people are trying to extract that information from the code, and
it's just too messy for that to be feasible.  (The patch isn't really
short of comments, but half of the comments seem to be TODOs or author's
questions to himself about whether something will work, and so they just
aren't particularly helpful to someone trying to understand what the
patch does or whether it will work.)

I think that rather than complaining about formatting, we should be
complaining about not following the overall patch submission process
and not providing adequate documentation.  Most of the questions that
people are asking right now could have been answered on the strength of
a design sketch, before any code had been written at all.  For a patch
as complicated and invasive as this, there should be a design sketch,
which perhaps gets fleshed out into a README file in the final patch.

The Submitting_a_Patch wiki page does touch on the point of getting some
early design feedback before you even try to write a patch, but I think
it could do with more emphasis on the issue.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 17:48:56
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
  
  ... Huh?  Are you saying that you ask the kernel to map each individual
  shared buffer separately?  I can't believe that's going to scale to
  realistic applications.
  
  No, I do
  mrempa(mmap_buff_A, MAP_FIXED, temp);
  mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
  mrempa(tmp, MAP_FIXED, mmap_buff_A).
 
 There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
 nor on my quite-up-to-date OS X box.  The Linux man page for it says
 This call is Linux-specific, and should not be used in programs
 intended to be portable.  So if the patch is dependent on that call,
 it's dead on arrival from a portability standpoint.
Good point. This is from initial concept, and actually I done this to do not 
leave gaps in VM in which library or something could be mmaped. Last time I 
think about using mmap to replace just one VM page.

 But in any case, you didn't explain how use of mremap() avoids the
 problem of the kernel having to maintain a separate page-mapping-table
 entry for each individual buffer.  (Per process, yet.)  If that's what's
 happening, it's going to be a significant performance penalty as well as
 (I suspect) a serious constraint on how many buffers can be managed.
 
   regards, tom lane
Kernel merges vm_structs. So mappings are compacted. I'm not kernel 
specialist, but skipping memory consumption, for not compacted mappings, 
kernel uses btrees for dealing with  TLB, so it should not matter if there is 
100 vm_structs or 10 vm_structs.

Swap isn't made everywhere. When buffer is initialy read (privaterefcount 
==1), then any access to this buffer will directly point to latest valid area. 
If it has assigned shmem area then this will be used. I plan to add 
readbuffer for update to prevent swaps, when it's almost sure that buffer 
will be used for update.

I measured performance of page modifications (with unpining, full process on 
stand alone unit test) it's 2x-3x more time of normal page reads, but this 
result may not be sure, as I saw memcpy to memory above 2GB is slower then 
memcpy to first 2GB (this may be idea to try to put some shared structs  
2GB).

I know that this patch is big question. Sometimes I'm optimistic, and 
sometimes I'm pessimistic about final result.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Andres Freund
On Sunday 17 April 2011 19:26:31 Radosław Smogura wrote:
 Kernel merges vm_structs. So mappings are compacted. I'm not kernel 
 specialist, but skipping memory consumption, for not compacted mappings, 
 kernel uses btrees for dealing with  TLB, so it should not matter if there
 is  100 vm_structs or 10 vm_structs.
But the CPUs TLB cache has maybe 16/256 (1lvl, 2nd) to 64/512 entries. That 
will mean that there will be cachemisses all over.
Additionally your scheme requires flushing it regularly...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Joshua Berkus
Robert,

 Actually, I'd walk through fire for a 10% performance improvement if
 it meant only a *risk* to stability.

Depends on the degree of risk.  MMAP has the potential to introduce instability 
into areas of the code which have been completely reliable for years.  Adding 
20 new coredump cases with data loss for a 10% improvement seems like a poor 
bargain to me.  It doesn't help that the only DB to rely heavily on MMAP 
(MongoDB) is OSSDB's paragon of data loss.

However, in the case where the database is larger than RAM ... or better, 90% 
of RAM ... MMAP has the theoretical potential to improve performance quite a 
bit more than 10% ... try up to 900% on some queries.  However, I'd like to 
prove that in a test before we bother even debating the fundamental obstacles 
to using MMAP.  It's possible that these theoretical performance benefits will 
not materialize, even without data safeguards.
 
 The problem is that this is
 likely unfixably broken. In particular, I think the first sentence of
 Tom's response hit it right on the nose, and mirrors my own thoughts
 on the subject. To have any chance of working, you'd need to track
 buffer pins and shared/exclusive content locks for the pages that were
 being accessed outside of shared buffers; otherwise someone might be
 looking at a stale copy of the page.

Nothing is unfixable.  The question is whether it's worth the cost.  Let me see 
if I can build a tree with Radislaw's patch, and do some real performance tests.

I, for one, am glad he did this work.  We've discussed MMAP in the code off and 
on for years, but nobody wanted to do the work to test it.  Now someone has, 
and we can decide whether it's worth pursuing based on the numbers.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Tom Lane
Joshua Berkus j...@agliodbs.com writes:
 I, for one, am glad he did this work.  We've discussed MMAP in the code off 
 and on for years, but nobody wanted to do the work to test it.  Now someone 
 has, and we can decide whether it's worth pursuing based on the numbers.

Well, the troubling issue is that it's not clear whether this patch is
realistic enough to think that performance measurements based on it
are representative of the whole idea of using mmap.  The business of
remapping individual buffers in order to transition them to writable
state seems likely to me to be a huge performance penalty --- first
there's the direct cost of having to incur a kernel call each time we
do that, and second there's the distributed cost of asking the kernel
to manage thousands or millions of tiny mappings.

IOW, if this patch shows little or no performance improvement (as seems
likely to happen at scale), that doesn't prove that mmap in general
isn't potentially interesting, only that this isn't the right way to
approach it.

Still, if you do some tests and don't find a win, that might save time
compared to actually trying to understand and vet the patch ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Robert Haas
On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
 Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
 ... Huh?  Are you saying that you ask the kernel to map each individual
 shared buffer separately?  I can't believe that's going to scale to
 realistic applications.

 No, I do
 mrempa(mmap_buff_A, MAP_FIXED, temp);
 mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
 mrempa(tmp, MAP_FIXED, mmap_buff_A).

 There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
 nor on my quite-up-to-date OS X box.  The Linux man page for it says
 This call is Linux-specific, and should not be used in programs
 intended to be portable.  So if the patch is dependent on that call,
 it's dead on arrival from a portability standpoint.

 But in any case, you didn't explain how use of mremap() avoids the
 problem of the kernel having to maintain a separate page-mapping-table
 entry for each individual buffer.  (Per process, yet.)  If that's what's
 happening, it's going to be a significant performance penalty as well as
 (I suspect) a serious constraint on how many buffers can be managed.

I share your suspicions, although no harm in measuring it.

But I don't understand is how this approach avoids the problem of
different processes seeing different buffer contents.  If backend A
has the buffer mmap'd and backend B wants to modify it (and changes
the mapping), backend A is still looking at the old buffer contents,
isn't it?  And then things go boom.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Andres Freund and...@anarazel.de Sunday 17 April 2011 20:02:11
 On Sunday 17 April 2011 19:26:31 Radosław Smogura wrote:
  Kernel merges vm_structs. So mappings are compacted. I'm not kernel
  specialist, but skipping memory consumption, for not compacted mappings,
  kernel uses btrees for dealing with  TLB, so it should not matter if
  there is  100 vm_structs or 10 vm_structs.
 
 But the CPUs TLB cache has maybe 16/256 (1lvl, 2nd) to 64/512 entries. That
 will mean that there will be cachemisses all over.
 Additionally your scheme requires flushing it regularly...
 
 Andres

I only know Phenom has 4096 entries I think and this covers 16MB of memory. 
But I was taking about memory usage of struct vm_struct in kernel. I tries as 
well with huge pages, but I can't write really fast allocator for this, it's 
slower then malloc, maybe from different reasons.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Andres Freund
On Sunday 17 April 2011 22:09:24 Radosław Smogura wrote:
 I only know Phenom has 4096 entries I think and this covers 16MB of
 memory. 
The numbers I cited where intels before and after core2.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Robert Haas robertmh...@gmail.com Sunday 17 April 2011 22:01:55
 On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
  
  ... Huh?  Are you saying that you ask the kernel to map each individual
  shared buffer separately?  I can't believe that's going to scale to
  realistic applications.
  
  No, I do
  mrempa(mmap_buff_A, MAP_FIXED, temp);
  mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
  mrempa(tmp, MAP_FIXED, mmap_buff_A).
  
  There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
  nor on my quite-up-to-date OS X box.  The Linux man page for it says
  This call is Linux-specific, and should not be used in programs
  intended to be portable.  So if the patch is dependent on that call,
  it's dead on arrival from a portability standpoint.
  
  But in any case, you didn't explain how use of mremap() avoids the
  problem of the kernel having to maintain a separate page-mapping-table
  entry for each individual buffer.  (Per process, yet.)  If that's what's
  happening, it's going to be a significant performance penalty as well as
  (I suspect) a serious constraint on how many buffers can be managed.
 
 I share your suspicions, although no harm in measuring it.
 
 But I don't understand is how this approach avoids the problem of
 different processes seeing different buffer contents.  If backend A
 has the buffer mmap'd and backend B wants to modify it (and changes
 the mapping), backend A is still looking at the old buffer contents,
 isn't it?  And then things go boom.

Each process has simple mirror of shared descriptors.

I believe that modifications to buffer content may be only done when holding 
exclusive lock (with some simple exceptions) (+ MVCC), actually I saw only two 
things that can change already loaded data and cause damage, you have 
described (setting hint bits during scan, and vacuum - 1st may only cause, I 
think, that two processes will ask for same transaction statuses except 
vacuum, 2nd one is impossible as vacumm requires exclusive pin). When buffer 
tag is changed the version of buffer is bumped up, and checked against local 
version - this about reading buffer. 

In other cases after obtaining lock check is done if buffer has associated 
updatable buffer and if local mirror has it too, then swap should take 
place.

Logic about updatable buffers is similar to shared buffers, each updatable 
buffer has pin count, and updatable buffer can't be free if someone uses it, 
but in contrast to normal buffers, updatable buffers doesn't have any 
support for locking etc. Updatable buffers exists only on free list, or when 
associated with buffer.

In future, I will change version to shared segment id, something like 
relation's oid + block, but ids will have continuous numbering 1,2,3..., so I 
will be able to bypass smgr/md during read, and tag version check - this looks 
like faster solution.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Robert Haas
On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
rsmog...@softperience.eu wrote:
 Each process has simple mirror of shared descriptors.

 I believe that modifications to buffer content may be only done when holding
 exclusive lock (with some simple exceptions) (+ MVCC), actually I saw only two
 things that can change already loaded data and cause damage, you have
 described (setting hint bits during scan, and vacuum - 1st may only cause, I
 think, that two processes will ask for same transaction statuses except
 vacuum, 2nd one is impossible as vacumm requires exclusive pin). When buffer
 tag is changed the version of buffer is bumped up, and checked against local
 version - this about reading buffer.

Yes, an exclusive lock is required for substantive content changes.
But if vacuum cleaning up the buffer is an issue for your patch, then
it's probably also a problem if someone grabs an exclusive content
lock and deletes the tuple (by setting XMAX) and some other backend
later sees the old buffer contents after having in the meanwhile taken
a new snapshot; or if likewise someone grabs an exclusive-lock, adds a
tuple, and then your backend takes a new snapshot and then sees the
old buffer contents.  Basically, any time someone grabs an
exclusive-lock and releases it, it's necessary for all observers to
see the updated contents by the time the exclusive lock is released.

 In other cases after obtaining lock check is done if buffer has associated
 updatable buffer and if local mirror has it too, then swap should take
 place.

I think this check would have to be done every time someone
share-locks the buffer, which seems rather expensive.

 Logic about updatable buffers is similar to shared buffers, each updatable
 buffer has pin count, and updatable buffer can't be free if someone uses it,
 but in contrast to normal buffers, updatable buffers doesn't have any
 support for locking etc. Updatable buffers exists only on free list, or when
 associated with buffer.

I don't see how you're going to get away with removing buffer locks.
They exist for a reason, and adding mmap() to the mix is going to
require MORE locking, not less.

 In future, I will change version to shared segment id, something like
 relation's oid + block, but ids will have continuous numbering 1,2,3..., so I
 will be able to bypass smgr/md during read, and tag version check - this looks
 like faster solution.

I don't understand this part at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Robert Haas robertmh...@gmail.com Monday 18 April 2011 03:06:17
 On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
 
 rsmog...@softperience.eu wrote:
  Each process has simple mirror of shared descriptors.
  
  I believe that modifications to buffer content may be only done when
  holding exclusive lock (with some simple exceptions) (+ MVCC), actually
  I saw only two things that can change already loaded data and cause
  damage, you have described (setting hint bits during scan, and vacuum -
  1st may only cause, I think, that two processes will ask for same
  transaction statuses except vacuum, 2nd one is impossible as vacumm
  requires exclusive pin). When buffer tag is changed the version of
  buffer is bumped up, and checked against local version - this about
  reading buffer.
 
 Yes, an exclusive lock is required for substantive content changes.
 But if vacuum cleaning up the buffer is an issue for your patch, then
 it's probably also a problem if someone grabs an exclusive content
 lock and deletes the tuple (by setting XMAX) and some other backend
 later sees the old buffer contents after having in the meanwhile taken
 a new snapshot; or if likewise someone grabs an exclusive-lock, adds a
 tuple, and then your backend takes a new snapshot and then sees the
 old buffer contents.  Basically, any time someone grabs an
 exclusive-lock and releases it, it's necessary for all observers to
 see the updated contents by the time the exclusive lock is released.
  In other cases after obtaining lock check is done if buffer has
  associated updatable buffer and if local mirror has it too, then swap
  should take place.
 
 I think this check would have to be done every time someone
 share-locks the buffer, which seems rather expensive.
I don't treat as issues, but it's disadvantage.

  Logic about updatable buffers is similar to shared buffers, each
  updatable buffer has pin count, and updatable buffer can't be free if
  someone uses it, but in contrast to normal buffers, updatable buffers
  doesn't have any support for locking etc. Updatable buffers exists only
  on free list, or when associated with buffer.
 
 I don't see how you're going to get away with removing buffer locks.
 They exist for a reason, and adding mmap() to the mix is going to
 require MORE locking, not less.
 
  In future, I will change version to shared segment id, something like
  relation's oid + block, but ids will have continuous numbering 1,2,3...,
  so I will be able to bypass smgr/md during read, and tag version check -
  this looks like faster solution.
 
 I don't understand this part at all.
Versioning is witch approach where I thought about really often changes of 
mmaped areas, I allocated part of segments, but now the segment is mmaped with 
reservation, to it's full possible size, addresses of segments can't change 
(problem is only with segment deletion).

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Robert Haas
On Apr 16, 2011, at 1:48 AM, Greg Smith g...@2ndquadrant.com wrote:
 P.S. You know what else I feel should earn an automatic rejection without any 
 reviewer even looking at the code?

Greg is absolutely right. And to the two he listed, let me add another of my 
own gripes: failing to provide submission notes that explain how the patch 
works, and how it addresses the conceptually difficult issues raised 
previously. The OP says that this patch maintains the WAL-before-data rule 
without any explanation of how it accomplishes that seemingly quite amazing 
feat.  I assume I'm going to have to read this patch at some point to refute 
this assertion, and I think that sucks. I am pretty nearly 100% confident that 
this approach is utterly doomed, and I don't want to spend a lot of time on it 
unless someone can provide me with a compelling explanation of why my 
confidence is misplaced. But spending a lot of time on it is exactly what I'm 
going to have to do, because reading a undocumented patch full of spurious 
garbage to refute a hand-wavy claim of correctness is time-consuming, and if I 
give up on it without reading it, someone will yell unfair, unfair!

None of this is to say that I don't appreciate Radoslaw's interest in 
contributing, because I very much do. But I also think it's important to 
realize that we have a finite number of reviewers and they have finite time. 
Trying to minimize the amount of time that it takes someone to review or commit 
your patch is a service to the whole community, and we should acknowledge that 
it has value and appreciate the people who consistently do it.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Reading one's own diff and reflecting on what you've changed is one of 
 the extremely underappreciated practices of good open-source software 
 development.  Minimizing the size of that diff is perhaps the most 
 important thing someone can do in order to make their changes to a piece 
 of software better.  Not saying something that leads in that direction 
 would be a disservice to the submitter.

A couple further comments on that thought:

* Another argument for avoiding unnecessary changes is that the larger
your patch's change footprint, the more likely it is to create merge
conflicts for people working on other patches.  Now, if they're
necessary changes, that's the price of parallelism in development.
But gratuitous whitespace changes add nothing and they do have costs.

* On the other side of the coin, I have seen many a patch that was
written to minimize the length of the diff to the detriment of
readability or maintainability of the resulting code, and that's *not*
a good tradeoff.  Always do what makes the most sense from a long-run
perspective.

I keep wanting to do a talk arguing that everything you need to know
about good patch style can be derived from the mantra Make the patch
look like the code had always been there.  If the functionality had
been designed in on day one, where would it be placed and how would it
be coded?  You might be able to make the patch diff shorter with some
shortcut or other, but that's not the way to do it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Tom Lane wrote:

* On the other side of the coin, I have seen many a patch that was
written to minimize the length of the diff to the detriment of
readability or maintainability of the resulting code, and that's *not*
a good tradeoff.
  


Sure. that's possible.  But based on the reviews I've done, I'd say that 
the fact someone is even aware that minimizing their diff is something 
important to consider automatically puts them far ahead of the average 
new submitter.  There are a high percentage of patches where the 
submitter generates a diff and sents it without even looking at it.  
That a person would look at their diff and go too far without trying to 
make it small doesn't happen nearly as much.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Robert Haas wrote:

The OP says that this patch maintains the WAL-before-data rule without any 
explanation of how it accomplishes that seemingly quite amazing feat.  I assume 
I'm going to have to read this patch at some point to refute this assertion, 
and I think that sucks.


I don't think you have to read any patch that doesn't follow the 
submission guidelines.  The fact that you do is a great contribution to 
the community.  But if I were suggesting how your time would be best 
spent improving PostgreSQL, reviewing patches that don't meet coding 
standards would be at the bottom of the list.  There's always something 
better for the project you could be working on instead.


I just added 
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned 
, recycling some existing text, adding some new suggestions.


I hope I got the tone of that text right.  The intention was to have a 
polite but clear place to point submitters to when their suggestion 
doesn't meet the normal standards here, such that they might even get 
bounced before even entering normal CommitFest review.  This MMAP patch 
looks like it has all 5 of the problems mentioned on that now more 
focused list.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Stark
On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas robertmh...@gmail.com wrote:
 The OP says that this patch maintains the WAL-before-data rule without any 
 explanation of how it accomplishes that seemingly quite amazing feat.  I 
 assume I'm going to have to read this patch at some point to refute this 
 assertion, and I think that sucks. I am pretty nearly 100% confident that 
 this approach is utterly doomed, and I don't want to spend a lot of time on 
 it unless someone can provide me with a compelling explanation of why my 
 confidence is misplaced.

Fwiw he did explain how he did that. Or at least I think he did --
it's possible I read what I expected because what he came up with is
something I've recently been thinking about.

What he did, I gather, is treat the mmapped buffers as a read-only
copy of the data. To actually make any modifications he copies it into
shared buffers and treats them like normal. When the buffers get
flushed from memory they get written and then the pointers get
repointed back at the mmapped copy. Effectively this means the shared
buffers get extended to include all of the filesystem cache instead of
having to evict buffers from shared buffers just because you want to
read another one that's already in filesystem cache.

It doesn't save the copying between filesystem cache and shared
buffers for buffers that are actually being written to. But it does
save some amount of other copies on read-only traffic and it can even
save some i/o. It does require a function call before each buffer
modification where the pattern is currently lock buffer, mutate
buffer, mark buffer dirty. From what he describes he needs to add a
prepare buffer for mutation between the lock and mutate.

I think it's an interesting experiment and it's good to know how to
solve some of the subproblems. Notably, how do you extend files or
drop them atomically across processes? And how do you deal with
getting the mappings to be the same across all the processes or deal
with them being different? But I don't think it's a great long-term
direction. It just seems clunky to have to copy things from mmapped
buffers to local buffers and back. Perhaps the performance testing
will show that clunkiness is well worth it but we'll need to see that
for a wide variety of workloads to judge that.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Radosław Smogura
Greg Stark gsst...@mit.edu Saturday 16 April 2011 13:00:19
 On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas robertmh...@gmail.com wrote:
  The OP says that this patch maintains the WAL-before-data rule without
  any explanation of how it accomplishes that seemingly quite amazing
  feat.  I assume I'm going to have to read this patch at some point to
  refute this assertion, and I think that sucks. I am pretty nearly 100%
  confident that this approach is utterly doomed, and I don't want to
  spend a lot of time on it unless someone can provide me with a
  compelling explanation of why my confidence is misplaced.
 
 Fwiw he did explain how he did that. Or at least I think he did --
 it's possible I read what I expected because what he came up with is
 something I've recently been thinking about.
 
 What he did, I gather, is treat the mmapped buffers as a read-only
 copy of the data. To actually make any modifications he copies it into
 shared buffers and treats them like normal. When the buffers get
 flushed from memory they get written and then the pointers get
 repointed back at the mmapped copy. Effectively this means the shared
 buffers get extended to include all of the filesystem cache instead of
 having to evict buffers from shared buffers just because you want to
 read another one that's already in filesystem cache.
 
 It doesn't save the copying between filesystem cache and shared
 buffers for buffers that are actually being written to. But it does
 save some amount of other copies on read-only traffic and it can even
 save some i/o. It does require a function call before each buffer
 modification where the pattern is currently lock buffer, mutate
 buffer, mark buffer dirty. From what he describes he needs to add a
 prepare buffer for mutation between the lock and mutate.
 
 I think it's an interesting experiment and it's good to know how to
 solve some of the subproblems. Notably, how do you extend files or
 drop them atomically across processes? And how do you deal with
 getting the mappings to be the same across all the processes or deal
 with them being different? But I don't think it's a great long-term
 direction. It just seems clunky to have to copy things from mmapped
 buffers to local buffers and back. Perhaps the performance testing
 will show that clunkiness is well worth it but we'll need to see that
 for a wide variety of workloads to judge that.

In short words, I swap, exchange (clash of terms) VM pages to prevent pointers 
(only if needed). I tried to directly point to new memory area, but I saw that 
some parts of code really depends on memory pointed by original pointers, e.g. 
Vaccumm uses hint bits setted by previous scan (it depends on this if bit is 
set or not! so for it it's not only hint). Just from this case I can't assume 
there is no more such places, so VM pages swap does it for me.

Stand alone tests shows for me that this process (with copy from mmap) is 
2x-3x time longer then previous. But until someone will not update whole 
table, then benefit will be taken from pre-update scan, index scans, larger 
availability of memory (you don't eat cache memory to keep copy of cache in 
ShMem). Everything may be slower when database fits in ShMem, and similarly 
(2nd level bufferes may increase performance slightly).

I reserve memory for whole segment even if file is smaller. Extending is by 
wirte one byte at the end of block (here may come deal with Unfiorm Buffer 
Caches, if I remember name well). For current processors, and current 
implementation database size is limited to about 260TB (no dynamic segment 
reservation is performed).

Truncation not implemented.

Each buffer descriptor has tagVersion to simple check if buffer tag has 
changed. Descriptors (partially) are mirrored in local memory, and versions 
are checked. Currently each re-read (is pointed to smgr/md), but introduce 
shared segment id, and assuming each segment has constant maximum number of 
blocks, will make it faster (this will be something like current buffer tag), 
even version field will be unneeded.

I saw problems with vacuum, as it reopens relation and I got mappings of same 
file twice (minor problem). Important will be about deletion, when pointers 
must invalidated in good way.

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Marko Kreen
On Sat, Apr 16, 2011 at 8:48 AM, Greg Smith g...@2ndquadrant.com wrote:
 Joshua Berkus wrote:

 Guys, can we *please* focus on the patch for now, rather than the
 formatting, which is fixable with sed?


 Never, and that's not true.  Heikki was being nice; I wouldn't have even
 slogged through it long enough to ask the questions he did before kicking it
 back as unusable.  A badly formatted patch makes it impossible to evaluate
 whether the changes from a submission are reasonable or not without the
 reviewer fixing it first.  And you can't automate correcting it, it takes a
 lot of tedious manual work.  Start doing a patch review every CommitFest
 cycle and you very quickly realize it's not an ignorable problem.  And lack
 of discipline in minimizing one's diff is always a sign of other code
 quality issues.

 Potential contributors to PostgreSQL should know that a badly formatted
 patch faces an automatic rejection, because no reviewer can work with it
 easily.  This fact is not a mystery; in fact it's documented at
 http://wiki.postgresql.org/wiki/Submitting_a_Patch :  The easiest way to
 get your patch rejected is to make lots of unrelated changes, like
 reformatting lines, correcting comments you felt were poorly worded etc.
 Each patch should have the minimum set of changes required to fulfil the
 single stated objective.  I think I'll go improve that text next--something
 like Ways to get your patch rejected should be its own section.

 The problem here isn't whether someone used an IDE or not, it's that this
 proves they didn't read their own patch before submitting it.  Reading one's
 own diff and reflecting on what you've changed is one of the extremely
 underappreciated practices of good open-source software development.
  Minimizing the size of that diff is perhaps the most important thing
 someone can do in order to make their changes to a piece of software better.
  Not saying something that leads in that direction would be a disservice to
 the submitter.

 P.S. You know what else I feel should earn an automatic rejection without
 any reviewer even looking at the code?  Submitting a patch that claims to
 improve performance and not attaching the test case you used, along with
 detailed notes about before/after tests on your own hardware.  A hand wave
 it's faster is never good enough, and it's extremely wasteful of our
 limited reviewer resources to try and duplicate what the submitter claimed.
  Going to add something about that to the submission guidelines too.

Give the OP a break - he was not re-styling, he was clearly trying to make
crappily indented Postgres code readable in his editor.  Reading the patch
would not matter because the original code would still be crappily indented.

Yes, such patch is bad, but what should the proper response be in such
situation?

Hint: it can be both polite and short.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 What he did, I gather, is treat the mmapped buffers as a read-only
 copy of the data. To actually make any modifications he copies it into
 shared buffers and treats them like normal. When the buffers get
 flushed from memory they get written and then the pointers get
 repointed back at the mmapped copy.

That seems much too late --- won't other processes still be looking at
the stale mmap'ed version of the page until a write-out happens?

I'm pretty concerned about the memory efficiency of this too, since it
seems like it's making it *guaranteed*, not just somewhat probable,
that there are two copies in RAM of every database page that's been
modified since the last checkpoint (or so).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Peter Eisentraut
On Fri, 2011-04-15 at 12:32 +0200, Radosław Smogura wrote:
 I didn't included this, as diff, because of ~150kb size (mainly 
  configure scripts, which are included in SVC). Due to this, You may 
  download it from 
  http://softperience.eu/downloads/pg_mmap_20110415.diff.bz2 (Legal:
 Work 
  under PostgreSQL BSD Lincense). Patch is just GIT diff, later I will
 try 
  to grab some git.

Btw., about 87% of this patch are diffs against configure and
pg_config.h.in, which are useless.  If you strip those out, your patch
will be small enough to submit inline.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Joshua Berkus
Radoslaw,

 I think 10% is quite good, as my stand-alone test of mmap vs. read
 shown that
 speed up of copying 100MB data to mem may be from ~20ms to ~100ms
 (depends on
 destination address). Of course deeper, system test simulating real
 usage will
 say more. In any case after good deals with writes, I will speed up
 reads. I
 think to bypass smgr/md much more and to expose shared id's (1,2,3...)
 for
 each file segment.

Well, given the risks to durability and stability associated with using MMAP, I 
doubt anyone would even consider it for a 10% throughput improvement.  However, 
I don't think the test you used demonstrates the best case for MMAP as a 
performance improvement.

 In attachment I sent test-scripts which I used to fill data, nothing
 complex
 (left from 2nd level caches).
 
 Query I've used to measure was SELECT count(substr(content, 1, 1))
 FROM
 testcase1 WHERE multi_id  5;
 
 Timings ware taken from psql.
 
 I didn't made load (I have about 2GB of free sapce at /home, and 4GB
 RAM) and
 stress (I'm not quite ready to try concurrent updates of same page -
 may fail,
 notice is and place to fix is in code) tests yet.

Yes, but this test case doesn't offer much advantage to MMAP.  Where I expect 
it would shine would be cases where the database is almost as big as, or much 
bigger than RAM ... where the extra data copying by current code is both 
frequent and wastes buffer space we need to use.  As well as concurrent reads 
from the same rows.

You can write a relatively simple custom script using pgBench to test this; you 
don't need a big complicated benchmark.  Once we get over the patch cleanup 
issues, I might be able to help with this. 

 Netbeans is quite good, of course it depends who likes what. Just try
 7.0 RC
 2.

I don't know if you've followed the formatting discussion, but apparently 
there's an issue with Netbeans re-indenting lines you didn't even edit.  It 
makes your patch hard to read or apply.  I expect that Netbeans has some method 
to reconfigure indenting, etc.; do you think you could configure it to 
PostgresQL standards so that this doesn't get in the way of evaluation of your 
ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Marko Kreen
On Fri, Apr 15, 2011 at 3:12 PM, Radosław Smogura
rsmog...@softperience.eu wrote:
 On Fri, 15 Apr 2011 14:33:37 +0300, Heikki Linnakangas wrote:
 The patch is quite hard to read because of random whitespace changes
 and other stylistic issues, but I have a couple of high-level
 questions on the design:

 Yes, but, hmm... in Netbeans I had really long gaps (probably 8 spaces, from
 tabs), so deeper ifs, comments at the and of variables, went of out my
 screen. I really wanted to not format this, but sometimes I needed.

Seems no one else has mentioned it yet -- Postgres uses non-standard
tab-width of 4, instead of 8.  If you see ugly code in Postgres, thats why...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Saturday 16 April 2011 17:02:32
 Greg Stark gsst...@mit.edu writes:
  What he did, I gather, is treat the mmapped buffers as a read-only
  copy of the data. To actually make any modifications he copies it into
  shared buffers and treats them like normal. When the buffers get
  flushed from memory they get written and then the pointers get
  repointed back at the mmapped copy.
 
 That seems much too late --- won't other processes still be looking at
 the stale mmap'ed version of the page until a write-out happens?
No, no, no :) I wanted to do this, but from above reason I skipped it. I swap 
VM pages, I do remap, in place where the shared buffer was I put mmaped page, 
and in place where mmaped page was I put shared page (in certain cases, which 
should be optimized by e. g. read for update, for initial read of page in 
process I directly points to shared buffer), it can be imagined as I affects 
TLB. This what I call VM swap is remapping, so I don't change pointers, I 
change only where this pointers points in physical memory, preserving same 
pointer in Virtual Memory.

if 0x1 is start of buffer 1 (at relation 1, block 1)
I have 
0x1 - 0x1 + BLCKSZ - mmaped area
0xf1000 - 0xf1000 + BLCKSZ - Shmem

SWAP
0x1 - 0x1 + BLCKSZ - Shmem 
0xf1000 - 0xf1000 + BLCKSZ - mmaped area

It's reason I putted in crash reports /proc/{pid}/maps. For e. g. maps after 
swap looks like (from crash report):

[...]
#Data mappings
7fe69b7e3000-7fe69b7ef000 r--s  08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fe69b7ef000-7fe69b7f1000 rw-s 00148000 00:04 8880132
/SYSV0052ea91 (deleted)
7fe69b7f1000-7fe6db7e3000 r--s e000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
[...]
#SysV shmem mappings

7fec60788000-7fec6078c000 rw-s 00144000 00:04 8880132
/SYSV0052ea91 (deleted)
7fec6078c000-7fec6078e000 r--s c000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fec6078e000-7fec6079c000 rw-s 0014a000 00:04 8880132
/SYSV0052ea91 (deleted)
[...]

Without swap 12516 should be mapped to one VM region of size equal to 
BLCKSZ*BLOCKS_PER_SEGMENT (which is about 1GB).

When process reads buffer (or after taking lock), the shared buffer descriptor 
is checked if page was modified (currently is it dirty) if yes do swap, if 
page is currently in use, or use directly SysV shared areas if pages is just 
pinned to process.

Regards,
Radek

 I'm pretty concerned about the memory efficiency of this too, since it
 seems like it's making it *guaranteed*, not just somewhat probable,
 that there are two copies in RAM of every database page that's been
 modified since the last checkpoint (or so).
 
   regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Greg Smith

Radosław Smogura wrote:
Yes, but, hmm... in Netbeans I had really long gaps (probably 8 
spaces, from tabs), so deeper ifs, comments at the and of variables, 
went of out my screen. I really wanted to not format this, but 
sometimes I needed.


The guide at 
http://www.open-source-editor.com/editors/how-to-make-netbeans-use-tabs-for-indention.html 
seems to cover how to fix this in Netbeans.  You want it to look like 
that screen shot:  4 spaces per indent with matching tab size of 4, and 
Expand Tabs to Spaces unchecked.


Generally, if you look at the diff you've created, and your new code 
doesn't line up right with what's already there, that means the 
tab/space setup isn't quite right when you were editing.  Reading the 
diff is useful for catching all sorts of other issues, too, so it's just 
generally a good practice.  As Peter already mentioned, the big problem 
here is that you checked in a modified configure file.


I also note that you use C++ style // comments, which aren't allowed 
under the coding guidelines--even though they work fine on many common 
platforms.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Tom Lane
=?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
 No, no, no :) I wanted to do this, but from above reason I skipped it. I swap 
 VM pages, I do remap, in place where the shared buffer was I put mmaped page, 
 and in place where mmaped page was I put shared page (in certain cases, which 
 should be optimized by e. g. read for update, for initial read of page in 
 process I directly points to shared buffer), it can be imagined as I affects 
 TLB. This what I call VM swap is remapping, so I don't change pointers, I 
 change only where this pointers points in physical memory, preserving same 
 pointer in Virtual Memory.

... Huh?  Are you saying that you ask the kernel to map each individual
shared buffer separately?  I can't believe that's going to scale to
realistic applications.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Robert Haas
On Sat, Apr 16, 2011 at 2:33 PM, Joshua Berkus j...@agliodbs.com wrote:
 Well, given the risks to durability and stability associated with using MMAP, 
 I doubt anyone would even consider it for a 10% throughput improvement.  
 However, I don't think the test you used demonstrates the best case for MMAP 
 as a performance improvement.

Actually, I'd walk through fire for a 10% performance improvement if
it meant only a *risk* to stability.  The problem is that this is
likely unfixably broken.  In particular, I think the first sentence of
Tom's response hit it right on the nose, and mirrors my own thoughts
on the subject.  To have any chance of working, you'd need to track
buffer pins and shared/exclusive content locks for the pages that were
being accessed outside of shared buffers; otherwise someone might be
looking at a stale copy of the page.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Robert Haas
On Sat, Apr 16, 2011 at 9:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Apr 16, 2011 at 2:33 PM, Joshua Berkus j...@agliodbs.com wrote:
 Well, given the risks to durability and stability associated with using 
 MMAP, I doubt anyone would even consider it for a 10% throughput 
 improvement.  However, I don't think the test you used demonstrates the best 
 case for MMAP as a performance improvement.

 Actually, I'd walk through fire for a 10% performance improvement if
 it meant only a *risk* to stability.  The problem is that this is
 likely unfixably broken.  In particular, I think the first sentence of
 Tom's response hit it right on the nose, and mirrors my own thoughts
 on the subject.  To have any chance of working, you'd need to track
 buffer pins and shared/exclusive content locks for the pages that were
 being accessed outside of shared buffers; otherwise someone might be
 looking at a stale copy of the page.

Of course, maybe the patch is doing that.  Rereading the thread, I
grow increasingly confused about what this is actually supposed to do
and how it's supposed to work and why it's supposedly better than what
we do now.  But please, everyone feel free to continue bashing me for
wanting a readable patch with some understandable submission notes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Heikki Linnakangas

On 15.04.2011 13:32, Radosław Smogura wrote:

If I may, I want to share some concept to use mmap in PG. It's far, far
away from perfect, but it's keeps WAL before data. As well I crated
table, with index, inserted few values, and I done vacuum full on this
table. Db inits welcome from orginal sources.

Performance of read (if backend is loaded) is really good, query time
goes down from 450ms to about 410ms. Update may be slower - but work is
in progress (I will start with write, as I went to point when simple
updates may be performed). Even that I didn't covered all aspects off
updating, it's simple to do it, just to call PreopareBufferToUpdate
before modifing buffer, ofc some ideas of increasing this are still in
my head.

Any comments, suggestions welcome.


The patch is quite hard to read because of random whitespace changes and 
other stylistic issues, but I have a couple of high-level questions on 
the design:


* Does each process have its own mmappings, or are the mmapping done to 
shared buffers?


* How do you handle locking? Do you still need to allocate a shared 
buffer for each mmapped page?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Radosław Smogura

On Fri, 15 Apr 2011 14:33:37 +0300, Heikki Linnakangas wrote:

On 15.04.2011 13:32, Radosław Smogura wrote:
If I may, I want to share some concept to use mmap in PG. It's far, 
far

away from perfect, but it's keeps WAL before data. As well I crated
table, with index, inserted few values, and I done vacuum full on 
this

table. Db inits welcome from orginal sources.

Performance of read (if backend is loaded) is really good, query 
time
goes down from 450ms to about 410ms. Update may be slower - but work 
is

in progress (I will start with write, as I went to point when simple
updates may be performed). Even that I didn't covered all aspects 
off

updating, it's simple to do it, just to call PreopareBufferToUpdate
before modifing buffer, ofc some ideas of increasing this are still 
in

my head.

Any comments, suggestions welcome.


The patch is quite hard to read because of random whitespace changes
and other stylistic issues, but I have a couple of high-level
questions on the design:
Yes, but, hmm... in Netbeans I had really long gaps (probably 8 spaces, 
from tabs), so deeper ifs, comments at the and of variables, went of 
out my screen. I really wanted to not format this, but sometimes I 
needed.



* Does each process have its own mmappings, or are the mmapping done
to shared buffers?
Those are MAP_SHARED mappings, but each process has it's own pointer to 
this.



* How do you handle locking?
I do not do locking... I do different thing (worst)... When buffer 
should be updated, it gets shared buffer, content is copied (so 
situation almost same like fread), and depending on situation content is 
used directly or pages between mmaping and shared (updatable) regions 
are swapped - it keeps tuple pointers, etc. I really would be happy if 
such method (lock flushing to file) could exists.



Do you still need to allocate a shared buffer for each mmapped page?
Currently each mmaped page has additional shared buffer, but it's 
almost ready to use independent pool of shared buffers. This will be 
good, as mmaped buffers could cover whole system cache, keeping maybe 
10%-20% of this size for write in SHMEM.


I think about MAP_PRIVATE, but those has some pluses and minuses, e.g. 
MAP_SHARED may be, for less critical systems, simplier equipped with GUC

mmap_direct_write=true.

Regards,
Radek


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Andrew Dunstan



On 04/15/2011 08:12 AM, Radosław Smogura wrote:

The patch is quite hard to read because of random whitespace changes
and other stylistic issues, but I have a couple of high-level
questions on the design:


Yes, but, hmm... in Netbeans I had really long gaps (probably 8 
spaces, from tabs), so deeper ifs, comments at the and of variables, 
went of out my screen. I really wanted to not format this, but 
sometimes I needed.





Netbeans is possibly not very well suited to working on postgres code. 
AFAIK emacs and/or vi(m) are used by almost all the major developers.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Joshua Berkus
Radoslaw,

10% improvement isn't very impressive from a switch to mmap.  What workload did 
you test with?  What I'd really like to see is testing with databases which are 
50%, 90% and 200% the size of RAM ... that's where I'd expect the greatest gain 
from limiting copying. 

 Netbeans is possibly not very well suited to working on postgres code. 
 AFAIK emacs and/or vi(m) are used by almost all the major developers.

Guys, can we *please* focus on the patch for now, rather than the formatting, 
which is fixable with sed?
-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Radosław Smogura
Joshua Berkus j...@agliodbs.com Friday 15 April 2011 18:55:04
 Radoslaw,
 
 10% improvement isn't very impressive from a switch to mmap.  What workload
 did you test with?  What I'd really like to see is testing with databases
 which are 50%, 90% and 200% the size of RAM ... that's where I'd expect
 the greatest gain from limiting copying.
I think 10% is quite good, as my stand-alone test of mmap vs. read shown that 
speed up of copying 100MB data to mem may be from ~20ms to ~100ms (depends on 
destination address). Of course deeper, system test simulating real usage will 
say more. In any case after good deals with writes, I will speed up reads. I 
think to bypass smgr/md much more and to expose shared id's (1,2,3...) for 
each file segment.

Going to topic...

In attachment I sent test-scripts which I used to fill data, nothing complex 
(left from 2nd level caches).

Query I've used to measure was SELECT count(substr(content, 1, 1)) FROM 
testcase1 WHERE multi_id  5;

Timings ware taken from psql.

I didn't made load (I have about 2GB of free sapce at /home, and 4GB RAM) and 
stress (I'm not quite ready to try concurrent updates of same page - may fail, 
notice is and place to fix is in code) tests yet.

  Netbeans is possibly not very well suited to working on postgres code.
  AFAIK emacs and/or vi(m) are used by almost all the major developers.
 
 Guys, can we *please* focus on the patch for now, rather than the
 formatting, which is fixable with sed?
Netbeans is quite good, of course it depends who likes what. Just try 7.0 RC 
2.

Regards,
Radek


test-scritps_20110319_0026.tar.bz2
Description: application/bzip-compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Greg Smith

Joshua Berkus wrote:

Guys, can we *please* focus on the patch for now, rather than the formatting, 
which is fixable with sed?
  


Never, and that's not true.  Heikki was being nice; I wouldn't have even 
slogged through it long enough to ask the questions he did before 
kicking it back as unusable.  A badly formatted patch makes it 
impossible to evaluate whether the changes from a submission are 
reasonable or not without the reviewer fixing it first.  And you can't 
automate correcting it, it takes a lot of tedious manual work.  Start 
doing a patch review every CommitFest cycle and you very quickly realize 
it's not an ignorable problem.  And lack of discipline in minimizing 
one's diff is always a sign of other code quality issues.


Potential contributors to PostgreSQL should know that a badly formatted 
patch faces an automatic rejection, because no reviewer can work with it 
easily.  This fact is not a mystery; in fact it's documented at 
http://wiki.postgresql.org/wiki/Submitting_a_Patch :  The easiest way 
to get your patch rejected is to make lots of unrelated changes, like 
reformatting lines, correcting comments you felt were poorly worded etc. 
Each patch should have the minimum set of changes required to fulfil the 
single stated objective.  I think I'll go improve that text 
next--something like Ways to get your patch rejected should be its own 
section.


The problem here isn't whether someone used an IDE or not, it's that 
this proves they didn't read their own patch before submitting it.  
Reading one's own diff and reflecting on what you've changed is one of 
the extremely underappreciated practices of good open-source software 
development.  Minimizing the size of that diff is perhaps the most 
important thing someone can do in order to make their changes to a piece 
of software better.  Not saying something that leads in that direction 
would be a disservice to the submitter.


P.S. You know what else I feel should earn an automatic rejection 
without any reviewer even looking at the code?  Submitting a patch that 
claims to improve performance and not attaching the test case you used, 
along with detailed notes about before/after tests on your own 
hardware.  A hand wave it's faster is never good enough, and it's 
extremely wasteful of our limited reviewer resources to try and 
duplicate what the submitter claimed.  Going to add something about that 
to the submission guidelines too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers