Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-20 Thread Mel Gorman
On Fri, Jan 17, 2014 at 11:01:15AM -0800, Josh Berkus wrote:
 Mel,
 

Hi,

 So we have a few interested parties.  What do we need to do to set up
 the Collab session?
 

This is great and thanks!

There are two summits of interest here -- LSF/MM which will have all the
filesystem, storage and memory management people at it on March 24-25th
and Collaboration Summit which is on March 26-28th. We're interested in
both.

The LSF/MM committee are going through the first round of topic proposals at
the moment and we're aiming to send out the first set of invites soon. We're
hoping to invite two PostgreSQL people to LSF/MM itself for the dedicated
topic and your feedback on other topics and how they may help or hinder
PostgreSQL would be welcomed.

As LSF/MM is a relatively closed forum I'll be looking into having a
follow-up discussion at Collaboration Summit that is open to a wider and
more dedicated group. That hopefully will result in a small number of
concrete proposals that can be turned into patches over time.

-- 
Mel Gorman
SUSE Labs


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-20 Thread Andres Freund
On 2014-01-17 18:34:25 +, Mel Gorman wrote:
  The scheme that'd allow us is the following:
  When postgres reads a data page, it will continue to first look up the
  page in its shared buffers, if it's not there, it will perform a page
  cache backed read, but instruct that read to immediately remove from the
  page cache afterwards (new API or, posix_fadvise() or whatever).
  As long
  as it's in shared_buffers, postgres will not need to issue new reads, so
  there's no benefit keeping it in the page cache.
  If the page is dirtied, it will be written out normally telling the
  kernel to forget about caching the page (using 3, or possibly direct
  io).
  When a page in postgres's buffers (which wouldn't be set to very large
  values) isn't needed anymore and *not* dirty, it will seed the kernel
  page cache with the current data.
  
 
 Ordinarily the initial read page could be discarded with fadvise but
 the later write would cause the data to be read back in again which is a
 waste. The details of avoiding that re-read are tricky from a core kernel
 perspective because ordinarily the kernel at that point does not know if
 the write is a full complete aligned write of an underlying filesystem
 structure or not.  It may need a different write path which potentially
 leads into needing changes to the address_space operations on a filesystem
 basis -- that would get messy and be a Linux-specific extension. I have
 not researched this properly at all, I could be way off but I have a
 feeling the details get messy.

Hm. This surprises me a bit - and I bet it does hurt postgres
noticeably if that's the case since the most frequently modified buffers
will only be written out to the OS once every checkpoint but never be
read-in. So they are likely not to be hot enough to stay cached under
cache-pressure.
So this would be a generally beneficial feature - and I doubt it's only
postgres that'd benefit.

  Now, such a scheme wouldn't likely be zero-copy, but it would avoid
  double buffering.
 
 It wouldn't be zero copy because minimally the data needs to be handed
 over to the filesystem for writing to the disk and the interface for that is
 offset,length based, not page based. Maybe sometimes it will be zero copy
 but it would be a filesystem-specific thing.

Exactly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Mel Gorman
On Fri, Jan 17, 2014 at 06:14:37PM +0100, Andres Freund wrote:
 Hi Mel,
 
 On 2014-01-17 16:31:48 +, Mel Gorman wrote:
  Direct IO, buffered IO, double buffering and wishlists
  --
 3. Hint that a page should be dropped immediately when IO completes.
There is already something like this buried in the kernel internals
and sometimes called immediate reclaim which comes into play when
pages are bgin invalidated. It should just be a case of investigating
if that is visible to userspace, if not why not and do it in a
semi-sensible fashion.
 
 bgin invalidated?
 

s/bgin/being/

I admit that "invalidated" in this context is very vague and I did
not explain myself. This paragraph should remind anyone familiar with
VM internals about what happens when invalidate_mapping_pages calls
deactivate_page and how PageReclaim pages are treated by both page reclaim
and end_page_writeback handler. It's similar but not identical to what
Postgres wants and is a reasonable starting position for an implementation.

 Generally, +1 on the capability to achieve such a behaviour from
 userspace.
 
 7. Allow userspace process to insert data into the kernel page cache
without marking the page dirty. This would allow the application
to request that the OS use the application copy of data as page
cache if it does not have a copy already. The difficulty here
is that the application has no way of knowing if something else
has altered the underlying file in the meantime via something like
direct IO. Granted, such activity has probably corrupted the database
already but initial reactions are that this is not a safe interface
and there are coherency concerns.
 
 I was one of the people suggesting that capability in this thread (after
 pondering it in the back of my mind for quite some time), and I
 first thought it would never be acceptable for pretty much those
 reasons.
 But on second thought I don't think that line of argument makes too much
 sense. If such an API would require write permissions on the file -
 which it surely would - it wouldn't allow an application to do anything
 it previously wasn't able to.
 And I don't see the dangers of concurrent direct IO as anything
 new. Right now the page's contents reside in userspace memory and aren't
 synced in any way with either the page cache or the actual on disk
 state. And afaik there are already several data races if a file is
 modified and read both via the page cache and direct io.
 

All of this is true.  The objections may not hold up over time and it may
seem much more reasonable when/if the easier stuff is addressed.

 The scheme that'd allow us is the following:
 When postgres reads a data page, it will continue to first look up the
 page in its shared buffers, if it's not there, it will perform a page
 cache backed read, but instruct that read to immediately remove from the
 page cache afterwards (new API or, posix_fadvise() or whatever).
 As long
 as it's in shared_buffers, postgres will not need to issue new reads, so
 there's no benefit keeping it in the page cache.
 If the page is dirtied, it will be written out normally telling the
 kernel to forget about caching the page (using 3, or possibly direct
 io).
 When a page in postgres's buffers (which wouldn't be set to very large
 values) isn't needed anymore and *not* dirty, it will seed the kernel
 page cache with the current data.
 

Ordinarily the initial read page could be discarded with fadvise but
the later write would cause the data to be read back in again which is a
waste. The details of avoiding that re-read are tricky from a core kernel
perspective because ordinarily the kernel at that point does not know if
the write is a full complete aligned write of an underlying filesystem
structure or not.  It may need a different write path which potentially
leads into needing changes to the address_space operations on a filesystem
basis -- that would get messy and be a Linux-specific extension. I have
not researched this properly at all, I could be way off but I have a
feeling the details get messy.
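
For readers following along, the read half of the quoted scheme can be approximated today with existing calls. This is only a rough sketch in Python, assuming an 8192-byte block size and using posix_fadvise() with POSIX_FADV_DONTNEED as the "drop it afterwards" hint. The names read_block_and_drop and demo_read are made up for illustration and are not PostgreSQL functions. The hint is advisory, so the kernel is free to keep the page.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size for this sketch


def read_block_and_drop(fd, block_no, block_size=BLOCK_SIZE):
    """Read one block through the page cache, then hint that it can be dropped."""
    offset = block_no * block_size
    data = os.pread(fd, block_size, offset)
    # Advisory only: the kernel may ignore this, and dirty pages are not dropped.
    os.posix_fadvise(fd, offset, block_size, os.POSIX_FADV_DONTNEED)
    return data


def demo_read():
    """Write two blocks to a scratch file and read the second one back."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
        f.flush()
        os.fsync(f.fileno())  # make the pages clean so the hint can apply
        return read_block_and_drop(f.fileno(), 1)
```

The application keeps its own copy of the block after the call, which is the point of the scheme: the data lives in one cache rather than two.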

 Now, such a scheme wouldn't likely be zero-copy, but it would avoid
 double buffering.

It wouldn't be zero copy because minimally the data needs to be handed
over to the filesystem for writing to the disk and the interface for that is
offset,length based, not page based. Maybe sometimes it will be zero copy
but it would be a filesystem-specific thing.
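
To make the offset,length point concrete, here is a small sketch of the write half, again in Python and again with assumed names (write_block_and_drop, demo_write) and an assumed 8192-byte block. pwrite() takes a user buffer plus an offset, so the kernel copies the data into its own page before anything reaches the filesystem. The fsync() per block is there only because POSIX_FADV_DONTNEED does not discard dirty pages; a real database would batch its syncs very differently.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size for this sketch


def write_block_and_drop(fd, block_no, data, block_size=BLOCK_SIZE):
    """Write one full block at its offset, flush it, then hint the cache to drop it."""
    if len(data) != block_size:
        raise ValueError("expected exactly one full block")
    offset = block_no * block_size
    written = os.pwrite(fd, data, offset)  # buffer + offset, not a page handover
    if written != block_size:
        raise OSError("short write")
    os.fsync(fd)  # dirty pages cannot be dropped, so write them back first
    os.posix_fadvise(fd, offset, block_size, os.POSIX_FADV_DONTNEED)
    return offset, written


def demo_write():
    """Write block 2 of an empty scratch file and read it back."""
    with tempfile.NamedTemporaryFile() as f:
        fd = f.fileno()
        page = b"x" * BLOCK_SIZE
        write_block_and_drop(fd, 2, page)
        return os.pread(fd, BLOCK_SIZE, 2 * BLOCK_SIZE), os.fstat(fd).st_size
```

Whether the kernel has to read the old block back in before accepting such a write is exactly the detail discussed above, and nothing in this sketch controls it.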

 I think the cost of buffer copying has been overstated
 in this thread... The major advantage is that all that could easily be
 implemented in a very localized manner, without hurting other OSs and it
 could easily degrade on kernels not providing that capability, which
 would surely be the majority of installations for the next couple of
 cases.
 
 So, I think such an interface would be hugely 

Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Josh Berkus
Mel,

So we have a few interested parties.  What do we need to do to set up
the Collab session?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

