On Mon 13-01-14 22:36:06, Mel Gorman wrote:
> On Mon, Jan 13, 2014 at 06:27:03PM -0200, Claudio Freire wrote:
> > On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <j...@nasby.net> wrote:
> > > On 1/13/14, 2:19 PM, Claudio Freire wrote:
> > >>
> > >> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <robertmh...@gmail.com>
> > >> wrote:
> > >>>
> > >>> On a related note, there's also the problem of double-buffering. When
> > >>> we read a page into shared_buffers, we leave a copy behind in the OS
> > >>> buffers, and similarly on write-out. It's very unclear what to do
> > >>> about this, since the kernel and PostgreSQL don't have intimate
> > >>> knowledge of what each other are doing, but it would be nice to solve
> > >>> somehow.
> > >>
> > >>
> > >>
> > >> There you have a much harder algorithmic problem.
> > >>
> > >> You can basically control duplication with fadvise and DONTNEED. The
> > >> problem here is not the kernel and whether or not it allows postgres
> > >> to be smart about it. The problem is... what kind of smarts
> > >> (algorithm) to use.
> > >
> > >
> > > Isn't this a fairly simple matter of, when we read a page into shared
> > > buffers, telling the kernel to forget that page? And a corollary to that
> > > for when we dump a page out of shared_buffers ("here, kernel, please put
> > > this back into your cache").
> > That's my point. In terms of kernel-postgres interaction, it's fairly
> > simple.
> > What's not so simple is figuring out what policy to use. Remember,
> > you cannot tell the kernel to put some page in its page cache without
> > reading it or writing it. So, once you make the kernel forget a page,
> > evicting it from shared buffers becomes quite expensive.
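
A minimal sketch of the "read a page, then tell the kernel to forget it"
step discussed above, assuming Linux and Python's os.posix_fadvise. The
function name and the 8192-byte page size are hypothetical choices for
illustration, not anything PostgreSQL actually does:

```python
import os

PAGE_SIZE = 8192  # assumed block size for this sketch


def read_page_and_drop_os_copy(fd, page_no, page_size=PAGE_SIZE):
    """Read one page into a user-space buffer, then hint that the
    kernel may drop its own cached copy of the same range."""
    offset = page_no * page_size
    data = os.pread(fd, page_size, offset)
    # The hint is advisory: the kernel is free to ignore it, and pages
    # that are still dirty may stay cached.
    if hasattr(os, "posix_fadvise"):  # not available on every platform
        os.posix_fadvise(fd, offset, page_size, os.POSIX_FADV_DONTNEED)
    return data
```

The call itself is the easy part; the sketch says nothing about when
dropping the kernel's copy is a good idea, which is the policy question
raised above.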
> posix_fadvise(POSIX_FADV_WILLNEED) is meant to cover this case by
> forcing readahead. If you evict it prematurely then you do get kinda
> screwed because you pay the IO cost to read it back in again even if you
> had enough memory to cache it. Maybe this is the type of kernel-postgres
> interaction that is annoying you.
> If you don't evict, the kernel eventually steps in and evicts the wrong
> thing. If you do evict and it was unnecessary, you pay an IO cost.
> That could be something we look at. There are cases buried deep in the
> VM where pages get shuffled to the end of the LRU and get tagged for
> reclaim as soon as possible. Maybe you need access to something like
> that via posix_fadvise to say "reclaim this page if you need memory but
> leave it resident if there is no memory pressure" or something similar.
> Not exactly sure what that interface would look like or offhand how it
> could be reliably implemented.
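
The POSIX_FADV_WILLNEED hint mentioned above can be sketched as follows,
again assuming Linux and Python's os.posix_fadvise. The function name, its
return strings, and the 8192-byte page size are hypothetical and only
serve the illustration:

```python
import os

PAGE_SIZE = 8192  # assumed block size for this sketch


def hint_kernel_before_eviction(fd, page_no, dirty_page=None,
                                page_size=PAGE_SIZE):
    """Before dropping a page from a user-space cache, try to make
    sure the kernel page cache holds a copy of it."""
    offset = page_no * page_size
    if dirty_page is not None:
        # Writing a dirty page goes through the kernel page cache,
        # so the data ends up cached there as a side effect.
        os.pwrite(fd, dirty_page, offset)
        return "written"
    if hasattr(os, "posix_fadvise"):  # not available on every platform
        # Asks the kernel to start reading the range. If the range is
        # no longer cached, this costs a real read from disk.
        os.posix_fadvise(fd, offset, page_size, os.POSIX_FADV_WILLNEED)
        return "readahead-requested"
    return "no-hint"
```

For a clean page the kernel has already dropped, the hint still triggers
IO, which is the cost described above. It does not express "keep this
only if there is no memory pressure".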
Well, the kernel-managed user space cache that the postgres guys are talking
about looks pretty much like what the "volatile range" patches are trying to
achieve.
Note to postgres guys: I think you should have a look at the proposed
'vrange' system call. The latest posting is here:
http://www.spinics.net/lists/linux-mm/msg67328.html. It contains a rather
detailed description of the feature. If the feature looks good to you,
you can add your 'me too'. Also, if anyone is willing to try it out
with postgres, that would be most welcome (although I understand you might
not want to burn your time on an experimental kernel feature).
Jan Kara <j...@suse.cz>
SUSE Labs, CR
Sent via pgsql-hackers mailing list (email@example.com)