On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote:
> On Jan 13, 2014, at 19:03, Hannu Krosing <ha...@2ndquadrant.com> wrote:
>
> > On 01/13/2014 09:53 PM, Trond Myklebust wrote:
> >> On Jan 13, 2014, at 15:40, Andres Freund <and...@2ndquadrant.com> wrote:
> >>
> >>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
> >>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <kgri...@ymail.com>
> >>>> wrote:
> >>>>> I notice, Josh, that you didn't mention the problems many people
> >>>>> have run into with Transparent Huge Page defrag and with NUMA
> >>>>> access.
> >>>> Amen to that. Actually, I think NUMA can be (mostly?) fixed by
> >>>> setting zone_reclaim_mode; is there some other problem besides that?
> >>> I think that fixes some of the worst instances, but I've seen machines
> >>> spending horrible amounts of CPU (& BUS) time in page reclaim
> >>> nonetheless. If I analyzed it correctly it's in RAM << working set
> >>> workloads where RAM is pretty large and most of it is used as page
> >>> cache. The kernel ends up spending a huge percentage of time finding and
> >>> potentially defragmenting pages when looking for victim buffers.
> >>>
> >>>> On a related note, there's also the problem of double-buffering. When
> >>>> we read a page into shared_buffers, we leave a copy behind in the OS
> >>>> buffers, and similarly on write-out. It's very unclear what to do
> >>>> about this, since the kernel and PostgreSQL don't have intimate
> >>>> knowledge of what each other are doing, but it would be nice to solve
> >>>> somehow.
> >>> I've wondered before if there wouldn't be a chance for postgres to say
> >>> "my dear OS, that the file range 0-8192 of file x contains y, no need to
> >>> reread" and do that when we evict a page from s_b but I never dared to
> >>> actually propose that to kernel people...
> >> O_DIRECT was specifically designed to solve the problem of double
> >> buffering between applications and the kernel. Why are you not able to
> >> use that in these situations?
> > What is asked is the opposite of O_DIRECT - the write from a buffer inside
> > postgresql to linux *buffercache* and telling linux that it is the same
> > as what is currently on disk, so don't bother to write it back ever.
>
> I don’t understand. Are we talking about mmap()ed files here? Why
> would the kernel be trying to write back pages that aren’t dirty?

No ... if I have it right, it's pretty awful: they want to do a read of a
file into a user-provided buffer, thus obtaining a page cache entry and a
copy in their userspace buffer, then insert the page of the user buffer
back into the page cache as the page cache page ... that's right, isn't
it, postgres people?

Effectively you end up with buffered read/write that's also mapped into
the page cache. It's a pretty awful way to hack around mmap.

James
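
A minimal sketch of the buffered-read half of the pattern described above,
assuming Linux and an 8 kB block size. BLCKSZ is a local define here and
read_block_and_drop() is a hypothetical helper, not PostgreSQL code. An
ordinary pread() leaves one copy of the block in the page cache and one in
the caller's buffer, which is the double buffering under discussion. The
sketch then uses posix_fadvise(POSIX_FADV_DONTNEED), an existing advisory
call, to hint that the kernel copy may be dropped. That is the opposite
direction from the "insert my buffer into the page cache" request in the
thread, for which the messages above name no interface.

```c
#define _XOPEN_SOURCE 600       /* exposes pread() and posix_fadvise() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size for this sketch */

/*
 * Hypothetical helper: read block number `blockno` from `fd` into `buf`
 * (which must hold at least BLCKSZ bytes) using a normal buffered read,
 * then advise the kernel that its cached copy of that range is no longer
 * needed. Returns the number of bytes read, or -1 on read error.
 */
ssize_t read_block_and_drop(int fd, off_t blockno, char *buf)
{
    off_t offset = blockno * (off_t) BLCKSZ;

    /* Buffered read: the data now exists in the page cache and in buf. */
    ssize_t n = pread(fd, buf, BLCKSZ, offset);
    if (n < 0)
        return -1;

    /*
     * Advisory only: the kernel is free to ignore the hint, so a failure
     * here is not treated as an error for the read itself.
     */
    (void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_DONTNEED);

    return n;
}
```

A caller would open the file with open(path, O_RDONLY) and pass a
BLCKSZ-sized buffer. Because the hint is advisory, this does not guarantee
a single copy of the data; it only illustrates what is available to
userspace today without O_DIRECT.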