Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-24 Thread Andreas Pflug
On 23.01.14 02:14, Jim Nasby wrote: On 1/19/14, 5:51 PM, Dave Chinner wrote: Postgres is far from being the only application that wants this; many people resort to tmpfs because of this: https://lwn.net/Articles/499410/ Yes, we covered the possibility of using tmpfs much earlier in the

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-23 Thread Gregory Smith
On 1/20/14 9:46 AM, Mel Gorman wrote: They could potentially be used to evaluate any IO scheduler changes. For example -- deadline scheduler with these parameters has X transactions/sec throughput with average latency of Y milliseconds and a maximum fsync latency of Z seconds. Evaluate how
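As an illustration of the kind of figure mentioned above (average latency and maximum fsync latency), here is a minimal sketch that times fsync() calls on a scratch file. It is not the benchmark discussed in the thread; the function names, the block size, and the number of writes are assumptions chosen for the example.

```python
import os
import tempfile
import time


def measure_fsync_latencies(n_writes=20, block_size=8192):
    """Append n_writes blocks to a scratch file, calling fsync() after each.

    Returns the list of per-fsync latencies in seconds. The defaults are
    illustrative assumptions, not values taken from the thread.
    """
    latencies = []
    payload = b"\0" * block_size
    with tempfile.NamedTemporaryFile() as f:
        fd = f.fileno()
        for _ in range(n_writes):
            os.write(fd, payload)          # unbuffered write to the file descriptor
            start = time.perf_counter()
            os.fsync(fd)                   # block until the kernel reports the data flushed
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce raw latencies to an average (milliseconds) and a maximum (seconds)."""
    return {
        "avg_ms": 1000.0 * sum(latencies) / len(latencies),
        "max_s": max(latencies),
    }
```

Results depend heavily on where the temporary directory lives (for example tmpfs versus a real disk), so numbers from such a sketch are only comparable when taken on the same filesystem and device.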

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Robert Haas
On Tue, Jan 21, 2014 at 3:20 PM, Jan Kara j...@suse.cz wrote: But that still doesn't work out very well, because now the guy who does the write() has to wait for it to finish before he can do anything else. That's not always what we want, because WAL gets written out from our internal buffers

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Dave Chinner
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote: On Fri 17-01-14 08:57:25, Robert Haas wrote: On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote: So this says to me that the WAL is a place where DIO should really be reconsidered. It's mostly sequential writes

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Jan Kara
On Wed 22-01-14 09:07:19, Dave Chinner wrote: On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote: If we're forcing the WAL out to disk because of transaction commit or because we need to write the buffer protected by a certain WAL record only after the WAL hits the platter, then

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Jan Kara
On Fri 17-01-14 08:57:25, Robert Haas wrote: On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote: So this says to me that the WAL is a place where DIO should really be reconsidered. It's mostly sequential writes that need to hit the disk ASAP, and you need to know that

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Bruce Momjian
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote: If we're forcing the WAL out to disk because of transaction commit or because we need to write the buffer protected by a certain WAL record only after the WAL hits the platter, then it's fine. But sometimes we're writing WAL just

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Jim Nasby
On 1/17/14, 7:57 AM, Robert Haas wrote: - WAL files are written (and sometimes read) sequentially and fsync'd very frequently and it's always good to write the data out to disk as soon as possible - Temp files are written and read sequentially and never fsync'd. They should only be written to

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Jim Nasby
On 1/17/14, 2:24 PM, Gregory Smith wrote: I am skeptical that the database will take over very much of this work and perform better than the Linux kernel does. My take is that our most useful role would be providing test cases kernel developers can add to a performance regression suite. Ugly

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Jim Nasby
On 1/19/14, 5:51 PM, Dave Chinner wrote: Postgres is far from being the only application that wants this; many people resort to tmpfs because of this: https://lwn.net/Articles/499410/ Yes, we covered the possibility of using tmpfs much earlier in the thread, and came to the conclusion that temp

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-22 Thread Claudio Freire
On Wed, Jan 22, 2014 at 10:08 PM, Jim Nasby j...@nasby.net wrote: Probably more useful is the case of index scans; if we pre-read more data from the index we could hand the kernel a list of base relation blocks that we know we'll need. Actually, I've already tried this. The most important

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Marti Raudsepp
On Mon, Jan 20, 2014 at 1:51 AM, Dave Chinner da...@fromorbit.com wrote: Postgres is far from being the only application that wants this; many people resort to tmpfs because of this: https://lwn.net/Articles/499410/ Yes, we covered the possibility of using tmpfs much earlier in the thread,

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Dave Chinner
On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote: On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote: it's very common to create temporary file data that will never, ever, ever actually NEED to hit disk. Where I work being able to tell the kernel to avoid flushing

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Mel Gorman
On Mon, Jan 20, 2014 at 10:51:41AM +1100, Dave Chinner wrote: On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote: On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote: it's very common to create temporary file data that will never, ever, ever actually NEED to hit

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Mel Gorman
On Fri, Jan 17, 2014 at 03:24:01PM -0500, Gregory Smith wrote: On 1/17/14 10:37 AM, Mel Gorman wrote: There is not an easy way to tell. To be 100%, it would require an instrumentation patch or a systemtap script to detect when a particular page is being written back and track the context.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Jeff Layton
On Mon, 20 Jan 2014 10:51:41 +1100 Dave Chinner da...@fromorbit.com wrote: On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote: On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote: it's very common to create temporary file data that will never, ever, ever actually

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Bruce Momjian
On Wed, Jan 15, 2014 at 11:49:09AM +, Mel Gorman wrote: It may be the case that mmap/madvise is still required to handle a double buffering problem but it's far from being a free lunch and it has costs that read/write does not have to deal with. Maybe some of these problems can be fixed or

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-18 Thread Marti Raudsepp
On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote: it's very common to create temporary file data that will never, ever, ever actually NEED to hit disk. Where I work being able to tell the kernel to avoid flushing those files unless the kernel thinks it's got better things to do

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote: On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner da...@fromorbit.com wrote: On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote: On 1/15/14, 12:00 AM, Claudio Freire wrote: My completely unproven theory is that swapping is

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote: On 1/15/14, 12:00 AM, Claudio Freire wrote: My completely unproven theory is that swapping is overwhelmed by near-misses. Ie: a process touches a page, and before it's actually swapped in, another process touches it too, blocking on

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote: On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote: But there's something here that I'm not getting - you're talking about a data set that you want to keep cache resident that is at least an order of magnitude

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Jeff Layton
On Thu, 16 Jan 2014 20:48:24 -0500 Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote: But there's something here that I'm not getting - you're talking about a data set that you want to keep cache resident that is at least an

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote: So this says to me that the WAL is a place where DIO should really be reconsidered. It's mostly sequential writes that need to hit the disk ASAP, and you need to know that they have hit the disk before you can proceed with

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Mel Gorman
On Thu, Jan 16, 2014 at 04:30:59PM -0800, Jeff Janes wrote: On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman mgor...@suse.de wrote: On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote: That could be something we look at. There are cases buried deep in the VM where pages get

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Hannu Krosing
On 01/17/2014 06:40 AM, Dave Chinner wrote: On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote: On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote: But there's something here that I'm not getting - you're talking about a data set that you want to keep cache

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Gregory Smith
On 1/17/14 10:37 AM, Mel Gorman wrote: There is not an easy way to tell. To be 100%, it would require an instrumentation patch or a systemtap script to detect when a particular page is being written back and track the context. There are approximations though. Monitor nr_dirty pages over time.
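The approximation mentioned above (monitoring nr_dirty over time) can be sketched as follows. This assumes the counter is read from /proc/vmstat on Linux, where each line is a name followed by a value; the function names and the sampling parameters are hypothetical and chosen only for illustration.

```python
import time


def parse_nr_dirty(vmstat_text):
    """Extract the nr_dirty page count from text in /proc/vmstat format."""
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "nr_dirty":
            return int(fields[1])
    raise KeyError("nr_dirty not found")


def sample_nr_dirty(samples=5, interval_s=1.0, path="/proc/vmstat"):
    """Read nr_dirty repeatedly; returns a list of (timestamp, pages) tuples.

    The path is an assumption about a Linux system; the sample count and
    interval are arbitrary illustrative defaults.
    """
    readings = []
    for i in range(samples):
        with open(path) as f:
            readings.append((time.time(), parse_nr_dirty(f.read())))
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings
```

The value is a count of pages, not bytes, and it shows only how much dirty data exists at each sample, not which context wrote it back.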

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread knizhnik
I wonder if the kernel could sometimes provide a weaker version of fsync() that does not force all pending data to be written immediately but just serves as a write barrier, guaranteeing that all write operations preceding fsync() will be completed before any of the subsequent operations. It will allow

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeremy Harris
On 14/01/14 22:23, Dave Chinner wrote: On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote: To quantify that, in a production setting we were seeing pauses of up to two minutes with shared_buffers set to 8GB and default dirty

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jan Kara
On Wed 15-01-14 21:37:16, Robert Haas wrote: On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote: On Wed 15-01-14 10:12:38, Robert Haas wrote: On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility like atomic write (at least

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jan Kara
On Wed 15-01-14 10:12:38, Robert Haas wrote: On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility like atomic write (at least up to a certain size say in MB range) but it's not so easy and when there are no strong usecases fs people are

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Dave Chinner
On Wed, Jan 15, 2014 at 07:31:15PM -0500, Tom Lane wrote: Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote: No, we'd be happy to re-request it during each checkpoint cycle, as long as that wasn't an unduly expensive call to make. I'm not

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Layton
On Wed, 15 Jan 2014 21:37:16 -0500 Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote: On Wed 15-01-14 10:12:38, Robert Haas wrote: On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Theodore Ts'o
On Wed, Jan 15, 2014 at 10:35:44AM +0100, Jan Kara wrote: Filesystems could in theory provide facility like atomic write (at least up to a certain size say in MB range) but it's not so easy and when there are no strong usecases fs people are reluctant to make their code more complex

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Janes
On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner da...@fromorbit.com wrote: On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote: On 1/15/14, 12:00 AM, Claudio Freire wrote: My completely unproven theory is that swapping is overwhelmed by near-misses. Ie: a process touches a page, and

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Janes
On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman mgor...@suse.de wrote: On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote: That could be something we look at. There are cases buried deep in the VM where pages get shuffled to the end of the LRU and get tagged for reclaim as soon

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Robert Haas
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote: But there's something here that I'm not getting - you're talking about a data set that you want to keep cache resident that is at least an order of magnitude larger than the cyclic 5-15 minute WAL dataset that ongoing

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Heikki Linnakangas
On 01/15/2014 06:01 AM, Jim Nasby wrote: For the sake of completeness... it's theoretically silly that Postgres is doing all this stuff with WAL when the filesystem is doing something very similar with its journal. And an SSD drive (and next generation spinning rust) is doing the same thing

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Mel Gorman
On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote: What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Hannu Krosing
On 01/14/2014 06:12 PM, Robert Haas wrote: This would be pretty similar to copy-on-write, except without the copying. It would just be forget-from-the-buffer-pool-on-write. +1 A version of this could probably already be implemented using MADV_DONTNEED and MADV_WILLNEED. That is, just after

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Hannu Krosing
On 01/15/2014 12:16 PM, Hannu Krosing wrote: On 01/14/2014 06:12 PM, Robert Haas wrote: This would be pretty similar to copy-on-write, except without the copying. It would just be forget-from-the-buffer-pool-on-write. +1 A version of this could probably already be implemented using

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Mel Gorman
On Mon, Jan 13, 2014 at 02:19:56PM -0800, James Bottomley wrote: On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote: On 2014-01-13 12:34:35 -0800, James Bottomley wrote: On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: Well, if we were to collaborate with the kernel community on

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Tue, Jan 14, 2014 at 09:54:20PM -0600, Jim Nasby wrote: On 1/14/14, 3:41 PM, Dave Chinner wrote: On Tue, Jan 14, 2014 at 09:40:48AM -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman mgor...@suse.de wrote: Whether the problem is with the system call or the programmer is

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Jan Kara
On Wed 15-01-14 10:27:26, Heikki Linnakangas wrote: On 01/15/2014 06:01 AM, Jim Nasby wrote: For the sake of completeness... it's theoretically silly that Postgres is doing all this stuff with WAL when the filesystem is doing something very similar with its journal. And an SSD drive (and next

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Jan Kara
On Wed 15-01-14 12:16:50, Hannu Krosing wrote: On 01/14/2014 06:12 PM, Robert Haas wrote: This would be pretty similar to copy-on-write, except without the copying. It would just be forget-from-the-buffer-pool-on-write. +1 A version of this could probably already be implemented using

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Hannu Krosing
On 01/15/2014 02:01 PM, Jan Kara wrote: On Wed 15-01-14 12:16:50, Hannu Krosing wrote: On 01/14/2014 06:12 PM, Robert Haas wrote: This would be pretty similar to copy-on-write, except without the copying. It would just be forget-from-the-buffer-pool-on-write. +1 A version of this could

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Robert Haas
On Tue, Jan 14, 2014 at 4:23 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: Yes, that's what I was thinking: it's a cache. About how many files comprise this cache? Are you thinking it's too difficult for every process to map the files? No, I'm thinking that would throw

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Robert Haas
On Tue, Jan 14, 2014 at 5:23 PM, Dave Chinner da...@fromorbit.com wrote: By default, background writeback doesn't start until 10% of memory is dirtied, and on your machine that's 25GB of RAM. That's way too high for your workload. It appears to me that we are seeing large memory machines much
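The arithmetic behind the quoted figure can be checked with a short sketch. The 10% default corresponds to the vm.dirty_background_ratio sysctl; the 250 GB machine size is an inference from "10% ... that's 25GB", not a number stated in this excerpt, and the function name is hypothetical. The kernel applies the ratio to dirtyable memory rather than to total RAM, so this is only an approximation.

```python
GB = 10**9  # decimal gigabytes, an assumption made for round numbers


def background_writeback_threshold(total_bytes, ratio_percent=10):
    """Approximate number of dirty bytes at which background writeback starts.

    ratio_percent mirrors vm.dirty_background_ratio (default 10). Using total
    memory instead of the kernel's dirtyable-memory figure is a simplification.
    """
    return total_bytes * ratio_percent // 100


# Assumed machine size: 250 GB, inferred from the quoted 25 GB threshold.
threshold = background_writeback_threshold(250 * GB)
print(threshold // GB, "GB of dirty data before background writeback starts")
```

Lowering the ratio, or setting an absolute limit through vm.dirty_background_bytes, makes background writeback start earlier on large-memory machines.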

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Heikki Linnakangas
On 01/15/2014 07:50 AM, Dave Chinner wrote: However, the first problem is dealing with the IO storm problem on fsync. Then we can measure the effect of spreading those writes out in time and determine what triggers read starvations (if they are apparent). Then we can look at whether IO scheduling

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/15/2014 07:50 AM, Dave Chinner wrote: FWIW [and I know you're probably sick of hearing this by now], but the blk-io throttling works almost perfectly with applications that use direct IO. For checkpoint writes, direct I/O actually

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Robert Haas
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility like atomic write (at least up to a certain size say in MB range) but it's not so easy and when there are no strong usecases fs people are reluctant to make their code more complex

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote: Yes, that's basically zero-copy reads. It could be done. The kernel can remap the page to the physical page holding the shared buffer and mark it read-only, then expire the buffer and transfer ownership of the page if any page fault happens.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Claudio Freire
On Wed, Jan 15, 2014 at 1:35 PM, Stephen Frost sfr...@snowman.net wrote: And there's a nice bingo. Had forgotten about KSM. KSM could help lots. I could try to see if madvising shared_buffers as mergeable helps. But this should be an automatic case of KSM - ie, when reading into a

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote: But, still, the implementation is very similar to what postgres needs: sharing a physical page for two distinct logical pages, efficiently, with efficient copy-on-write. Agreed, except that KSM seems like it'd be slow/lazy about it and I'm

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Claudio Freire
On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: But, still, the implementation is very similar to what postgres needs: sharing a physical page for two distinct logical pages, efficiently, with efficient copy-on-write.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Jan Kara
On Wed 15-01-14 14:38:44, Hannu Krosing wrote: On 01/15/2014 02:01 PM, Jan Kara wrote: On Wed 15-01-14 12:16:50, Hannu Krosing wrote: On 01/14/2014 06:12 PM, Robert Haas wrote: This would be pretty similar to copy-on-write, except without the copying. It would just be

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Jeff Janes
On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/15/2014 07:50 AM, Dave Chinner wrote: FWIW [and I know you're probably sick of hearing this by now], but the blk-io throttling works almost perfectly with

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Tom Lane
Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote: What we'd really like for checkpointing is to hand the kernel a boatload (several GB) of dirty pages and say how about you push all this to disk over the next few minutes, in whatever way seems

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Tom Lane
Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote: And most importantly, Also, please don't freeze up everything else in the process If you hand writeback off to the kernel, then writeback for memory reclaim needs to take precedence over

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Jim Nasby
On 1/15/14, 12:00 AM, Claudio Freire wrote: My completely unproven theory is that swapping is overwhelmed by near-misses. Ie: a process touches a page, and before it's actually swapped in, another process touches it too, blocking on the other process' read. But the second process doesn't account

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Robert Haas
On Wed, Jan 15, 2014 at 7:22 PM, Dave Chinner da...@fromorbit.com wrote: No, I meant the opposite - in low memory situations, the system is going to go to hell in a handbasket because we are going to cause a writeback IO storm cleaning memory regardless of these IO priorities. i.e. there is no

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Tom Lane
Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote: No, we'd be happy to re-request it during each checkpoint cycle, as long as that wasn't an unduly expensive call to make. I'm not quite sure where such requests ought to live though. One idea

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I don't see that as a problem. What we're struggling with today is that, until we fsync(), the system is too lazy about writing back dirty pages. And then when we fsync(), it becomes very aggressive and system-wide throughput goes into the tank.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/15/2014 07:50 AM, Dave Chinner wrote: FWIW [and I know you're probably sick of hearing this by now], but the blk-io throttling works almost perfectly with applications that

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote: On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/15/2014 07:50 AM, Dave Chinner wrote: FWIW [and I know you're probably sick of hearing this by now],

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Robert Haas wrote: On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility like atomic write (at least up to a certain size say in MB range) but it's not so easy and when there are no strong

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Wed, Jan 15, 2014 at 07:13:27PM -0500, Tom Lane wrote: Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote: And most importantly, Also, please don't freeze up everything else in the process If you hand writeback off to the kernel, then

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Dave Chinner
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote: Dave Chinner da...@fromorbit.com writes: On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote: What we'd really like for checkpointing is to hand the kernel a boatload (several GB) of dirty pages and say how about you push all

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-15 Thread Robert Haas
On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote: On Wed 15-01-14 10:12:38, Robert Haas wrote: On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote: Filesystems could in theory provide facility like atomic write (at least up to a certain size say in MB range) but it's not

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Hannu Krosing
On 01/14/2014 03:44 AM, Dave Chinner wrote: On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote: On 2014-01-13 17:13:51 -0800, James Bottomley wrote: a file into a user provided buffer, thus obtaining a page cache entry and a copy in their userspace buffer, then insert the page of

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote: Again, as said above the linux file system is doing fine. What we want is a few ways to interact with it to let it do even better when working with postgresql by telling it some stuff it otherwise would have to second

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Heikki Linnakangas
On 01/14/2014 12:26 AM, Mel Gorman wrote: On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote: The other thing that comes to mind is the kernel's caching behavior. We've talked a lot over the years about the difficulties of getting the kernel to write data out when we want it to and to

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Mel Gorman
On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote: On 01/13/2014 02:26 PM, Mel Gorman wrote: Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned to fit within NUMA nodes. On older kernels enabling it would sometimes cause massive stalls.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Hannu Krosing
On 01/14/2014 09:39 AM, Claudio Freire wrote: On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote: Again, as said above the linux file system is doing fine. What we want is a few ways to interact with it to let it do even better when working with postgresql by telling it

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman mgor...@suse.de wrote: Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Kevin Grittner
First off, I want to give a +1 on everything in the recent posts from Heikki and Hannu. Jan Kara j...@suse.cz wrote: Now the aging of pages marked as volatile as it is currently implemented needn't be perfect for your needs but you still have time to influence what gets implemented...

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 11:39 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On 01/14/2014 09:39 AM, Claudio Freire wrote: On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote: Again, as said above the linux file system is doing fine. What we want is a few ways to

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 3:39 AM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote: Again, as said above the linux file system is doing fine. What we want is a few ways to interact with it to let it do even better when

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 5:00 AM, Jan Kara j...@suse.cz wrote: I thought that instead of injecting pages into pagecache for aging as you describe in 3), you would mark pages as volatile (i.e. for reclaim by kernel) through vrange() syscall. Next time you need the page, you check whether the

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Tom Lane
James Bottomley james.bottom...@hansenpartnership.com writes: The current mechanism for coherency between a userspace cache and the in-kernel page cache is mmap ... that's the only way you get the same page in both currently. Right. glibc used to have an implementation of read/write in terms

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 12:42 PM, Trond Myklebust tron...@gmail.com wrote: James Bottomley james.bottom...@hansenpartnership.com writes: The current mechanism for coherency between a userspace cache and the in-kernel page cache is mmap ... that's the only way you get the same page in both

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Tom Lane
Trond Myklebust tron...@gmail.com writes: On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote: Don't be aggressive isn't good enough. The prohibition on early write has to be absolute, because writing a dirty page before we've done whatever else we need to do results in a corrupt

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Jan Kara
On Tue 14-01-14 09:08:40, Hannu Krosing wrote: Effectively you end up with buffered read/write that's also mapped into the page cache. It's a pretty awful way to hack around mmap. Well, the problem is that you can't really use mmap() for the things we do. Postgres' durability works by

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Jan Kara
On Tue 14-01-14 11:11:28, Heikki Linnakangas wrote: On 01/14/2014 12:26 AM, Mel Gorman wrote: On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote: The other thing that comes to mind is the kernel's caching behavior. We've talked a lot over the years about the difficulties of getting

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Dave Chinner
On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote: On 01/13/2014 02:26 PM, Mel Gorman wrote: Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned to fit within NUMA nodes. On older kernels enabling it would sometimes cause massive stalls.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Dave Chinner
On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote: On 2014-01-13 17:13:51 -0800, James Bottomley wrote: a file into a user provided buffer, thus obtaining a page cache entry and a copy in their userspace buffer, then insert the page of the user buffer back into the page cache as

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote: On Jan 13, 2014, at 19:03, Hannu Krosing ha...@2ndquadrant.com wrote: On 01/13/2014 09:53 PM, Trond Myklebust wrote: On Jan 13, 2014, at 15:40, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-13 15:15:16 -0500, Robert

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread James Bottomley
On Tue, 2014-01-14 at 15:39 +0100, Hannu Krosing wrote: On 01/14/2014 09:39 AM, Claudio Freire wrote: On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote: Again, as said above the linux file system is doing fine. What we want is a few ways to interact with it to

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Trond Myklebust
On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote: James Bottomley james.bottom...@hansenpartnership.com writes: The current mechanism for coherency between a userspace cache and the in-kernel page cache is mmap ... that's the only way you get the same page in both currently.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley james.bottom...@hansenpartnership.com wrote: No, I'm sorry, that's never going to be possible. No user space application has all the facts. If we give you an interface to force unconditional holding of dirty pages in core you'll livelock the

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley james.bottom...@hansenpartnership.com wrote: No, I'm sorry, that's never going to be possible. No user space application has all the facts. If we give you an interface

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Heikki Linnakangas
On 01/14/2014 06:08 PM, Tom Lane wrote: Trond Myklebust tron...@gmail.com writes: On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote: "Don't be aggressive" isn't good enough. The prohibition on early write has to be absolute, because writing a dirty page before we've done whatever

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 11:57 AM, James Bottomley james.bottom...@hansenpartnership.com wrote: On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote: On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley james.bottom...@hansenpartnership.com wrote: No, I'm sorry, that's never going to be possible.

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 12:12 PM, Robert Haas robertmh...@gmail.com wrote: In terms of avoiding double-buffering, here's my thought after reading what's been written so far. Suppose we read a page into our buffer pool. Until the page is clean, it would be ideal for the mapping to Correction:

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Claudio Freire
On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote: In terms of avoiding double-buffering, here's my thought after reading what's been written so far. Suppose we read a page into our buffer pool. Until the page is clean, it would be ideal for the mapping to be shared

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 12:15 PM, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote: In terms of avoiding double-buffering, here's my thought after reading what's been written so far. Suppose we read a page into our buffer

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Kevin Grittner
Claudio Freire klaussfre...@gmail.com wrote: Robert Haas robertmh...@gmail.com wrote: James Bottomley james.bottom...@hansenpartnership.com wrote: I don't understand why this has to be absolute: if you advise us to hold the pages dirty and we do up until it becomes a choice to hold on to the

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Hannu Krosing
On 01/14/2014 05:44 PM, James Bottomley wrote: On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote: James Bottomley james.bottom...@hansenpartnership.com writes: The current mechanism for coherency between a userspace cache and the in-kernel page cache is mmap ... that's the only way you get the

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Kevin Grittner
James Bottomley james.bottom...@hansenpartnership.com wrote: you mean the order of write out, if we have to do it, is important. In the rest of the kernel, we do this with barriers which causes ordered grouping of I/O chunks. If we could force a similar ordering in the writeout code, is

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Jeff Janes
On Mon, Jan 13, 2014 at 2:36 PM, Mel Gorman mgor...@suse.de wrote: On Mon, Jan 13, 2014 at 06:27:03PM -0200, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-14 Thread Robert Haas
On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote: On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote: In terms of avoiding double-buffering, here's my thought after
