On 23.01.14 02:14, Jim Nasby wrote:
On 1/19/14, 5:51 PM, Dave Chinner wrote:
Postgres is far from being the only application that wants this; many
people resort to tmpfs because of this:
https://lwn.net/Articles/499410/
Yes, we covered the possibility of using tmpfs much earlier in the
On 1/20/14 9:46 AM, Mel Gorman wrote:
They could potentially be used to evaluate any IO scheduler changes.
For example -- deadline scheduler with these parameters has X
transactions/sec throughput with average latency of Y milliseconds
and a maximum fsync latency of Z seconds. Evaluate how
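A minimal sketch, not from the thread, of the kind of fsync-latency probe this implies; the file name, block size and iteration count are arbitrary:

    /* Write a block, fsync it, and track average and worst-case fsync
     * latency (the Z-seconds figure mentioned above). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[8192];
        struct timespec t0, t1;
        double dt, max = 0, total = 0;
        int i, fd = open("probe.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);

        if (fd < 0) { perror("open"); return 1; }
        memset(buf, 'x', sizeof(buf));
        for (i = 0; i < 1000; i++) {
            if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                perror("write"); return 1;
            }
            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (fsync(fd) != 0) { perror("fsync"); return 1; }
            clock_gettime(CLOCK_MONOTONIC, &t1);
            dt = (t1.tv_sec - t0.tv_sec) * 1e3 +
                 (t1.tv_nsec - t0.tv_nsec) / 1e6;
            total += dt;
            if (dt > max) max = dt;
        }
        printf("avg fsync %.3f ms, max %.3f ms\n", total / 1000, max);
        close(fd);
        return 0;
    }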
On Tue, Jan 21, 2014 at 3:20 PM, Jan Kara j...@suse.cz wrote:
But that still doesn't work out very well, because now the guy who
does the write() has to wait for it to finish before he can do
anything else. That's not always what we want, because WAL gets
written out from our internal buffers
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
On Fri 17-01-14 08:57:25, Robert Haas wrote:
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote:
So this says to me that the WAL is a place where DIO should really be
reconsidered. It's mostly sequential writes
On Wed 22-01-14 09:07:19, Dave Chinner wrote:
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
If we're forcing the WAL out to disk because of transaction commit or
because we need to write the buffer protected by a certain WAL record
only after the WAL hits the platter, then
On Fri 17-01-14 08:57:25, Robert Haas wrote:
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote:
So this says to me that the WAL is a place where DIO should really be
reconsidered. It's mostly sequential writes that need to hit the disk
ASAP, and you need to know that
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
If we're forcing the WAL out to disk because of transaction commit or
because we need to write the buffer protected by a certain WAL record
only after the WAL hits the platter, then it's fine. But sometimes
we're writing WAL just
On 1/17/14, 7:57 AM, Robert Haas wrote:
- WAL files are written (and sometimes read) sequentially and fsync'd
very frequently and it's always good to write the data out to disk as
soon as possible
- Temp files are written and read sequentially and never fsync'd.
They should only be written to
On 1/17/14, 2:24 PM, Gregory Smith wrote:
I am skeptical that the database will take over very much of this work and perform better
than the Linux kernel does. My take is that our most useful role would be providing test
cases kernel developers can add to a performance regression suite. Ugly
On 1/19/14, 5:51 PM, Dave Chinner wrote:
Postgres is far from being the only application that wants this; many
people resort to tmpfs because of this:
https://lwn.net/Articles/499410/
Yes, we covered the possibility of using tmpfs much earlier in the
thread, and came to the conclusion that temp
On Wed, Jan 22, 2014 at 10:08 PM, Jim Nasby j...@nasby.net wrote:
Probably more useful is the case of index scans; if we pre-read more data
from the index we could hand the kernel a list of base relation blocks that
we know we'll need.
Actually, I've already tried this. The most important
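For reference, the mechanism being described is posix_fadvise(POSIX_FADV_WILLNEED), the same call PostgreSQL's effective_io_concurrency prefetching uses; a sketch with made-up block numbers and an illustrative relation file name:

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>

    #define BLCKSZ 8192

    int main(void)
    {
        /* hypothetical heap blocks an index scan says we will need */
        unsigned int blocks[] = { 17, 942, 943, 12001 };
        int i, fd = open("base/16384/16385", O_RDONLY);

        if (fd < 0) { perror("open"); return 1; }
        for (i = 0; i < 4; i++)
            posix_fadvise(fd, (off_t)blocks[i] * BLCKSZ, BLCKSZ,
                          POSIX_FADV_WILLNEED);
        /* later reads of these blocks should then hit the page cache */
        close(fd);
        return 0;
    }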
On Mon, Jan 20, 2014 at 1:51 AM, Dave Chinner da...@fromorbit.com wrote:
Postgres is far from being the only application that wants this; many
people resort to tmpfs because of this:
https://lwn.net/Articles/499410/
Yes, we covered the possibility of using tmpfs much earlier in the
thread,
On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
it's very common to create temporary file data that will never, ever, ever
actually NEED to hit disk. Where I work, being able to tell the kernel to
avoid flushing
On Mon, Jan 20, 2014 at 10:51:41AM +1100, Dave Chinner wrote:
On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
it's very common to create temporary file data that will never, ever, ever
actually NEED to hit
On Fri, Jan 17, 2014 at 03:24:01PM -0500, Gregory Smith wrote:
On 1/17/14 10:37 AM, Mel Gorman wrote:
There is not an easy way to tell. To be 100% certain, it would require an
instrumentation patch or a systemtap script to detect when a
particular page is being written back and track the context.
On Mon, 20 Jan 2014 10:51:41 +1100
Dave Chinner da...@fromorbit.com wrote:
On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
it's very common to create temporary file data that will never, ever, ever
actually
On Wed, Jan 15, 2014 at 11:49:09AM +, Mel Gorman wrote:
It may be the case that mmap/madvise is still required to handle a double
buffering problem but it's far from being a free lunch and it has costs
that read/write does not have to deal with. Maybe some of these problems
can be fixed or
On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
it's very common to create temporary file data that will never, ever, ever
actually NEED to hit disk. Where I work, being able to tell the kernel to
avoid flushing those files unless the kernel thinks it's got better things
to do
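There is no kernel hint today that says "never bother writing this back"; the closest tools are tmpfs, as discussed elsewhere in the thread, or an unnamed O_TMPFILE (Linux 3.11+) whose data at least vanishes on close. A sketch, with /tmp standing in for a temp tablespace:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* unnamed file: no directory entry, gone when the fd closes */
        int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, "scratch data", 12) != 12) perror("write");
        /* ... use as sort/spill space ...; the kernel may still write
         * the dirty pages back under pressure, which is the complaint */
        close(fd);
        return 0;
    }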
On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner da...@fromorbit.com wrote:
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
On 1/15/14, 12:00 AM, Claudio Freire wrote:
My completely unproven theory is that swapping is
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
On 1/15/14, 12:00 AM, Claudio Freire wrote:
My completely unproven theory is that swapping is overwhelmed by
near-misses. Ie: a process touches a page, and before it's
actually swapped in, another process touches it too, blocking on
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache resident that is at
least an order of magnitude
On Thu, 16 Jan 2014 20:48:24 -0500
Robert Haas robertmh...@gmail.com wrote:
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache resident that is at
least an
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote:
So this says to me that the WAL is a place where DIO should really be
reconsidered. It's mostly sequential writes that need to hit the disk
ASAP, and you need to know that they have hit the disk before you can
proceed with
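A sketch of what DIO for the WAL would look like at the syscall level; the segment name is illustrative, and direct I/O requires buffer, offset and length alignment:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        /* O_DSYNC makes each write durable before it returns, which is
         * the "know it has hit the disk" property described above */
        int fd = open("pg_xlog/000000010000000000000001",
                      O_WRONLY | O_DIRECT | O_DSYNC);

        if (fd < 0) { perror("open"); return 1; }
        if (posix_memalign(&buf, 4096, 8192)) return 1;
        memset(buf, 0, 8192);
        if (write(fd, buf, 8192) != 8192) perror("write");
        close(fd);
        free(buf);
        return 0;
    }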
On Thu, Jan 16, 2014 at 04:30:59PM -0800, Jeff Janes wrote:
On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman mgor...@suse.de wrote:
On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
That could be something we look at. There are cases buried deep in the
VM where pages get
On 01/17/2014 06:40 AM, Dave Chinner wrote:
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache
On 1/17/14 10:37 AM, Mel Gorman wrote:
There is not an easy way to tell. To be 100% certain, it would require an
instrumentation patch or a systemtap script to detect when a
particular page is being written back and track the context. There are
approximations though. Monitor nr_dirty pages over time.
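Sampling nr_dirty is a small loop over /proc/vmstat; a sketch that reports it once a second:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char line[128];
        for (;;) {
            FILE *f = fopen("/proc/vmstat", "r");
            if (!f) { perror("fopen"); return 1; }
            while (fgets(line, sizeof(line), f))
                if (strncmp(line, "nr_dirty ", 9) == 0)
                    fputs(line, stdout);   /* pages currently dirty */
            fclose(f);
            sleep(1);
        }
    }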
I wonder if the kernel could sometimes provide a weaker version of fsync()
which does not force all pending data to be written immediately but just
serves as a write barrier, guaranteeing
that all write operations preceding the fsync() will be completed before
any subsequent operations.
It would allow
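No such barrier primitive exists; the closest existing building block is sync_file_range(), which can start writeback without waiting but gives no ordering or integrity guarantee. A sketch, with an illustrative file name:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("datafile", O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }
        /* queue every dirty page of the file for writeback and return
         * immediately; note this is not the barrier semantics asked
         * for above, since nothing orders later writes behind it */
        if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range");
        close(fd);
        return 0;
    }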
On 14/01/14 22:23, Dave Chinner wrote:
On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:
To quantify that, in a production setting we were seeing pauses of
up to two minutes with shared_buffers set to 8GB and default dirty
On Wed 15-01-14 21:37:16, Robert Haas wrote:
On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote:
On Wed 15-01-14 10:12:38, Robert Haas wrote:
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility like atomic write (at least
On Wed 15-01-14 10:12:38, Robert Haas wrote:
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility like atomic write (at least up
to a certain size, say in the MB range) but it's not so easy and when there are
no strong use cases fs people are
On Wed, Jan 15, 2014 at 07:31:15PM -0500, Tom Lane wrote:
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
No, we'd be happy to re-request it during each checkpoint cycle, as
long as that wasn't an unduly expensive call to make. I'm not
On Wed, 15 Jan 2014 21:37:16 -0500
Robert Haas robertmh...@gmail.com wrote:
On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote:
On Wed 15-01-14 10:12:38, Robert Haas wrote:
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility
On Wed, Jan 15, 2014 at 10:35:44AM +0100, Jan Kara wrote:
Filesystems could in theory provide a facility like atomic write (at least up
to a certain size, say in the MB range) but it's not so easy and when there are
no strong use cases fs people are reluctant to make their code more complex
On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner da...@fromorbit.com wrote:
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
On 1/15/14, 12:00 AM, Claudio Freire wrote:
My completely unproven theory is that swapping is overwhelmed by
near-misses. Ie: a process touches a page, and
On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman mgor...@suse.de wrote:
On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
That could be something we look at. There are cases buried deep in the
VM where pages get shuffled to the end of the LRU and get tagged for
reclaim as soon
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache resident that is at
least an order of magnitude larger than the cyclic 5-15 minute WAL
dataset that ongoing
On 01/15/2014 06:01 AM, Jim Nasby wrote:
For the sake of completeness... it's theoretically silly that Postgres
is doing all this stuff with WAL when the filesystem is doing something
very similar with its journal. And an SSD drive (and next generation
spinning rust) is doing the same thing
On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
What's not so simple is figuring out what policy to use. Remember,
you cannot tell the kernel to put some page in its page cache without
reading it or writing it. So, once you make the kernel forget a page,
evicting it from
On 01/14/2014 06:12 PM, Robert Haas wrote:
This would be pretty similar to copy-on-write, except
without the copying. It would just be
forget-from-the-buffer-pool-on-write.
+1
A version of this could probably already be implemented using MADV_DONTNEED
and MADV_WILLNEED.
That is, just after
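The calls in question, sketched against an anonymous mapping standing in for a buffer; both are hints, and MADV_DONTNEED discards the contents of private anonymous pages:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 8192;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        madvise(p, len, MADV_WILLNEED);   /* hint: fault this in soon */
        /* ... use the buffer ... */
        madvise(p, len, MADV_DONTNEED);   /* hint: drop it; the next
                                             touch reads back zeroes */
        munmap(p, len);
        return 0;
    }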
On 01/15/2014 12:16 PM, Hannu Krosing wrote:
On 01/14/2014 06:12 PM, Robert Haas wrote:
This would be pretty similar to copy-on-write, except
without the copying. It would just be
forget-from-the-buffer-pool-on-write.
+1
A version of this could probably already be implemented using
On Mon, Jan 13, 2014 at 02:19:56PM -0800, James Bottomley wrote:
On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote:
On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
Well, if we were to collaborate with the kernel community on
On Tue, Jan 14, 2014 at 09:54:20PM -0600, Jim Nasby wrote:
On 1/14/14, 3:41 PM, Dave Chinner wrote:
On Tue, Jan 14, 2014 at 09:40:48AM -0500, Robert Haas wrote:
On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman mgor...@suse.de
wrote: Whether the problem is with the system call or the
programmer is
On Wed 15-01-14 10:27:26, Heikki Linnakangas wrote:
On 01/15/2014 06:01 AM, Jim Nasby wrote:
For the sake of completeness... it's theoretically silly that Postgres
is doing all this stuff with WAL when the filesystem is doing something
very similar with its journal. And an SSD drive (and next
On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
On 01/14/2014 06:12 PM, Robert Haas wrote:
This would be pretty similar to copy-on-write, except
without the copying. It would just be
forget-from-the-buffer-pool-on-write.
+1
A version of this could probably already be implemented using
On 01/15/2014 02:01 PM, Jan Kara wrote:
On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
On 01/14/2014 06:12 PM, Robert Haas wrote:
This would be pretty similar to copy-on-write, except
without the copying. It would just be
forget-from-the-buffer-pool-on-write.
+1
A version of this could
On Tue, Jan 14, 2014 at 4:23 PM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
Yes, that's what I was thinking: it's a cache. About how many files
comprise this cache? Are you thinking it's too difficult for every
process to map the files?
No, I'm thinking that would throw
On Tue, Jan 14, 2014 at 5:23 PM, Dave Chinner da...@fromorbit.com wrote:
By default, background writeback doesn't start until 10% of memory
is dirtied, and on your machine that's 25GB of RAM. That's way too
high for your workload.
It appears to me that we are seeing large memory machines much
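On such machines the usual workaround is to set the byte-denominated knobs rather than the 10% ratio; a sketch that caps background writeback at 256MB of dirty data (the value is arbitrary, and writing the file needs root):

    #include <stdio.h>

    int main(void)
    {
        /* overrides vm.dirty_background_ratio when non-zero */
        FILE *f = fopen("/proc/sys/vm/dirty_background_bytes", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "%ld", 256L * 1024 * 1024);
        fclose(f);
        return 0;
    }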
On 01/15/2014 07:50 AM, Dave Chinner wrote:
However, the first problem is dealing with the IO storm problem on
fsync. Then we can measure the effect of spreading those writes out
in time and determine what triggers read starvations (if they are
apparent). The we can look at whether IO scheduling
Heikki Linnakangas hlinnakan...@vmware.com writes:
On 01/15/2014 07:50 AM, Dave Chinner wrote:
FWIW [and I know you're probably sick of hearing this by now], but
the blk-io throttling works almost perfectly with applications that
use direct IO.
For checkpoint writes, direct I/O actually
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility like atomic write (at least up
to a certain size, say in the MB range) but it's not so easy and when there are
no strong use cases fs people are reluctant to make their code more complex
* Claudio Freire (klaussfre...@gmail.com) wrote:
Yes, that's basically zero-copy reads.
It could be done. The kernel can remap the page to the physical page
holding the shared buffer and mark it read-only, then expire the
buffer and transfer ownership of the page if any page fault happens.
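The kernel already does something close to this for private file mappings; a sketch where a write fault creates a private copy while the page cache page stays clean (file name illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("datafile", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        char *p = mmap(NULL, 8192, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        p[0] = 'X';   /* write fault: the kernel copies the page; the
                         page cache copy and the file are unchanged */
        munmap(p, 8192);
        close(fd);
        return 0;
    }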
On Wed, Jan 15, 2014 at 1:35 PM, Stephen Frost sfr...@snowman.net wrote:
And there's a nice bingo. Had forgotten about KSM. KSM could help lots.
I could try to see of madvising shared_buffers as mergeable helps. But
this should be an automatic case of KSM - ie, when reading into a
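The madvise call in question, against an anonymous stand-in for shared_buffers; note that KSM scans only private anonymous memory (and needs CONFIG_KSM with ksmd running), so page cache pages are out of its reach:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64UL * 1024 * 1024;  /* stand-in for shared_buffers */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        /* ask ksmd to look for identical pages in this range and share
         * them copy-on-write */
        if (madvise(p, len, MADV_MERGEABLE) != 0)
            perror("madvise(MADV_MERGEABLE)");
        return 0;
    }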
* Claudio Freire (klaussfre...@gmail.com) wrote:
But, still, the implementation is very similar to what postgres needs:
sharing a physical page for two distinct logical pages, efficiently,
with efficient copy-on-write.
Agreed, except that KSM seems like it'd be slow/lazy about it and I'm
On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost sfr...@snowman.net wrote:
* Claudio Freire (klaussfre...@gmail.com) wrote:
But, still, the implementation is very similar to what postgres needs:
sharing a physical page for two distinct logical pages, efficiently,
with efficient copy-on-write.
On Wed 15-01-14 14:38:44, Hannu Krosing wrote:
On 01/15/2014 02:01 PM, Jan Kara wrote:
On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
On 01/14/2014 06:12 PM, Robert Haas wrote:
This would be pretty similar to copy-on-write, except
without the copying. It would just be
On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Heikki Linnakangas hlinnakan...@vmware.com writes:
On 01/15/2014 07:50 AM, Dave Chinner wrote:
FWIW [and I know you're probably sick of hearing this by now], but
the blk-io throttling works almost perfectly with
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
What we'd really like for checkpointing is to hand the kernel a boatload
(several GB) of dirty pages and say how about you push all this to disk
over the next few minutes, in whatever way seems
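Absent such an interface, the closest userspace approximation is pacing sync_file_range() across the dirty range and fsyncing once at the end; a sketch with arbitrary chunk size and delay, and an illustrative file name:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)                    /* 1MB per flush request */

    int main(void)
    {
        int fd = open("base/16384/16385", O_WRONLY);
        off_t off, size = 1024 * (off_t)CHUNK; /* pretend 1GB is dirty */

        if (fd < 0) { perror("open"); return 1; }
        for (off = 0; off < size; off += CHUNK) {
            /* start writeback of one chunk without waiting for it */
            sync_file_range(fd, off, CHUNK, SYNC_FILE_RANGE_WRITE);
            usleep(10000);                     /* pace the requests */
        }
        fsync(fd);   /* the checkpoint's actual durability point */
        close(fd);
        return 0;
    }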
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
And most importantly, "Also, please don't freeze up everything else in the
process"
If you hand writeback off to the kernel, then writeback for memory
reclaim needs to take precedence over
On 1/15/14, 12:00 AM, Claudio Freire wrote:
My completely unproven theory is that swapping is overwhelmed by
near-misses. Ie: a process touches a page, and before it's actually
swapped in, another process touches it too, blocking on the other
process' read. But the second process doesn't account
On Wed, Jan 15, 2014 at 7:22 PM, Dave Chinner da...@fromorbit.com wrote:
No, I meant the opposite - in low memory situations, the system is
going to go to hell in a handbasket because we are going to cause a
writeback IO storm cleaning memory regardless of these IO
priorities. i.e. there is no
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
No, we'd be happy to re-request it during each checkpoint cycle, as
long as that wasn't an unduly expensive call to make. I'm not quite
sure where such requests ought to live though. One idea
Robert Haas robertmh...@gmail.com writes:
I don't see that as a problem. What we're struggling with today is
that, until we fsync(), the system is too lazy about writing back
dirty pages. And then when we fsync(), it becomes very aggressive and
system-wide throughput goes into the tank.
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
Heikki Linnakangas hlinnakan...@vmware.com writes:
On 01/15/2014 07:50 AM, Dave Chinner wrote:
FWIW [and I know you're probably sick of hearing this by now], but
the blk-io throttling works almost perfectly with applications that
On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Heikki Linnakangas hlinnakan...@vmware.com writes:
On 01/15/2014 07:50 AM, Dave Chinner wrote:
FWIW [and I know you're probably sick of hearing this by now],
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Robert Haas wrote:
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility like atomic write (at least up
to a certain size, say in the MB range) but it's not so easy and when there are
no strong
On Wed, Jan 15, 2014 at 07:13:27PM -0500, Tom Lane wrote:
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
And most importantly, "Also, please don't freeze up everything else in the
process"
If you hand writeback off to the kernel, then
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
Dave Chinner da...@fromorbit.com writes:
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
What we'd really like for checkpointing is to hand the kernel a boatload
(several GB) of dirty pages and say how about you push all
On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara j...@suse.cz wrote:
On Wed 15-01-14 10:12:38, Robert Haas wrote:
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara j...@suse.cz wrote:
Filesystems could in theory provide a facility like atomic write (at least up
to a certain size, say in the MB range) but it's not
On 01/14/2014 03:44 AM, Dave Chinner wrote:
On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote:
On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
a file into a user provided buffer, thus obtaining a page cache entry
and a copy in their userspace buffer, then insert the page of
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
Again, as said above, the Linux file system is doing fine. What we
want is a few ways to interact with it to let it do even better when
working with PostgreSQL by telling it some stuff it otherwise would
have to second
On 01/14/2014 12:26 AM, Mel Gorman wrote:
On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote:
The other thing that comes to mind is the kernel's caching behavior.
We've talked a lot over the years about the difficulties of getting
the kernel to write data out when we want it to and to
On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote:
On 01/13/2014 02:26 PM, Mel Gorman wrote:
Really?
zone_reclaim_mode is often a complete disaster unless the workload is
partitioned to fit within NUMA nodes. On older kernels enabling it would
sometimes cause massive stalls.
On 01/14/2014 09:39 AM, Claudio Freire wrote:
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
Again, as said above, the Linux file system is doing fine. What we
want is a few ways to interact with it to let it do even better when
working with PostgreSQL by telling it
On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman mgor...@suse.de wrote:
Amen to that. Actually, I think NUMA can be (mostly?) fixed by
setting zone_reclaim_mode; is there some other problem besides that?
Really?
zone_reclaim_mode is often a complete disaster unless the workload is
partitioned
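For reference, the tunable under discussion lives in /proc; a sketch that reports it and turns it off (writing needs root):

    #include <stdio.h>

    int main(void)
    {
        int mode = -1;
        FILE *f = fopen("/proc/sys/vm/zone_reclaim_mode", "r");
        if (!f) { perror("fopen"); return 1; }
        if (fscanf(f, "%d", &mode) != 1) mode = -1;
        fclose(f);
        printf("zone_reclaim_mode = %d\n", mode);

        if (mode > 0) {              /* disable node-local reclaim */
            f = fopen("/proc/sys/vm/zone_reclaim_mode", "w");
            if (!f) { perror("fopen"); return 1; }
            fputs("0", f);
            fclose(f);
        }
        return 0;
    }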
First off, I want to give a +1 on everything in the recent posts
from Heikki and Hannu.
Jan Kara j...@suse.cz wrote:
Now the aging of pages marked as volatile as it is currently
implemented needn't be perfect for your needs but you still have
time to influence what gets implemented...
On Tue, Jan 14, 2014 at 11:39 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
On 01/14/2014 09:39 AM, Claudio Freire wrote:
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
Again, as said above, the Linux file system is doing fine. What we
want is a few ways to
On Tue, Jan 14, 2014 at 3:39 AM, Claudio Freire klaussfre...@gmail.com wrote:
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
Again, as said above, the Linux file system is doing fine. What we
want is a few ways to interact with it to let it do even better when
On Tue, Jan 14, 2014 at 5:00 AM, Jan Kara j...@suse.cz wrote:
I thought that instead of injecting pages into pagecache for aging as you
describe in 3), you would mark pages as volatile (i.e. for reclaim by the
kernel) through the vrange() syscall. Next time you need the page, you check
whether the
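The vrange() syscall was never merged; its closest surviving relative is MADV_FREE (Linux 4.5+), where the kernel may reclaim the pages lazily and a later write keeps them. A sketch of that pattern:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 8192;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        p[0] = 'A';
        /* mark volatile: under memory pressure the kernel may discard
         * these pages instead of swapping them out */
        madvise(p, len, MADV_FREE);
        /* before trusting the cached data again, the caller must be
         * able to detect a purge (a discarded page reads back as zero) */
        return 0;
    }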
James Bottomley james.bottom...@hansenpartnership.com writes:
The current mechanism for coherency between a userspace cache and the
in-kernel page cache is mmap ... that's the only way you get the same
page in both currently.
Right.
glibc used to have an implementation of read/write in terms
On Tue, Jan 14, 2014 at 12:42 PM, Trond Myklebust tron...@gmail.com wrote:
James Bottomley james.bottom...@hansenpartnership.com writes:
The current mechanism for coherency between a userspace cache and the
in-kernel page cache is mmap ... that's the only way you get the same
page in both
Trond Myklebust tron...@gmail.com writes:
On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote:
"Don't be aggressive" isn't good enough. The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever else we need to do results in a corrupt
On Tue 14-01-14 09:08:40, Hannu Krosing wrote:
Effectively you end up with buffered read/write that's also mapped into
the page cache. It's a pretty awful way to hack around mmap.
Well, the problem is that you can't really use mmap() for the things we
do. Postgres' durability works by
On Tue 14-01-14 11:11:28, Heikki Linnakangas wrote:
On 01/14/2014 12:26 AM, Mel Gorman wrote:
On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote:
The other thing that comes to mind is the kernel's caching behavior.
We've talked a lot over the years about the difficulties of getting
On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote:
On 01/13/2014 02:26 PM, Mel Gorman wrote:
Really?
zone_reclaim_mode is often a complete disaster unless the workload is
partitioned to fit within NUMA nodes. On older kernels enabling it would
sometimes cause massive stalls.
On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote:
On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
a file into a user provided buffer, thus obtaining a page cache entry
and a copy in their userspace buffer, then insert the page of the user
buffer back into the page cache as
On Mon, 2014-01-13 at 19:48 -0500, Trond Myklebust wrote:
On Jan 13, 2014, at 19:03, Hannu Krosing ha...@2ndquadrant.com wrote:
On 01/13/2014 09:53 PM, Trond Myklebust wrote:
On Jan 13, 2014, at 15:40, Andres Freund and...@2ndquadrant.com wrote:
On 2014-01-13 15:15:16 -0500, Robert
On Tue, 2014-01-14 at 15:39 +0100, Hannu Krosing wrote:
On 01/14/2014 09:39 AM, Claudio Freire wrote:
On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing ha...@2ndquadrant.com
wrote:
Again, as said above, the Linux file system is doing fine. What we
want is a few ways to interact with it to
On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote:
James Bottomley james.bottom...@hansenpartnership.com writes:
The current mechanism for coherency between a userspace cache and the
in-kernel page cache is mmap ... that's the only way you get the same
page in both currently.
On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
No, I'm sorry, that's never going to be possible. No user space
application has all the facts. If we give you an interface to force
unconditional holding of dirty pages in core you'll livelock the
On Tue, Jan 14, 2014 at 1:48 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
No, I'm sorry, that's never going to be possible. No user space
application has all the facts. If we give you an interface
On 01/14/2014 06:08 PM, Tom Lane wrote:
Trond Myklebust tron...@gmail.com writes:
On Jan 14, 2014, at 10:39, Tom Lane t...@sss.pgh.pa.us wrote:
"Don't be aggressive" isn't good enough. The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever
On Tue, Jan 14, 2014 at 11:57 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote:
On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
No, I'm sorry, that's never going to be possible.
On Tue, Jan 14, 2014 at 12:12 PM, Robert Haas robertmh...@gmail.com wrote:
In terms of avoiding double-buffering, here's my thought after reading
what's been written so far. Suppose we read a page into our buffer
pool. Until the page is clean, it would be ideal for the mapping to
Correction:
On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote:
In terms of avoiding double-buffering, here's my thought after reading
what's been written so far. Suppose we read a page into our buffer
pool. Until the page is clean, it would be ideal for the mapping to
be shared
On Tue, Jan 14, 2014 at 12:15 PM, Claudio Freire klaussfre...@gmail.com wrote:
On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote:
In terms of avoiding double-buffering, here's my thought after reading
what's been written so far. Suppose we read a page into our buffer
Claudio Freire klaussfre...@gmail.com wrote:
Robert Haas robertmh...@gmail.com wrote:
James Bottomley james.bottom...@hansenpartnership.com wrote:
I don't understand why this has to be absolute: if you advise
us to hold the pages dirty and we do up until it becomes a
choice to hold on to the
On 01/14/2014 05:44 PM, James Bottomley wrote:
On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote:
James Bottomley james.bottom...@hansenpartnership.com writes:
The current mechanism for coherency between a userspace cache and the
in-kernel page cache is mmap ... that's the only way you get the
James Bottomley james.bottom...@hansenpartnership.com wrote:
you mean the order of write out, if we have to do it, is
important. In the rest of the kernel, we do this with barriers
which cause ordered grouping of I/O chunks. If we could force a
similar ordering in the writeout code, is
On Mon, Jan 13, 2014 at 2:36 PM, Mel Gorman mgor...@suse.de wrote:
On Mon, Jan 13, 2014 at 06:27:03PM -0200, Claudio Freire wrote:
On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote:
On 1/13/14, 2:19 PM, Claudio Freire wrote:
On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas
On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas robertmh...@gmail.com wrote:
In terms of avoiding double-buffering, here's my thought after