On 23.01.14 02:14, Jim Nasby wrote:
> On 1/19/14, 5:51 PM, Dave Chinner wrote:
>>> Postgres is far from being the only application that wants this; many
>>> people resort to tmpfs because of this:
>>> https://lwn.net/Articles/499410/
>> Yes, we covered the possibility of using tmpfs much earlier
On 1/20/14 9:46 AM, Mel Gorman wrote:
They could potentially be used to evaluate any IO scheduler changes.
For example -- deadline scheduler with these parameters has X
transactions/sec throughput with average latency of Y milliseconds
and a maximum fsync latency of Z seconds. Evaluate how well
On Wed, Jan 22, 2014 at 10:08 PM, Jim Nasby wrote:
>
> Probably more useful is the case of index scans; if we pre-read more data
> from the index we could hand the kernel a list of base relation blocks that
> we know we'll need.
Actually, I've already tried this. The most important part is fetch
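Handing the kernel a list of blocks we expect to need maps naturally onto posix_fadvise(POSIX_FADV_WILLNEED), which is how PostgreSQL's bitmap-scan prefetching already works. A minimal sketch -- the file, block size, and block numbers are all illustrative:

```python
import os
import tempfile

BLOCK = 8192  # PostgreSQL's default block size

# Scratch file standing in for a base relation (illustrative).
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * (BLOCK * 16))
os.fsync(fd)

# Blocks the index told us we will need shortly (made-up numbers).
needed_blocks = [3, 7, 12]
for blkno in needed_blocks:
    # Ask the kernel to start reading these ranges in the background,
    # so the later pread() finds them already in the page cache.
    os.posix_fadvise(fd, blkno * BLOCK, BLOCK, os.POSIX_FADV_WILLNEED)

data = os.pread(fd, BLOCK, needed_blocks[0] * BLOCK)  # the real read, later
os.close(fd)
os.unlink(path)
```

The advice is only a hint; the kernel is free to ignore it under memory pressure, which is part of why results from this approach are hard to reason about.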
On 1/19/14, 5:51 PM, Dave Chinner wrote:
> Postgres is far from being the only application that wants this; many
> people resort to tmpfs because of this:
> https://lwn.net/Articles/499410/
Yes, we covered the possibility of using tmpfs much earlier in the
thread, and came to the conclusion that tem
On 1/17/14, 2:24 PM, Gregory Smith wrote:
I am skeptical that the database will take over very much of this work and perform better
than the Linux kernel does. My take is that our most useful role would be providing test
cases kernel developers can add to a performance regression suite. Ugly
On 1/17/14, 7:57 AM, Robert Haas wrote:
- WAL files are written (and sometimes read) sequentially and fsync'd
very frequently and it's always good to write the data out to disk as
soon as possible
- Temp files are written and read sequentially and never fsync'd.
They should only be written to dis
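The WAL pattern described above -- strictly sequential appends forced to disk before a commit is acknowledged -- boils down to write() followed by fdatasync(). A toy sketch; the file and record format are invented for illustration:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()

def append_wal_record(payload: bytes) -> None:
    # WAL is append-only: write sequentially, then force the data to
    # stable storage before acknowledging the commit. fdatasync() is
    # enough here because file metadata (mtime) need not be durable.
    os.write(fd, payload)
    os.fdatasync(fd)

for i in range(3):
    append_wal_record(b"record-%d\n" % i)

wal_size = os.fstat(fd).st_size  # 3 records x 9 bytes each
os.close(fd)
os.unlink(path)
```

The per-record fdatasync() is exactly the latency-critical operation the thread keeps returning to: each call must wait for the platter (or a battery-backed cache).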
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> > If we're forcing the WAL out to disk because of transaction commit or
> > because we need to write the buffer protected by a certain WAL record
> > only after the WAL hits the platter, then it's fine. But sometimes
> > we're writing WAL
On Fri 17-01-14 08:57:25, Robert Haas wrote:
> On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton wrote:
> > So this says to me that the WAL is a place where DIO should really be
> > reconsidered. It's mostly sequential writes that need to hit the disk
> > ASAP, and you need to know that they have hit t
On Wed 22-01-14 09:07:19, Dave Chinner wrote:
> On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> > > If we're forcing the WAL out to disk because of transaction commit or
> > > because we need to write the buffer protected by a certain WAL record
> > > only after the WAL hits the platter
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> On Fri 17-01-14 08:57:25, Robert Haas wrote:
> > On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton wrote:
> > > So this says to me that the WAL is a place where DIO should really be
> > > reconsidered. It's mostly sequential writes that need t
On Tue, Jan 21, 2014 at 3:20 PM, Jan Kara wrote:
>> But that still doesn't work out very well, because now the guy who
>> does the write() has to wait for it to finish before he can do
>> anything else. That's not always what we want, because WAL gets
>> written out from our internal buffers for
On Wed, Jan 15, 2014 at 11:49:09AM +0000, Mel Gorman wrote:
> It may be the case that mmap/madvise is still required to handle a double
> buffering problem but it's far from being a free lunch and it has costs
> that read/write does not have to deal with. Maybe some of these problems
> can be fixed
On Mon, 20 Jan 2014 10:51:41 +1100
Dave Chinner wrote:
> On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> > On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby wrote:
> > > it's very common to create temporary file data that will never, ever, ever
> > > actually NEED to hit disk. Where I
On Fri, Jan 17, 2014 at 03:24:01PM -0500, Gregory Smith wrote:
> On 1/17/14 10:37 AM, Mel Gorman wrote:
> >There is not an easy way to tell. To be 100%, it would require an
> >instrumentation patch or a systemtap script to detect when a
> >particular page is being written back and track the context
On Mon, Jan 20, 2014 at 10:51:41AM +1100, Dave Chinner wrote:
> On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> > On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby wrote:
> > > it's very common to create temporary file data that will never, ever, ever
> > > actually NEED to hit disk. Wh
On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby wrote:
> > it's very common to create temporary file data that will never, ever, ever
> > actually NEED to hit disk. Where I work being able to tell the kernel to
> > avoid flushing those f
On Mon, Jan 20, 2014 at 1:51 AM, Dave Chinner wrote:
>> Postgres is far from being the only application that wants this; many
>> people resort to tmpfs because of this:
>> https://lwn.net/Articles/499410/
>
> Yes, we covered the possibility of using tmpfs much earlier in the
> thread, and came to
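The tmpfs workaround people resort to amounts to placing never-fsync'd temporary files on a RAM-backed mount. A sketch -- the /dev/shm path is the conventional tmpfs mount on most Linux distributions, assumed here:

```python
import os
import tempfile

# Put scratch files on tmpfs so they never generate disk writeback;
# fall back to the default temp directory if /dev/shm is absent.
tmpdir = "/dev/shm" if os.path.isdir("/dev/shm") else None
fd, path = tempfile.mkstemp(dir=tmpdir)

os.write(fd, b"spill data that need never hit a platter")
# Note: no fsync() anywhere -- the pages can live and die in RAM.
os.close(fd)
os.unlink(path)
```

The downside discussed in the thread is that tmpfs pages still compete for memory and can be pushed to swap, which is worse than ordinary file writeback.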
On 1/17/14 10:37 AM, Mel Gorman wrote:
There is not an easy way to tell. To be 100%, it would require an
instrumentation patch or a systemtap script to detect when a
particular page is being written back and track the context. There are
approximations though. Monitor nr_dirty pages over time.
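The approximation Mel suggests -- monitoring nr_dirty over time -- can be done by sampling /proc/vmstat (Linux-specific; the counter is in pages, not bytes):

```python
def read_nr_dirty() -> int:
    # nr_dirty in /proc/vmstat is the current number of dirty
    # page-cache pages awaiting writeback.
    with open("/proc/vmstat") as f:
        for line in f:
            name, _, value = line.partition(" ")
            if name == "nr_dirty":
                return int(value)
    raise RuntimeError("nr_dirty not found in /proc/vmstat")

sample = read_nr_dirty()
```

Sampling this in a loop around a checkpoint makes the dirty-page buildup and the fsync-triggered cliff directly visible, without any instrumentation patch.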
On 01/17/2014 06:40 AM, Dave Chinner wrote:
> On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
>> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner wrote:
>>> But there's something here that I'm not getting - you're talking
>>> about a data set that you want to keep cache resident that is
On Thu, Jan 16, 2014 at 04:30:59PM -0800, Jeff Janes wrote:
> On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman wrote:
>
> > On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
> > > >
> > > > That could be something we look at. There are cases buried deep in the
> > > > VM where pages get shuf
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton wrote:
> So this says to me that the WAL is a place where DIO should really be
> reconsidered. It's mostly sequential writes that need to hit the disk
> ASAP, and you need to know that they have hit the disk before you can
> proceed with other operation
On Thu, 16 Jan 2014 20:48:24 -0500
Robert Haas wrote:
> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner wrote:
> > But there's something here that I'm not getting - you're talking
> > about a data set that you want to keep cache resident that is at
> > least an order of magnitude larger than the c
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner wrote:
> > But there's something here that I'm not getting - you're talking
> > about a data set that you want to keep cache resident that is at
> > least an order of magnitude larger than
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> On 1/15/14, 12:00 AM, Claudio Freire wrote:
> >My completely unproven theory is that swapping is overwhelmed by
> >near-misses. Ie: a process touches a page, and before it's
> >actually swapped in, another process touches it too, blocking
On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
> On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner wrote:
>
> > On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > > >My completely unproven theory is that swapping is overwhelme
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner wrote:
> But there's something here that I'm not getting - you're talking
> about a data set that you want to keep cache resident that is at
> least an order of magnitude larger than the cyclic 5-15 minute WAL
> dataset that ongoing operations need to
On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > >My completely unproven theory is that swapping is overwhelmed by
> > >near-misses. Ie: a process touches a page, and before it's
>
On Wed, Jan 15, 2014 at 10:35:44AM +0100, Jan Kara wrote:
> Filesystems could in theory provide facility like atomic write (at least up
> to a certain size say in MB range) but it's not so easy and when there are
> no strong usecases fs people are reluctant to make their code more complex
> unneces
On Wed, 15 Jan 2014 21:37:16 -0500
Robert Haas wrote:
> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> >> > Filesystems could in theory provide facility like atomic write (at least
> >> >
On Wed, Jan 15, 2014 at 07:31:15PM -0500, Tom Lane wrote:
> Dave Chinner writes:
> > On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> >> No, we'd be happy to re-request it during each checkpoint cycle, as
> >> long as that wasn't an unduly expensive call to make. I'm not quite
> >> sur
On Wed 15-01-14 10:12:38, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> > Filesystems could in theory provide facility like atomic write (at least up
> > to a certain size say in MB range) but it's not so easy and when there are
> > no strong usecases fs people are reluct
On Wed 15-01-14 21:37:16, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> >> > Filesystems could in theory provide facility like atomic write (at least
> >> > up
> >> >
On 14/01/14 22:23, Dave Chinner wrote:
On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:
To quantify that, in a production setting we were seeing pauses of
up to two minutes with shared_buffers set to 8GB and default dirty
I wonder if the kernel could provide a weaker version of fsync() which
does not force all pending data to be written immediately but just
serves as a write barrier, guaranteeing
that all write operations preceding the fsync() will be completed before
any subsequent operations.
It would allow
On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara wrote:
> On Wed 15-01-14 10:12:38, Robert Haas wrote:
>> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
>> > Filesystems could in theory provide facility like atomic write (at least up
>> > to a certain size say in MB range) but it's not so easy and whe
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> Dave Chinner writes:
> > On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> >> What we'd really like for checkpointing is to hand the kernel a boatload
> >> (several GB) of dirty pages and say "how about you push all this to disk
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> > Filesystems could in theory provide facility like atomic write (at least up
> > to a certain size say in MB range) but it's not so easy and when there are
> > no strong usecases fs p
On Wed, Jan 15, 2014 at 07:13:27PM -0500, Tom Lane wrote:
> Dave Chinner writes:
> > On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
> >> And most importantly, "Also, please don't freeze up everything else in the
> >> process"
>
> > If you hand writeback off to the kernel, then writeb
On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
> On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane wrote:
>
> > Heikki Linnakangas writes:
> > > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> > >> FWIW [and I know you're probably sick of hearing this by now], but
> > >> the blk-io throttling
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> Heikki Linnakangas writes:
> > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> >> FWIW [and I know you're probably sick of hearing this by now], but
> >> the blk-io throttling works almost perfectly with applications that
> >> use direct IO
Robert Haas writes:
> I don't see that as a problem. What we're struggling with today is
> that, until we fsync(), the system is too lazy about writing back
> dirty pages. And then when we fsync(), it becomes very aggressive and
> system-wide throughput goes into the tank. What we're aiming to
Dave Chinner writes:
> On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
>> No, we'd be happy to re-request it during each checkpoint cycle, as
>> long as that wasn't an unduly expensive call to make. I'm not quite
>> sure where such requests ought to "live" though. One idea is to tie
>>
On Wed, Jan 15, 2014 at 7:22 PM, Dave Chinner wrote:
> No, I meant the opposite - in low memory situations, the system is
> going to go to hell in a handbasket because we are going to cause a
> writeback IO storm cleaning memory regardless of these IO
> priorities. i.e. there is no way we'll let "
Dave Chinner writes:
> On Wed, Jan 15, 2014 at 02:29:40PM -0800, Jeff Janes wrote:
>> And most importantly, "Also, please don't freeze up everything else in the
>> process"
> If you hand writeback off to the kernel, then writeback for memory
> reclaim needs to take precedence over "metered writeb
Dave Chinner writes:
> On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
>> What we'd really like for checkpointing is to hand the kernel a boatload
>> (several GB) of dirty pages and say "how about you push all this to disk
>> over the next few minutes, in whatever way seems optimal given
On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane wrote:
> Heikki Linnakangas writes:
> > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> >> FWIW [and I know you're probably sick of hearing this by now], but
> >> the blk-io throttling works almost perfectly with applications that
> >> use direct IO.
>
On Wed 15-01-14 14:38:44, Hannu Krosing wrote:
> On 01/15/2014 02:01 PM, Jan Kara wrote:
> > On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
> >> On 01/14/2014 06:12 PM, Robert Haas wrote:
> >>> This would be pretty similar to copy-on-write, except
> >>> without the copying. It would just be
> >>> f
On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost wrote:
> * Claudio Freire (klaussfre...@gmail.com) wrote:
>> But, still, the implementation is very similar to what postgres needs:
>> sharing a physical page for two distinct logical pages, efficiently,
>> with efficient copy-on-write.
>
> Agreed, ex
* Claudio Freire (klaussfre...@gmail.com) wrote:
> But, still, the implementation is very similar to what postgres needs:
> sharing a physical page for two distinct logical pages, efficiently,
> with efficient copy-on-write.
Agreed, except that KSM seems like it'd be slow/lazy about it and I'm
gue
On Wed, Jan 15, 2014 at 1:35 PM, Stephen Frost wrote:
>> And there's a nice bingo. Had forgotten about KSM. KSM could help lots.
>>
>> I could try to see of madvising shared_buffers as mergeable helps. But
>> this should be an automatic case of KSM - ie, when reading into a
>> page-aligned address
* Claudio Freire (klaussfre...@gmail.com) wrote:
> Yes, that's basically zero-copy reads.
>
> It could be done. The kernel can remap the page to the physical page
> holding the shared buffer and mark it read-only, then expire the
> buffer and transfer ownership of the page if any page fault happen
On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> Filesystems could in theory provide facility like atomic write (at least up
> to a certain size say in MB range) but it's not so easy and when there are
> no strong usecases fs people are reluctant to make their code more complex
> unnecessarily.
Heikki Linnakangas writes:
> On 01/15/2014 07:50 AM, Dave Chinner wrote:
>> FWIW [and I know you're probably sick of hearing this by now], but
>> the blk-io throttling works almost perfectly with applications that
>> use direct IO.
> For checkpoint writes, direct I/O actually would be reasona
On 01/15/2014 07:50 AM, Dave Chinner wrote:
However, the first problem is dealing with the IO storm problem on
fsync. Then we can measure the effect of spreading those writes out
in time and determine what triggers read starvations (if they are
apparent). Then we can look at whether IO scheduling
On Tue, Jan 14, 2014 at 5:23 PM, Dave Chinner wrote:
> By default, background writeback doesn't start until 10% of memory
is dirtied, and on your machine that's 25GB of RAM. That's way too
> high for your workload.
>
> It appears to me that we are seeing large memory machines much more
> commonly
On Tue, Jan 14, 2014 at 4:23 PM, James Bottomley
wrote:
> Yes, that's what I was thinking: it's a cache. About how many files
> comprise this cache? Are you thinking it's too difficult for every
> process to map the files?
No, I'm thinking that would throw cache coherency out the window.
Separa
On 01/15/2014 02:01 PM, Jan Kara wrote:
> On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
>> On 01/14/2014 06:12 PM, Robert Haas wrote:
>>> This would be pretty similar to copy-on-write, except
>>> without the copying. It would just be
>>> forget-from-the-buffer-pool-on-write.
>> +1
>>
>> A version
On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
> On 01/14/2014 06:12 PM, Robert Haas wrote:
> > This would be pretty similar to copy-on-write, except
> > without the copying. It would just be
> > forget-from-the-buffer-pool-on-write.
>
> +1
>
> A version of this could probably already be impleme
On Wed 15-01-14 10:27:26, Heikki Linnakangas wrote:
> On 01/15/2014 06:01 AM, Jim Nasby wrote:
> >For the sake of completeness... it's theoretically silly that Postgres
> >is doing all this stuff with WAL when the filesystem is doing something
> >very similar with its journal. And an SSD drive (an
On Tue, Jan 14, 2014 at 09:54:20PM -0600, Jim Nasby wrote:
> On 1/14/14, 3:41 PM, Dave Chinner wrote:
> >On Tue, Jan 14, 2014 at 09:40:48AM -0500, Robert Haas wrote:
> >>On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman
> >>wrote: Whether the problem is with the system call or the
> >>programmer is hard
On Mon, Jan 13, 2014 at 02:19:56PM -0800, James Bottomley wrote:
> On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote:
> > On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
> > > On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
> > > > Well, if we were to collaborate with the kernel commu
On 01/15/2014 12:16 PM, Hannu Krosing wrote:
> On 01/14/2014 06:12 PM, Robert Haas wrote:
>> This would be pretty similar to copy-on-write, except
>> without the copying. It would just be
>> forget-from-the-buffer-pool-on-write.
> +1
>
> A version of this could probably already be implemented using
On 01/14/2014 06:12 PM, Robert Haas wrote:
> This would be pretty similar to copy-on-write, except
> without the copying. It would just be
> forget-from-the-buffer-pool-on-write.
+1
A version of this could probably already be implemented using MADV_DONTNEED
and MADV_WILLNEED
That is, just after r
On 01/15/2014 06:01 AM, Jim Nasby wrote:
For the sake of completeness... it's theoretically silly that Postgres
is doing all this stuff with WAL when the filesystem is doing something
very similar with its journal. And an SSD drive (and next generation
spinning rust) is doing the same thing *aga
On 1/14/14, 10:08 AM, Tom Lane wrote:
Trond Myklebust writes:
On Jan 14, 2014, at 10:39, Tom Lane wrote:
"Don't be aggressive" isn't good enough. The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever else we need to do results in a corrupt
On 1/14/14, 3:41 PM, Dave Chinner wrote:
On Tue, Jan 14, 2014 at 09:40:48AM -0500, Robert Haas wrote:
On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman wrote:
IOWs, using sync_file_range() does not avoid the need to fsync() a
file for data integrity purposes...
I believe the PG community understand
On Tue, Jan 14, 2014 at 03:03:39PM -0800, Kevin Grittner wrote:
> Dave Chinner wrote:
>
> > Essentially, changing dirty_background_bytes, dirty_bytes and
> > dirty_expire_centisecs to be much smaller should make the
> > kernel start writeback much sooner and so you shouldn't have to
> > limit
On Tue, 2014-01-14 at 15:09 -0500, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 3:00 PM, James Bottomley
> wrote:
> >> Doesn't sound exactly like what I had in mind. What I was suggesting
> >> is an analogue of read() that, if it reads full pages of data to a
> >> page-aligned address, shares the
On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:
> Robert Haas wrote:
> > Jan Kara wrote:
> >
> >> Just to get some idea about the sizes - how large are the
> >> checkpoints we are talking about that cause IO stalls?
> >
> > Big.
>
> To quantify that, in a production setting we were
On Tue, Jan 14, 2014 at 09:40:48AM -0500, Robert Haas wrote:
> On Mon, Jan 13, 2014 at 5:26 PM, Mel Gorman wrote:
> >> Amen to that. Actually, I think NUMA can be (mostly?) fixed by
> >> setting zone_reclaim_mode; is there some other problem besides that?
> >
> > Really?
> >
> > zone_reclaim_mode
On Tue, 2014-01-14 at 12:39 -0500, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
> wrote:
> > On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
> >> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
> >> > In terms of avoiding double-buffering, here's my thought
Dave Chinner wrote:
> Essentially, changing dirty_background_bytes, dirty_bytes and
> dirty_expire_centisecs to be much smaller should make the
> kernel start writeback much sooner and so you shouldn't have to
> limit the amount of buffers the application has to prevent major
> fsync triggered
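The knobs Dave refers to live under /proc/sys/vm (note the real sysctl is spelled dirty_expire_centisecs), and can be inspected without root. A sketch reading the current thresholds:

```python
import pathlib

# Current writeback thresholds. Lowering dirty_background_bytes (or
# the *_ratio equivalents shown here) makes background writeback start
# sooner, so a later fsync() has far less dirty data left to flush.
settings = {
    knob: int(pathlib.Path("/proc/sys/vm", knob).read_text())
    for knob in ("dirty_background_ratio", "dirty_ratio",
                 "dirty_expire_centisecs")
}
```

Writing smaller values back (via sysctl, with root) trades steadier background I/O for less bursty fsync behavior, which is exactly the trade-off this subthread debates.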
James Bottomley wrote:
>> We start by creating a chunk of shared memory that all processes
>> (we do not use threads) will have mapped at a common address,
>> and we read() and write() into that chunk.
>
> Yes, that's what I was thinking: it's a cache. About how many
> files comprise this cache?
I wrote:
> to avoid write gluts it must often be limited to 1GB to 1GB.
That should have been "1GB to 2GB."
James Bottomley wrote:
> About how many files comprise this cache? Are you thinking it's
> too difficult for every process to map the files?
The shared_buffers area can be mapping anywhere from about 200
files to millions of files, representing a total space of about 6MB
on the low end to over
On Tue, Jan 14, 2014 at 3:00 PM, James Bottomley
wrote:
>> Doesn't sound exactly like what I had in mind. What I was suggesting
>> is an analogue of read() that, if it reads full pages of data to a
>> page-aligned address, shares the data with the buffer cache until it's
>> first written instead
Robert Haas wrote:
> Jan Kara wrote:
>
>> Just to get some idea about the sizes - how large are the
>> checkpoints we are talking about that cause IO stalls?
>
> Big.
To quantify that, in a production setting we were seeing pauses of
up to two minutes with shared_buffers set to 8GB and default d
* Robert Haas (robertmh...@gmail.com) wrote:
> I dunno what a typical checkpoint size is but I don't think you'll be
> exaggerating much if you imagine that everything that could possibly
> be dirty is.
This is not uncommon for us, at least:
checkpoint complete: wrote 425844 buffers (20.3%); 0 tr
On Tue 14-01-14 10:04:16, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 5:00 AM, Jan Kara wrote:
> > I thought that instead of injecting pages into pagecache for aging as you
> > describe in 3), you would mark pages as volatile (i.e. for reclaim by
> > kernel) through vrange() syscall. Next time yo
On Tue, Jan 14, 2014 at 1:37 PM, Jan Kara wrote:
> Just to get some idea about the sizes - how large are the checkpoints we
> are talking about that cause IO stalls?
Big. Potentially, we might have dirtied all of shared_buffers and
then started evicting pages from there to the OS buffer pool and
On Tue 14-01-14 06:42:43, Kevin Grittner wrote:
> First off, I want to give a +1 on everything in the recent posts
> from Heikki and Hannu.
>
> Jan Kara wrote:
>
> > Now the aging of pages marked as volatile as it is currently
> > implemented needn't be perfect for your needs but you still have
* Claudio Freire (klaussfre...@gmail.com) wrote:
> On Tue, Jan 14, 2014 at 2:17 PM, Robert Haas wrote:
> > I don't know either. I wasn't thinking so much that it would save CPU
> > time as that it would save memory. Consider a system with 32GB of
> > RAM. If you set shared_buffers=8GB, then in
On Tue, Jan 14, 2014 at 2:39 PM, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
> wrote:
>> On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
>>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
>>> > In terms of avoiding double-buffering, here's my thought afte
On Tue, Jan 14, 2014 at 2:17 PM, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 12:15 PM, Claudio Freire
> wrote:
>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
>>> In terms of avoiding double-buffering, here's my thought after reading
>>> what's been written so far. Suppose we read a pa
On Mon, Jan 13, 2014 at 6:44 PM, Dave Chinner wrote:
> On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote:
> > On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
> > > a file into a user provided buffer, thus obtaining a page cache entry
> > > and a copy in their userspace buffer, th
On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote:
> James Bottomley writes:
> > The current mechanism for coherency between a userspace cache and the
> > in-kernel page cache is mmap ... that's the only way you get the same
> > page in both currently.
>
> Right.
>
> > glibc used to have an impl
On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
> >
> > In terms of avoiding double-buffering, here's my thought after reading
> > what's been written so far. Suppose we read a page into our buffer
> > pool. Until the page is clean,
On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
> wrote:
> > No, I'm sorry, that's never going to be possible. No user space
> > application has all the facts. If we give you an interface to force
> > unconditional holding of dirty pages
Robert Haas writes:
> On Tue, Jan 14, 2014 at 11:57 AM, James Bottomley
> wrote:
>> No, I do ... you mean the order of write out, if we have to do it, is
>> important. In the rest of the kernel, we do this with barriers which
>> causes ordered grouping of I/O chunks. If we could force a similar
On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
wrote:
> On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
>> > In terms of avoiding double-buffering, here's my thought after reading
>> > what's been written so far. Suppose we read
James Bottomley wrote:
> you mean the order of write out, if we have to do it, is
> important. In the rest of the kernel, we do this with barriers
> which causes ordered grouping of I/O chunks. If we could force a
> similar ordering in the writeout code, is that enough?
Unless it can be betwee
On 01/14/2014 05:44 PM, James Bottomley wrote:
> On Tue, 2014-01-14 at 10:39 -0500, Tom Lane wrote:
>> James Bottomley writes:
>>> The current mechanism for coherency between a userspace cache and the
>>> in-kernel page cache is mmap ... that's the only way you get the same
>>> page in both curren
Claudio Freire wrote:
> Robert Haas wrote:
>> James Bottomley wrote:
>>> I don't understand why this has to be absolute: if you advise
>>> us to hold the pages dirty and we do up until it becomes a
>>> choice to hold on to the pages or to thrash the system into a
>>> livelock, why would you eve
On Tue, Jan 14, 2014 at 12:15 PM, Claudio Freire wrote:
> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
>> In terms of avoiding double-buffering, here's my thought after reading
>> what's been written so far. Suppose we read a page into our buffer
>> pool. Until the page is clean, it woul
On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas wrote:
>
> In terms of avoiding double-buffering, here's my thought after reading
> what's been written so far. Suppose we read a page into our buffer
> pool. Until the page is clean, it would be ideal for the mapping to
> be shared between the buffer
On Tue, Jan 14, 2014 at 12:12 PM, Robert Haas wrote:
> In terms of avoiding double-buffering, here's my thought after reading
> what's been written so far. Suppose we read a page into our buffer
> pool. Until the page is clean, it would be ideal for the mapping to
Correction: "For so long as th
On Tue, Jan 14, 2014 at 11:57 AM, James Bottomley
wrote:
> On Tue, 2014-01-14 at 11:48 -0500, Robert Haas wrote:
>> On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
>> wrote:
>> > No, I'm sorry, that's never going to be possible. No user space
>> > application has all the facts. If we give you
On 01/14/2014 06:08 PM, Tom Lane wrote:
Trond Myklebust writes:
On Jan 14, 2014, at 10:39, Tom Lane wrote:
"Don't be aggressive" isn't good enough. The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever else we need to do results in a corru
On Tue, Jan 14, 2014 at 1:48 PM, Robert Haas wrote:
> On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
> wrote:
>> No, I'm sorry, that's never going to be possible. No user space
>> application has all the facts. If we give you an interface to force
>> unconditional holding of dirty pages in c
On Tue, Jan 14, 2014 at 11:44 AM, James Bottomley
wrote:
> No, I'm sorry, that's never going to be possible. No user space
> application has all the facts. If we give you an interface to force
> unconditional holding of dirty pages in core you'll livelock the system
> eventually because you made