Looks like it's time to close the book on this one for 9.1
development...the unfortunate results are at
http://www.2ndquadrant.us/pgbench-results/index.htm Test set #12 is the
one with spread sync I was hoping would turn out better than #9, the
reference I was trying to improve on. TPS is
On Thu, Feb 10, 2011 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote:
3) The existing write spreading code in the background writer needs to be
overhauled, too, before spreading the syncs around is going to give the
benefits I was hoping for.
I've been thinking about this problem a bit. It
2011/2/7 Greg Smith g...@2ndquadrant.com:
Robert Haas wrote:
With the fsync queue compaction patch applied, I think most of this is
now not needed. Attached please find an attempt to isolate the
portion that looks like it might still be useful. The basic idea of
what remains here is to
Cédric Villemain wrote:
Is it worth a new thread with the different IO improvements done so
far or on-going and how we may add new GUC(if required !!!) with
intelligence between those patches ? ( For instance, hint bit IO limit
needs probably a tunable to define something similar to
Greg Smith g...@2ndquadrant.com wrote:
As a larger statement on this topic, I'm never very excited about
redesigning here starting from any point other than saw a
bottleneck doing x on a production system. There's a long list
of such things already around waiting to be addressed, and I've
Kevin Grittner wrote:
There are occasional posts from those wondering why their read-only
queries are so slow after a bulk load, and why they are doing heavy
writes. (I remember when I posted about that, as a relative newbie,
and I know I've seen others.)
Sure; I created
Robert Haas wrote:
With the fsync queue compaction patch applied, I think most of this is
now not needed. Attached please find an attempt to isolate the
portion that looks like it might still be useful. The basic idea of
what remains here is to make the background writer still do its normal
Michael Banck wrote:
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
For example, the pre-release Squeeze numbers we're seeing are awful so
far, but it's not really done yet either.
Unfortunately, it does not look like Debian squeeze will change any more
(or has changed
As already mentioned in the broader discussion at
http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com,
I'm seeing no solid performance swing in the checkpoint sorting code
itself. Better sometimes, worse others, but never by a large amount.
Here's what the statistics
On Fri, Feb 4, 2011 at 2:08 PM, Greg Smith g...@2ndquadrant.com wrote:
-The total number of buffers I'm computing based on the checkpoint writes
being sorted is not a perfect match to the number reported by the
checkpoint complete status line. Sometimes they are the same, sometimes
not. Not
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote:
For example, the pre-release Squeeze numbers we're seeing are awful so
far, but it's not really done yet either.
Unfortunately, it does not look like Debian squeeze will change any more
(or has changed much since your post) at this
Greg Smith wrote:
I think the right way to compute which relations to sync is to finish the
sorted writes patch I sent over a not quite right yet update to already
Attached update now makes much more sense than the misguided patch I
submitted two weeks ago. This takes the original sorted write
On Mon, Jan 31, 2011 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
Back to the idea at hand - I proposed something a bit along these
lines upthread, but my idea was to proactively perform the fsyncs on
the relations that had gone the longest without a
Robert Haas robertmh...@gmail.com wrote:
I also think Bruce's idea of calling fsync() on each relation just
*before* we start writing the pages from that relation might have
some merit.
What bothers me about that is that you may have a lot of the same
dirty pages in the OS cache as the
Robert Haas wrote:
Back to your idea: One problem with trying to bound the unflushed data
is that it's not clear what the bound should be. I've had this mental
model where we want the OS to write out pages to disk, but that's not
always true, per Greg Smith's recent posts about Linux kernel
Greg Smith wrote:
Greg Smith wrote:
I think the right way to compute which relations to sync is to finish the
sorted writes patch I sent over a not quite right yet update to already
Attached update now makes much more sense than the misguided patch I
submitted two weeks ago. This takes the
On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Robert Haas robertmh...@gmail.com wrote:
I also think Bruce's idea of calling fsync() on each relation just
*before* we start writing the pages from that relation might have
some merit.
What bothers me about
Kevin Grittner wrote:
Robert Haas robertmh...@gmail.com wrote:
I also think Bruce's idea of calling fsync() on each relation just
*before* we start writing the pages from that relation might have
some merit.
What bothers me about that is that you may have a lot of the same
dirty
Bruce Momjian br...@momjian.us writes:
My trivial idea was: let's assume we checkpoint every 10 minutes, and
it takes 5 minutes for us to write the data to the kernel. If no one
else is writing to those files, we can safely wait maybe 5 more minutes
before issuing the fsync. If, however,
Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
My trivial idea was: let's assume we checkpoint every 10 minutes, and
it takes 5 minutes for us to write the data to the kernel. If no one
else is writing to those files, we can safely wait maybe 5 more minutes
before issuing the
On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote:
1. Absorb fsync requests a lot more often during the sync phase.
2. Still try to run the cleaning scan during the sync phase.
3. Pause for 3 seconds after every fsync.
So if we want the checkpoint
to finish in, say, 20
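As a sketch of how those three steps fit together (Python purely for illustration here; the actual server code is C, and `absorb_fsync_requests` / `pending_fds` are hypothetical names standing in for the real queue machinery):

```python
import os
import time

def run_sync_phase(pending_fds, absorb_fsync_requests, pause_s=3.0, sleep=time.sleep):
    """Sync-phase sketch: absorb queued fsync requests between every
    individual fsync (step 1), so backends don't stall on a full queue,
    and pause after each fsync (step 3) to spread the sync I/O out.
    The cleaning scan (step 2) would run during the sleep periods."""
    synced = 0
    for fd in pending_fds:
        absorb_fsync_requests()   # drain the request queue frequently
        os.fsync(fd)              # one file's worth of sync work
        synced += 1
        if synced < len(pending_fds):
            sleep(pause_s)        # spread out the remaining syncs
    return synced
```

The 3-second figure is only a placeholder, per the discussion below about any fixed delay being wrong.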
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote:
1. Absorb fsync requests a lot more often during the sync phase.
2. Still try to run the cleaning scan during the sync phase.
3. Pause for
On 31.01.2011 16:44, Robert Haas wrote:
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
On Mon, Jan 31, 2011 at 13:41, Robert Haasrobertmh...@gmail.com wrote:
1. Absorb fsync requests a lot more often during the sync phase.
2. Still try to run the cleaning
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
IMHO we should re-consider the patch to sort the writes. Not so much
because of the performance gain that gives, but because we can then
re-arrange the fsyncs so that you write one file, then fsync it, then
write the next file
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
IMHO we should re-consider the patch to sort the writes. Not so much
because of the performance gain that gives, but because we can then
re-arrange the fsyncs so
Robert Haas robertmh...@gmail.com writes:
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
That sounds like you have an entirely wrong mental model of where the
cost comes from. Those times are not independent.
Yeah, Greg Smith made the same point a week or three ago.
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
That sounds like you have an entirely wrong mental model of where the
cost comes from. Those times are not
Robert Haas robertmh...@gmail.com writes:
3. Pause for 3 seconds after every fsync.
I think something along the lines of #3 is probably a good idea,
Really? Any particular delay is guaranteed wrong.
regards, tom lane
On Mon, Jan 31, 2011 at 12:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
3. Pause for 3 seconds after every fsync.
I think something along the lines of #3 is probably a good idea,
Really? Any particular delay is guaranteed wrong.
What I was getting at
Robert Haas robertmh...@gmail.com writes:
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I wonder whether it'd be useful to keep track of the total amount of
data written-and-not-yet-synced, and to issue fsyncs often enough to
keep that below some parameter; the idea
On Mon, Jan 31, 2011 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I wonder whether it'd be useful to keep track of the total amount of
data written-and-not-yet-synced, and to issue
Robert Haas wrote:
Back to the idea at hand - I proposed something a bit along these
lines upthread, but my idea was to proactively perform the fsyncs on
the relations that had gone the longest without a write, rather than
the ones with the most dirty data. I'm not sure which is better.
Tom Lane wrote:
I wonder whether it'd be useful to keep track of the total amount of
data written-and-not-yet-synced, and to issue fsyncs often enough to
keep that below some parameter; the idea being that the parameter would
limit how much dirty kernel disk cache there is. Of course, ideally
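Tom's bounded-unsynced-data idea can be sketched as follows (an illustrative Python sketch, not the backend's actual C code; `write_with_bound` and the per-file accounting are assumptions about one possible shape):

```python
import os

def write_with_bound(fd, blocks, max_unsynced_bytes):
    """Track how much has been written to fd since the last sync, and
    issue an fsync whenever that amount reaches the bound, so the dirty
    kernel disk cache attributable to this file stays limited."""
    unsynced = 0
    fsyncs = 0
    for block in blocks:
        os.write(fd, block)
        unsynced += len(block)
        if unsynced >= max_unsynced_bytes:
            os.fsync(fd)
            fsyncs += 1
            unsynced = 0
    return fsyncs
```

The open question in the thread remains what the bound should be, since the kernel's own writeback may empty the cache before the explicit fsync arrives.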
Robert Haas robertmh...@gmail.com writes:
Back to the idea at hand - I proposed something a bit along these
lines upthread, but my idea was to proactively perform the fsyncs on
the relations that had gone the longest without a write, rather than
the ones with the most dirty data.
Yeah. What
Tom Lane wrote:
Robert Haas robertmh...@gmail.com writes:
3. Pause for 3 seconds after every fsync.
I think something along the lines of #3 is probably a good idea,
Really? Any particular delay is guaranteed wrong.
'3 seconds' is just a placeholder for whatever comes
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
I've attached an updated version of the initial sync spreading patch here,
one that applies cleanly on top of HEAD and over top of the sync
instrumentation patch too. The conflict that made that hard before is gone
now.
On Fri, Jan 28, 2011 at 12:53 AM, Greg Smith g...@2ndquadrant.com wrote:
Where there are still very ugly maximum latency figures here in every case,
these periods just aren't as wide with the patch in place.
OK, committed the patch, with some additional commenting, and after
fixing the compiler
Greg Smith wrote:
I think a helpful next step here would be to put Robert's fsync
compaction patch into here and see if that helps. There are enough
backend syncs showing up in the difficult workloads (scale=1000,
clients =32) that its impact should be obvious.
Initial tests show everything
On Thu, Jan 27, 2011 at 12:18 PM, Greg Smith g...@2ndquadrant.com wrote:
Greg Smith wrote:
I think a helpful next step here would be to put Robert's fsync compaction
patch into here and see if that helps. There are enough backend syncs
showing up in the difficult workloads (scale=1000,
Robert Haas wrote:
Based on what I saw looking at this, I'm thinking that the backend
fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs
spread uniformly throughout the test, but clusters of 100 or more that
happen in very quick succession, followed by relief when the
Robert Haas wrote:
During each cluster, the system probably slows way down, and then recovers when
the queue is emptied. So the TPS improvement isn't at all a uniform
speedup, but simply relief from the stall that would otherwise result
from a full queue.
That does seem to be the case
2011/1/18 Greg Smith g...@2ndquadrant.com:
Bruce Momjian wrote:
Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00?
The idea of having a dead period doing no work at all between write phase
and sync phase may have some merit. I don't have enough test data yet on
some
Robert Haas wrote:
Idea #4: For ext3 filesystems that like to dump the entire buffer
cache instead of only the requested file, write a little daemon that
runs alongside of (and completely independently of) PostgreSQL. Every
30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and
To be frank, I really don't care about fixing this behavior on ext3,
especially in the context of that sort of hack. That filesystem is not
the future, it's not possible to ever really make it work right, and
every minute spent on pandering to its limitations would be better spent
elsewhere
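For reference, the "metronome" hack Robert describes would amount to little more than this (a minimal sketch; the path and interval are illustrative, and as Greg notes it is a workaround for ext3 rather than a fix):

```python
import os
import time

def metronome_tick(path):
    """One tick: rewrite a single byte and fsync it. On ext3, which
    flushes far more than the requested file on fsync, this nudges the
    kernel to drain its buffer cache in small doses instead of one
    giant spike at checkpoint time."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        os.write(fd, b"x")
        os.fsync(fd)
    finally:
        os.close(fd)

def run_metronome(path, interval_s=30, ticks=None):
    """Run independently of PostgreSQL, ticking every interval_s
    seconds; ticks=None means run forever."""
    done = 0
    while ticks is None or done < ticks:
        metronome_tick(path)
        done += 1
        if ticks is None or done < ticks:
            time.sleep(interval_s)
    return done
```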
Greg Smith wrote:
One of the components to the write queue is some notion that writes that
have been waiting longest should eventually be flushed out. Linux has
this number called dirty_expire_centisecs which suggests it enforces
just that, set to a default of 30 seconds. This is why
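The tunable lives at /proc/sys/vm/dirty_expire_centisecs and is expressed in hundredths of a second, so the kernel default of 3000 is the 30 seconds mentioned above. A small sketch of reading it (the helper names are just for illustration):

```python
def expire_seconds(centisecs_text):
    """Convert a dirty_expire_centisecs value (text, in hundredths of
    a second) to seconds; the kernel default 3000 -> 30.0 seconds."""
    return int(centisecs_text.strip()) / 100.0

def read_dirty_expire(default_centisecs=3000):
    """Read the live tunable on Linux, falling back to the kernel
    default wherever /proc isn't available."""
    try:
        with open("/proc/sys/vm/dirty_expire_centisecs") as f:
            return expire_seconds(f.read())
    except OSError:
        return expire_seconds(str(default_centisecs))
```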
On Sun, Jan 16, 2011 at 7:13 PM, Greg Smith g...@2ndquadrant.com wrote:
I have finished a first run of benchmarking the current 9.1 code at various
sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many
details. The interesting stuff is in Test Set 3, near the bottom. That's
Jeff Janes wrote:
Have you ever tested Robert's other idea of having a metronome process
do a periodic fsync on a dummy file which is located on the same ext3fs
as the table files? I think that that would be interesting to see.
To be frank, I really don't care about fixing this behavior on
On Jan 15, 2011, at 8:15 AM, Robert Haas wrote:
Well, the point of this is not to save time in the bgwriter - I'm not
surprised to hear that wasn't noticeable. The point is that when the
fsync request queue fills up, backends start performing an fsync *for
every block they write*, and that's
On Mon, Jan 17, 2011 at 6:07 PM, Jim Nasby j...@nasby.net wrote:
On Jan 15, 2011, at 8:15 AM, Robert Haas wrote:
Well, the point of this is not to save time in the bgwriter - I'm not
surprised to hear that wasn't noticeable. The point is that when the
fsync request queue fills up, backends
Jim Nasby wrote:
Wow, that's the kind of thing that would be incredibly difficult to figure out,
especially while your production system is in flames... Can we change ereport
that happens in that case from DEBUG1 to WARNING? Or provide some other means
to track it
That's why we already
Bruce Momjian wrote:
Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00?
The idea of having a dead period doing no work at all between write
phase and sync phase may have some merit. I don't have enough test data
yet on some more fundamental issues in this area to
On Tue, Jan 11, 2011 at 5:27 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
One of the ideas Simon and I had been considering at one point was adding
some better de-duplication logic to the fsync absorb code, which I'm
On Sun, Jan 16, 2011 at 7:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
But since you already wrote a patch to do the whole thing, I figured
I'd time it.
Thanks!
I arranged to test an instrumented version of your patch under large
shared_buffers of 4GB, conditions that would maximize the
I have finished a first run of benchmarking the current 9.1 code at
various sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm
for many details. The interesting stuff is in Test Set 3, near the
bottom. That's the first one that includes buffer_backend_fsync data.
This is all on
On Sun, Jan 16, 2011 at 10:13 PM, Greg Smith g...@2ndquadrant.com wrote:
I have finished a first run of benchmarking the current 9.1 code at various
sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many
details. The interesting stuff is in Test Set 3, near the bottom.
Robert Haas wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
One of the ideas Simon and I had been considering at one point was adding
some better de-duplication logic to the fsync absorb code, which I'm
reminded by the pattern here might be helpful
On Sat, Jan 15, 2011 at 5:47 AM, Greg Smith g...@2ndquadrant.com wrote:
No toe damage, this is great, I hadn't gotten to coding for this angle yet
at all. Suffering from an overload of ideas and (mostly wasted) test data,
so thanks for exploring this concept and proving it works.
Yeah -
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
Robert Haas wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
One of the ideas Simon and I had been considering at one point was adding
some better de-duplication logic to the fsync absorb code, which
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
Robert Haas wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
One of the ideas Simon and I had been considering at one point was
Robert Haas wrote:
Idea #2: At the beginning of a checkpoint when we scan all the
buffers, count the number of buffers that need to be synced for each
relation. Use the same hashtable that we use for tracking pending
fsync requests. Then, interleave the writes and the fsyncs...
Idea #3: Stick
On Sat, Jan 15, 2011 at 9:25 AM, Greg Smith g...@2ndquadrant.com wrote:
Once upon a time we got a patch from Itagaki Takahiro whose purpose was to
sort writes before sending them out:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php
Ah, a fine idea!
Which has very low odds
Robert Haas wrote:
I'll believe it when I see it. How about this:
a 1
a 2
sync a
b 1
b 2
sync b
c 1
c 2
sync c
Or maybe some variant, where we become willing to fsync a file a
certain number of seconds after writing the last block, or when all
the writes are done, whichever comes first.
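The schedule Robert lists above (write each file's blocks, then immediately sync that file before moving to the next) can be generated mechanically; a sketch, with the dict-of-block-counts representation being purely illustrative:

```python
def interleaved_schedule(files):
    """Turn {filename: dirty_block_count} into a write/sync schedule
    where each file is fsynced right after its own writes, instead of
    doing all the writes first and all the fsyncs at the end."""
    schedule = []
    for name in sorted(files):
        for block in range(1, files[name] + 1):
            schedule.append(("write", name, block))
        schedule.append(("sync", name))
    return schedule
```

Sorting the writes per file is what makes this interleaving possible, which is the point Heikki raises elsewhere in the thread.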
On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote:
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
Robert Haas wrote:
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
One of the
On Sat, Jan 15, 2011 at 10:31 AM, Greg Smith g...@2ndquadrant.com wrote:
That's going to give worse performance than the current code in some cases.
OK.
How does the checkpoint target give you any time to sync them? Unless
you squeeze the writes together more tightly, but that seems sketchy.
Robert Haas wrote:
That seems like a bad idea - don't we routinely recommend that people
crank this up to 0.9? You'd be effectively bounding the upper range
of this setting to a value less than the lowest value we
recommend anyone use today.
I was just giving an example of how I
On Sat, Jan 15, 2011 at 14:05, Robert Haas robertmh...@gmail.com wrote:
Idea #4: For ext3 filesystems that like to dump the entire buffer
cache instead of only the requested file, write a little daemon that
runs alongside of (and completely independently of) PostgreSQL. Every
30 s, it opens a
On Sat, Jan 15, 2011 at 5:57 PM, Greg Smith g...@2ndquadrant.com wrote:
I was just giving an example of how I might do an initial split. There's a
checkpoint happening now at time T; we have a rough idea that it needs to be
finished before some upcoming time T+D. Currently with default
Robert Haas wrote:
What is the basis for thinking that the sync should get the same
amount of time as the writes? That seems pretty arbitrary. Right
now, you're allowing 3 seconds per fsync, which could be a lot more or
a lot less than 40% of the total checkpoint time...
Just that it's where
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote:
Having the pg_stat_bgwriter.buffers_backend_fsync patch available all the
time now has made me reconsider how important one potential bit of
refactoring here would be. I managed to catch one of the situations where
really
On Mon, 2010-12-06 at 23:26 -0300, Alvaro Herrera wrote:
Why would multiple bgwriter processes worry you?
Because it complicates the tracking of files requiring fsync.
As Greg says, the last attempt to do that was a lot of code.
--
Simon Riggs http://www.2ndQuadrant.com/books/
Alvaro Herrera wrote:
Why would multiple bgwriter processes worry you?
Of course, it wouldn't work to have multiple processes trying to execute
a checkpoint simultaneously, but what if we separated the tasks so that
one process is in charge of checkpoints, and another one is in charge of
the
Excerpts from Greg Smith's message of dom dic 05 20:02:48 -0300 2010:
What ends up happening if you push toward fully sync I/O is the design
you see in some other databases, where you need multiple writer
processes. Then requests for new pages can continue to allocate as
needed, while
Heikki Linnakangas wrote:
If you fsync() a file with one dirty page in it, it's going to return
very quickly, but a 1GB file will take a while. That could be
problematic if you have a thousand small files and a couple of big
ones, as you would want to reserve more time for the big ones. I'm
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith g...@2ndquadrant.com wrote:
Heikki Linnakangas wrote:
If you fsync() a file with one dirty page in it, it's going to return very
quickly, but a 1GB file will take a while. That could be problematic if you
have a thousand small files and a couple of
Rob Wultsch wrote:
Forgive me, but is all of this a step on the slippery slope to
direct io? And is this a bad thing
I don't really think so. There's an important difference in my head
between direct I/O, where the kernel is told write this immediately!,
and what I'm trying to achieve. I
Greg Stark wrote:
Using sync_file_range you can specify the set of blocks to sync and
then block on them only after some time has passed. But there's no
documentation on how this relates to the I/O scheduler so it's not
clear it would have any effect on the problem.
I believe this is the
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote:
I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?
The difference is that once an
Using sync_file_range you can specify the set of blocks to sync and
then block on them only after some time has passed. But there's no
documentation on how this relates to the I/O scheduler so it's not
clear it would have any effect on the problem. We might still have to
delay the beginning
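A sketch of the two-phase use of sync_file_range Greg describes, via ctypes since it is a Linux-only call with no libc wrapper exposed to Python (flag values per the sync_file_range(2) man page; the helper names are illustrative):

```python
import ctypes
import os

# Flag values from the Linux sync_file_range(2) man page.
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4

_libc = ctypes.CDLL(None, use_errno=True)

def start_writeback(fd, offset=0, nbytes=0):
    """Ask the kernel to start writeback of the range without blocking
    on completion (nbytes=0 means 'to end of file')."""
    ret = _libc.sync_file_range(ctypes.c_int(fd),
                                ctypes.c_longlong(offset),
                                ctypes.c_longlong(nbytes),
                                ctypes.c_uint(SYNC_FILE_RANGE_WRITE))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret

def wait_for_writeback(fd, offset=0, nbytes=0):
    """Later, block until that range has actually been written out."""
    flags = (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
             SYNC_FILE_RANGE_WAIT_AFTER)
    ret = _libc.sync_file_range(ctypes.c_int(fd),
                                ctypes.c_longlong(offset),
                                ctypes.c_longlong(nbytes),
                                ctypes.c_uint(flags))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret
```

As noted above, how this interacts with the I/O scheduler is undocumented, so whether it actually helps the stall problem is an open question.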
On Thu, Dec 2, 2010 at 2:24 PM, Greg Stark gsst...@mit.edu wrote:
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote:
I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3
On 01.12.2010 06:25, Greg Smith wrote:
Jeff Janes wrote:
I ask because I don't have a mental model of how the pause can help.
Given that this dirty data has been hanging around for many minutes
already, what is a 3 second pause going to heal?
The difference is that once an fsync call is made,
Heikki Linnakangas wrote:
Do you have any idea how to autotune the delay between fsyncs?
I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread writes, where you assess whether you're
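The spread-write math being borrowed here reduces to dividing the remaining time budget by the remaining sync work; a minimal sketch of that autotuning (illustrative names, not the backend's actual C code):

```python
def fsync_delay(seconds_left, relations_left):
    """Delay to insert before the next fsync: spread the relations
    still needing a sync evenly over the checkpoint time remaining,
    re-assessed after each fsync since sync times vary wildly."""
    if relations_left <= 0:
        return 0.0
    return max(0.0, seconds_left / relations_left)
```

Because an individual fsync can take far longer than its slice (a 1GB file versus a one-page file, as Heikki notes below), recomputing after every sync rather than fixing the delay up front is the important part.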
On 01.12.2010 23:30, Greg Smith wrote:
Heikki Linnakangas wrote:
Do you have any idea how to autotune the delay between fsyncs?
I'm thinking to start by counting the number of relations that need them
at the beginning of the checkpoint. Then use the same basic math that
drives the spread
Ron Mayer wrote:
Might smoother checkpoints be better solved by talking
to the OS vendors' virtual-memory-tuning-knob authors
to work with them on exposing the ideal knobs; rather than
saying that our only tool is a hammer (fsync) so the problem
must be handled as a nail.
Maybe, but it's
Maybe, but it's hard to argue that the current implementation--just
doing all of the sync calls as fast as possible, one after the
other--isn't going to produce worst-case behavior in a lot of
situations. Given that
it's not a huge amount of code to do better, I'd rather do some work in
that
On Sun, Nov 14, 2010 at 3:48 PM, Greg Smith g...@2ndquadrant.com wrote:
...
One change that turned out be necessary rather than optional--to get good
performance from the system under tuning--was to make regular background
writer activity, including fsync absorb checks, happen during these
Jeff Janes wrote:
Have you tested out this absorb during syncing phase code without
the sleep between the syncs?
I.e. so that it still a tight loop, but the loop alternates between
sync and absorb, with no intentional pause?
Yes; that's how it was developed. It helped to have just the
Josh Berkus wrote:
On 11/20/10 6:11 PM, Jeff Janes wrote:
True, but I think that changing these from their defaults is not
considered to be a dark art reserved for kernel hackers, i.e they are
something that sysadmins are expected to tweak to suit their
workload, just like the shmmax and
2010/11/21 Andres Freund and...@anarazel.de:
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise WONTNEED syscalls to force the data to
exit the cache much sooner than it would
Jeff Janes wrote:
And for very large memory
systems, even 1% may be too much to cache (dirty*_ratio can only be
set in integer percent points), so recent kernels introduced
dirty*_bytes parameters. I like these better because they do what
they say. With the dirty*_ratio, I could never figure
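Jeff's point about the ratio granularity is easy to quantify (a small illustrative calculation; the 64GB machine is just an example):

```python
def ratio_floor_bytes(total_ram_bytes, percent=1):
    """Smallest non-zero dirty limit expressible via vm.dirty_ratio:
    it only accepts whole percentage points, so 1% is the floor."""
    return total_ram_bytes * percent // 100

# On a 64GB machine the 1% floor still allows ~655 MiB of dirty data,
# which is why the byte-denominated vm.dirty_bytes knob (and
# vm.dirty_background_bytes) is easier to reason about on large hosts.
```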
Robert Haas wrote:
Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea. For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote:
Ultimately what I want to do here is some sort of smarter write-behind
sync operation, perhaps with a LRU on relations with pending fsync
requests. The idea would be to sync relations that haven't been touched
in a while in
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
For a similar problem we had (kernel buffering too much) we had success
using the fadvise and madvise WONTNEED syscalls to force the data to
exit the cache much sooner than it would otherwise. This was on Linux
and it had the
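The fadvise trick Martijn mentions maps onto POSIX_FADV_DONTNEED; a sketch of forcing a file out of the page cache early (Linux-specific behavior; syncing first so no dirty data is at risk):

```python
import os

def drop_from_cache(path):
    """Flush a file's dirty pages, then hint the kernel to evict its
    pages from the cache well before normal writeback would."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # don't discard anything that hasn't hit disk
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Note DONTNEED is only a hint, and it throws away cache warmth along with the writeback pressure, which is part of why this never became a general answer.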
On 11/20/10 6:11 PM, Jeff Janes wrote:
True, but I think that changing these from their defaults is not
considered to be a dark art reserved for kernel hackers, i.e they are
something that sysadmins are expected to tweak to suit their
workload, just like the shmmax and such.
I disagree.
On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith g...@2ndquadrant.com wrote:
Let me throw some numbers out [...]
Interesting.
Ultimately what I want to do here is some sort of smarter write-behind sync
operation, perhaps with a LRU on relations with pending fsync requests. The
idea would be to
On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas robertmh...@gmail.com wrote:
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith g...@2ndquadrant.com wrote:
The second issue is that the delay between sync calls is currently
hard-coded, at 3 seconds. I believe the right path here is to consider the
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:
The thing to realize
that complicates the design is that the actual sync execution may take a
considerable period of time. It's much more likely for that to happen than
in the case of an individual write, as the current
On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas robertmh...@gmail.com wrote:
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:
Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea. For example, given
files F1
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith g...@2ndquadrant.com wrote:
The second issue is that the delay between sync calls is currently
hard-coded, at 3 seconds. I believe the right path here is to consider the
current checkpoint_completion_target to still be valid, then work back from
Final patch in this series for today spreads out the individual
checkpoint fsync calls over time, and was written by myself and Simon
Riggs. Patch is based against a system that's already had the two
patches I sent over earlier today applied, rather than HEAD, as both are
useful for measuring