Re: [HACKERS] Spread checkpoint sync

2011-02-10 Thread Greg Smith
Looks like it's time to close the book on this one for 9.1 development... the unfortunate results are at http://www.2ndquadrant.us/pgbench-results/index.htm. Test set #12 is the one with spread sync I was hoping would turn out better than #9, the reference I was trying to improve on. TPS is

Re: [HACKERS] Spread checkpoint sync

2011-02-10 Thread Robert Haas
On Thu, Feb 10, 2011 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote: 3) The existing write spreading code in the background writer needs to be overhauled, too, before spreading the syncs around is going to give the benefits I was hoping for. I've been thinking about this problem a bit. It

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Cédric Villemain
2011/2/7 Greg Smith g...@2ndquadrant.com: Robert Haas wrote: With the fsync queue compaction patch applied, I think most of this is now not needed.  Attached please find an attempt to isolate the portion that looks like it might still be useful.  The basic idea of what remains here is to

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith
Cédric Villemain wrote: Is it worth a new thread on the different IO improvements done so far or ongoing, and how we might add new GUCs (if required!) with intelligence between those patches? (For instance, hint bit IO limiting probably needs a tunable to define something similar to

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote: As a larger statement on this topic, I'm never very excited about redesigning here starting from any point other than saw a bottleneck doing x on a production system. There's a long list of such things already around waiting to be addressed, and I've

Re: [HACKERS] Spread checkpoint sync

2011-02-07 Thread Greg Smith
Kevin Grittner wrote: There are occasional posts from those wondering why their read-only queries are so slow after a bulk load, and why they are doing heavy writes. (I remember when I posted about that, as a relative newbie, and I know I've seen others.) Sure; I created

Re: [HACKERS] Spread checkpoint sync

2011-02-06 Thread Greg Smith
Robert Haas wrote: With the fsync queue compaction patch applied, I think most of this is now not needed. Attached please find an attempt to isolate the portion that looks like it might still be useful. The basic idea of what remains here is to make the background writer still do its normal

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith
Michael Banck wrote: On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote: For example, the pre-release Squeeze numbers we're seeing are awful so far, but it's not really done yet either. Unfortunately, it does not look like Debian squeeze will change any more (or has changed

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Greg Smith
As already mentioned in the broader discussion at http://archives.postgresql.org/message-id/4d4c4610.1030...@2ndquadrant.com , I'm seeing no solid performance swing in the checkpoint sorting code itself. Better sometimes, worse others, but never by a large amount. Here's what the statistics

Re: [HACKERS] Spread checkpoint sync

2011-02-04 Thread Robert Haas
On Fri, Feb 4, 2011 at 2:08 PM, Greg Smith g...@2ndquadrant.com wrote: -The total number of buffers I'm computing based on the checkpoint writes being sorted is not a perfect match to the number reported by the checkpoint complete status line.  Sometimes they are the same, sometimes not.  Not

Re: [HACKERS] Spread checkpoint sync

2011-02-03 Thread Michael Banck
On Sat, Jan 15, 2011 at 05:47:24AM -0500, Greg Smith wrote: For example, the pre-release Squeeze numbers we're seeing are awful so far, but it's not really done yet either. Unfortunately, it does not look like Debian squeeze will change any more (or has changed much since your post) at this

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Greg Smith
Greg Smith wrote: I think the right way to compute relations to sync is to finish the sorted writes patch I sent over a not quite right yet update to already Attached update now makes much more sense than the misguided patch I submitted two weeks ago. This takes the original sorted write
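
A sketch of what the heart of such a sorted-write pass looks like, with invented types (the real patch works on the buffer manager's own structures): order the dirty buffers by file and block number so each relation sees one sequential run of writes.

    #include <stdlib.h>

    /* Invented stand-in for the real buffer tag; illustration only. */
    typedef struct
    {
        unsigned    relfilenode;    /* which relation file */
        unsigned    blocknum;       /* which block within it */
    } WriteTag;

    static int
    writetag_cmp(const void *a, const void *b)
    {
        const WriteTag *x = a;
        const WriteTag *y = b;

        if (x->relfilenode != y->relfilenode)
            return (x->relfilenode < y->relfilenode) ? -1 : 1;
        if (x->blocknum != y->blocknum)
            return (x->blocknum < y->blocknum) ? -1 : 1;
        return 0;
    }

    /* usage: qsort(tags, ntags, sizeof(WriteTag), writetag_cmp); */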

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Robert Haas
On Mon, Jan 31, 2011 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Back to the idea at hand - I proposed something a bit along these lines upthread, but my idea was to proactively perform the fsyncs on the relations that had gone the longest without a

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: I also think Bruce's idea of calling fsync() on each relation just *before* we start writing the pages from that relation might have some merit. What bothers me about that is that you may have a lot of the same dirty pages in the OS cache as the

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Robert Haas wrote: Back to your idea: One problem with trying to bound the unflushed data is that it's not clear what the bound should be. I've had this mental model where we want the OS to write out pages to disk, but that's not always true, per Greg Smith's recent posts about Linux kernel

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Greg Smith wrote: Greg Smith wrote: I think the right way to compute relations to sync is to finish the sorted writes patch I sent over a not quite right yet update to already Attached update now makes much more sense than the misguided patch I submitted two weeks ago. This takes the

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Robert Haas
On Tue, Feb 1, 2011 at 12:58 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: I also think Bruce's idea of calling fsync() on each relation just *before* we start writing the pages from that relation might have some merit. What bothers me about

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: I also think Bruce's idea of calling fsync() on each relation just *before* we start writing the pages from that relation might have some merit. What bothers me about that is that you may have a lot of the same dirty

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: My trivial idea was: let's assume we checkpoint every 10 minutes, and it takes 5 minutes for us to write the data to the kernel. If no one else is writing to those files, we can safely wait maybe 5 more minutes before issuing the fsync. If, however,
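
A toy test for the condition Bruce describes, under the assumption that a file's modification time is a usable proxy for "someone wrote to it after our write phase"; all names here are invented for illustration.

    #include <sys/stat.h>
    #include <time.h>

    /*
     * If the file has not been modified since our own write phase ended,
     * all of its dirty pages are ours and the fsync can wait a while
     * longer; otherwise sync promptly.
     */
    static int
    fsync_can_wait(const char *path, time_t write_phase_end)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return 0;               /* can't tell, so sync now */
        return st.st_mtime <= write_phase_end;
    }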

Re: [HACKERS] Spread checkpoint sync

2011-02-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: My trivial idea was: let's assume we checkpoint every 10 minutes, and it takes 5 minutes for us to write the data to the kernel. If no one else is writing to those files, we can safely wait maybe 5 more minutes before issuing the

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote: 1. Absorb fsync requests a lot more often during the sync phase. 2. Still try to run the cleaning scan during the sync phase. 3. Pause for 3 seconds after every fsync. So if we want the checkpoint to finish in, say, 20
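
The shape of that three-step sync phase, sketched with stub helpers standing in for the real absorb and cleaning-scan routines (names and signatures here are illustrative, not PostgreSQL's):

    #include <unistd.h>

    static void absorb_fsync_requests(void) { /* stand-in for the absorb code */ }
    static void cleaning_scan(void)         { /* stand-in for the bgwriter scan */ }

    static void
    sync_phase(int *fds, int nfiles, unsigned pause_secs)
    {
        for (int i = 0; i < nfiles; i++)
        {
            absorb_fsync_requests();    /* 1. absorb a lot more often */
            cleaning_scan();            /* 2. keep cleaning during the syncs */
            (void) fsync(fds[i]);       /* error handling omitted */
            sleep(pause_secs);          /* 3. the contested fixed pause */
        }
    }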

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote: 1. Absorb fsync requests a lot more often during the sync phase. 2. Still try to run the cleaning scan during the sync phase. 3. Pause for

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Heikki Linnakangas
On 31.01.2011 16:44, Robert Haas wrote: On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote: 1. Absorb fsync requests a lot more often during the sync phase. 2. Still try to run the cleaning

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: IMHO we should re-consider the patch to sort the writes. Not so much because of the performance gain that gives, but because we can then re-arrange the fsyncs so that you write one file, then fsync it, then write the next file
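
The re-arrangement being argued for, in sketch form with invented types: once writes are sorted by file, each file can be written in full and fsync'd immediately, instead of deferring every sync to the end of the checkpoint.

    #include <sys/types.h>
    #include <unistd.h>

    typedef struct
    {
        int         fd;         /* file being checkpointed */
        int         npages;     /* dirty pages belonging to it */
        char      **pages;      /* 8 kB page images */
        off_t      *offsets;    /* target location of each page */
    } FileWriteList;            /* invented structure, illustration only */

    static void
    write_then_sync(const FileWriteList *f)
    {
        for (int i = 0; i < f->npages; i++)
            (void) pwrite(f->fd, f->pages[i], 8192, f->offsets[i]);
        (void) fsync(f->fd);    /* sync while this file's pages are the
                                 * only fresh dirty data in kernel cache */
    }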

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: IMHO we should re-consider the patch to sort the writes. Not so much because of the performance gain that gives, but because we can then re-arrange the fsyncs so

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: That sounds like you have an entirely wrong mental model of where the cost comes from.  Those times are not independent. Yeah, Greg Smith made the same point a week or three ago.

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 31, 2011 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: That sounds like you have an entirely wrong mental model of where the cost comes from.  Those times are not

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: 3. Pause for 3 seconds after every fsync. I think something along the lines of #3 is probably a good idea, Really? Any particular delay is guaranteed wrong. regards, tom lane

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 12:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: 3. Pause for 3 seconds after every fsync. I think something along the lines of #3 is probably a good idea, Really?  Any particular delay is guaranteed wrong. What I was getting at

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether it'd be useful to keep track of the total amount of data written-and-not-yet-synced, and to issue fsyncs often enough to keep that below some parameter; the idea
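
A toy rendering of that bounding idea; max_unsynced_bytes is an invented knob, not a real GUC, and real code would track the total across all files rather than per descriptor.

    #include <stddef.h>
    #include <unistd.h>

    static size_t unsynced_bytes = 0;
    static const size_t max_unsynced_bytes = 64 * 1024 * 1024; /* 64 MB cap */

    static void
    checkpoint_write(int fd, const void *buf, size_t len)
    {
        (void) write(fd, buf, len);     /* error handling omitted */
        unsynced_bytes += len;
        if (unsynced_bytes >= max_unsynced_bytes)
        {
            (void) fsync(fd);           /* push dirty kernel cache out */
            unsynced_bytes = 0;
        }
    }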

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Robert Haas
On Mon, Jan 31, 2011 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 31, 2011 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether it'd be useful to keep track of the total amount of data written-and-not-yet-synced, and to issue

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Bruce Momjian
Robert Haas wrote: Back to the idea at hand - I proposed something a bit along these lines upthread, but my idea was to proactively perform the fsyncs on the relations that had gone the longest without a write, rather than the ones with the most dirty data. I'm not sure which is better.

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Greg Smith
Tom Lane wrote: I wonder whether it'd be useful to keep track of the total amount of data written-and-not-yet-synced, and to issue fsyncs often enough to keep that below some parameter; the idea being that the parameter would limit how much dirty kernel disk cache there is. Of course, ideally

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Back to the idea at hand - I proposed something a bit along these lines upthread, but my idea was to proactively perform the fsyncs on the relations that had gone the longest without a write, rather than the ones with the most dirty data. Yeah. What

Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Greg Smith
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 3. Pause for 3 seconds after every fsync. I think something along the lines of #3 is probably a good idea, Really? Any particular delay is guaranteed wrong. '3 seconds' is just a placeholder for whatever comes

Re: [HACKERS] Spread checkpoint sync

2011-01-30 Thread Robert Haas
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: I've attached an updated version of the initial sync spreading patch here, one that applies cleanly on top of HEAD and over top of the sync instrumentation patch too.  The conflict that made that hard before is gone now.

Re: [HACKERS] Spread checkpoint sync

2011-01-29 Thread Robert Haas
On Fri, Jan 28, 2011 at 12:53 AM, Greg Smith g...@2ndquadrant.com wrote: Where there are still very ugly maximum latency figures here in every case, these periods just aren't as wide with the patch in place. OK, committed the patch, with some additional commenting, and after fixing the compiler

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Greg Smith wrote: I think a helpful next step here would be to put Robert's fsync compaction patch into here and see if that helps. There are enough backend syncs showing up in the difficult workloads (scale=1000, clients >= 32) that its impact should be obvious. Initial tests show everything

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Robert Haas
On Thu, Jan 27, 2011 at 12:18 PM, Greg Smith g...@2ndquadrant.com wrote: Greg Smith wrote: I think a helpful next step here would be to put Robert's fsync compaction patch into here and see if that helps.  There are enough backend syncs showing up in the difficult workloads (scale=1000,

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Robert Haas wrote: Based on what I saw looking at this, I'm thinking that the backend fsyncs probably happen in clusters - IOW, it's not 2504 backend fsyncs spread uniformly throughout the test, but clusters of 100 or more that happen in very quick succession, followed by relief when the

Re: [HACKERS] Spread checkpoint sync

2011-01-27 Thread Greg Smith
Robert Haas wrote: During each cluster, the system probably slows way down, and then recovers when the queue is emptied. So the TPS improvement isn't at all a uniform speedup, but simply relief from the stall that would otherwise result from a full queue. That does seem to be the case

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Cédric Villemain
2011/1/18 Greg Smith g...@2ndquadrant.com: Bruce Momjian wrote: Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00? The idea of having a dead period doing no work at all between write phase and sync phase may have some merit.  I don't have enough test data yet on some

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Greg Smith
Robert Haas wrote: Idea #4: For ext3 filesystems that like to dump the entire buffer cache instead of only the requested file, write a little daemon that runs alongside of (and completely independently of) PostgreSQL. Every 30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and
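
A minimal version of that daemon, assuming Linux and an invented scratch-file path (the 30 s interval and 1-byte file come from the message above):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        const char *path = "/tmp/fsync-metronome";  /* placeholder path */
        char        b = 0;

        for (;;)
        {
            int fd = open(path, O_RDWR | O_CREAT, 0600);

            if (fd < 0)
            {
                perror("open");
                return 1;
            }
            b ^= 1;                     /* dirty the single page */
            if (write(fd, &b, 1) != 1)
                perror("write");
            if (fsync(fd) != 0)         /* on ext3, flushes far more */
                perror("fsync");
            close(fd);
            sleep(30);
        }
    }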

Re: [HACKERS] Spread checkpoint sync

2011-01-18 Thread Josh Berkus
To be frank, I really don't care about fixing this behavior on ext3, especially in the context of that sort of hack. That filesystem is not the future, it's not possible to ever really make it work right, and every minute spent on pandering to its limitations would be better spent elsewhere

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Bruce Momjian
Greg Smith wrote: One of the components to the write queue is some notion that writes that have been waiting longest should eventually be flushed out. Linux has this number called dirty_expire_centisecs which suggests it enforces just that, set to a default of 30 seconds. This is why

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Jeff Janes
On Sun, Jan 16, 2011 at 7:13 PM, Greg Smith g...@2ndquadrant.com wrote: I have finished a first run of benchmarking the current 9.1 code at various sizes.  See http://www.2ndquadrant.us/pgbench-results/index.htm for many details.  The interesting stuff is in Test Set 3, near the bottom.  That's

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Jeff Janes wrote: Have you ever tested Robert's other idea of having a metronome process do a periodic fsync on a dummy file which is located on the same ext3fs as the table files? I think that that would be interesting to see. To be frank, I really don't care about fixing this behavior on

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Jim Nasby
On Jan 15, 2011, at 8:15 AM, Robert Haas wrote: Well, the point of this is not to save time in the bgwriter - I'm not surprised to hear that wasn't noticeable. The point is that when the fsync request queue fills up, backends start performing an fsync *for every block they write*, and that's

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Robert Haas
On Mon, Jan 17, 2011 at 6:07 PM, Jim Nasby j...@nasby.net wrote: On Jan 15, 2011, at 8:15 AM, Robert Haas wrote: Well, the point of this is not to save time in the bgwriter - I'm not surprised to hear that wasn't noticeable.  The point is that when the fsync request queue fills up, backends

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Jim Nasby wrote: Wow, that's the kind of thing that would be incredibly difficult to figure out, especially while your production system is in flames... Can we change ereport that happens in that case from DEBUG1 to WARNING? Or provide some other means to track it? That's why we already

Re: [HACKERS] Spread checkpoint sync

2011-01-17 Thread Greg Smith
Bruce Momjian wrote: Should we be writing until 2:30 then sleep 30 seconds and fsync at 3:00? The idea of having a dead period doing no work at all between write phase and sync phase may have some merit. I don't have enough test data yet on some more fundamental issues in this area to

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Jeff Janes
On Tue, Jan 11, 2011 at 5:27 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: One of the ideas Simon and I had been considering at one point was adding some better de-duplication logic to the fsync absorb code, which I'm

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 7:32 PM, Jeff Janes jeff.ja...@gmail.com wrote: But since you already wrote a patch to do the whole thing, I figured I'd time it. Thanks! I arranged to test an instrumented version of your patch under large shared_buffers of 4GB, conditions that would maximize the

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Greg Smith
I have finished a first run of benchmarking the current 9.1 code at various sizes. See http://www.2ndquadrant.us/pgbench-results/index.htm for many details. The interesting stuff is in Test Set 3, near the bottom. That's the first one that includes buffer_backend_fsync data. This is all on

Re: [HACKERS] Spread checkpoint sync

2011-01-16 Thread Robert Haas
On Sun, Jan 16, 2011 at 10:13 PM, Greg Smith g...@2ndquadrant.com wrote: I have finished a first run of benchmarking the current 9.1 code at various sizes.  See http://www.2ndquadrant.us/pgbench-results/index.htm for many details.  The interesting stuff is in Test Set 3, near the bottom.  

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: One of the ideas Simon and I had been considering at one point was adding some better de-duplication logic to the fsync absorb code, which I'm reminded by the pattern here might be helpful

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 5:47 AM, Greg Smith g...@2ndquadrant.com wrote: No toe damage, this is great, I hadn't gotten to coding for this angle yet at all.  Suffering from an overload of ideas and (mostly wasted) test data, so thanks for exploring this concept and proving it works. Yeah -

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Simon Riggs
On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: Robert Haas wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: One of the ideas Simon and I had been considering at one point was adding some better de-duplication logic to the fsync absorb code, which

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: Robert Haas wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: One of the ideas Simon and I had been considering at one point was

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: Idea #2: At the beginning of a checkpoint when we scan all the buffers, count the number of buffers that need to be synced for each relation. Use the same hashtable that we use for tracking pending fsync requests. Then, interleave the writes and the fsyncs... Idea #3: Stick

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 9:25 AM, Greg Smith g...@2ndquadrant.com wrote: Once upon a time we got a patch from Itagaki Takahiro whose purpose was to sort writes before sending them out: http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php Ah, a fine idea! Which has very low odds

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: I'll believe it when I see it. How about this: a1, a2, sync a; b1, b2, sync b; c1, c2, sync c. Or maybe some variant, where we become willing to fsync a file a certain number of seconds after writing the last block, or when all the writes are done, whichever comes first.

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Simon Riggs
On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote: On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: Robert Haas wrote: On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: One of the

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 10:31 AM, Greg Smith g...@2ndquadrant.com wrote: That's going to give worse performance than the current code in some cases. OK. How does the checkpoint target give you any time to sync them?  Unless you squeeze the writes together more tightly, but that seems sketchy.

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: That seems like a bad idea - don't we routinely recommend that people crank this up to 0.9? You'd be effectively bounding the upper range of this setting to a value less than the lowest value we recommend anyone use today. I was just giving an example of how I

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Marti Raudsepp
On Sat, Jan 15, 2011 at 14:05, Robert Haas robertmh...@gmail.com wrote: Idea #4: For ext3 filesystems that like to dump the entire buffer cache instead of only the requested file, write a little daemon that runs alongside of (and completely independently of) PostgreSQL.  Every 30 s, it opens a

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Robert Haas
On Sat, Jan 15, 2011 at 5:57 PM, Greg Smith g...@2ndquadrant.com wrote: I was just giving an example of how I might do an initial split.  There's a checkpoint happening now at time T; we have a rough idea that it needs to be finished before some upcoming time T+D.  Currently with default

Re: [HACKERS] Spread checkpoint sync

2011-01-15 Thread Greg Smith
Robert Haas wrote: What is the basis for thinking that the sync should get the same amount of time as the writes? That seems pretty arbitrary. Right now, you're allowing 3 seconds per fsync, which could be a lot more or a lot less than 40% of the total checkpoint time... Just that it's where
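
For concreteness, the split being debated works out roughly like this; sync_target is a hypothetical second fraction alongside the existing checkpoint_completion_target, used only to make the arithmetic explicit.

    #include <stdio.h>

    int
    main(void)
    {
        double D = 300.0;                /* seconds until the next checkpoint */
        double completion_target = 0.5;  /* fraction of D spent writing */
        double sync_target = 0.2;        /* invented fraction spent syncing */

        printf("write phase: %.0f s\n", D * completion_target);
        printf("sync phase:  %.0f s\n", D * sync_target);
        printf("slack:       %.0f s\n",
               D * (1.0 - completion_target - sync_target));
        return 0;
    }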

Re: [HACKERS] Spread checkpoint sync

2011-01-11 Thread Robert Haas
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith g...@2ndquadrant.com wrote: Having the pg_stat_bgwriter.buffers_backend_fsync patch available all the time now has made me reconsider how important one potential bit of refactoring here would be.  I managed to catch one of the situations where really

Re: [HACKERS] Spread checkpoint sync

2010-12-08 Thread Simon Riggs
On Mon, 2010-12-06 at 23:26 -0300, Alvaro Herrera wrote: Why would multiple bgwriter processes worry you? Because it complicates the tracking of files requiring fsync. As Greg says, the last attempt to do that was a lot of code. -- Simon Riggs http://www.2ndQuadrant.com/books/

Re: [HACKERS] Spread checkpoint sync

2010-12-07 Thread Greg Smith
Alvaro Herrera wrote: Why would multiple bgwriter processes worry you? Of course, it wouldn't work to have multiple processes trying to execute a checkpoint simultaneously, but what if we separated the tasks so that one process is in charge of checkpoints, and another one is in charge of the

Re: [HACKERS] Spread checkpoint sync

2010-12-06 Thread Alvaro Herrera
Excerpts from Greg Smith's message of Sun Dec 05 20:02:48 -0300 2010: What ends up happening if you push toward fully sync I/O is the design you see in some other databases, where you need multiple writer processes. Then requests for new pages can continue to allocate as needed, while

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Greg Smith
Heikki Linnakangas wrote: If you fsync() a file with one dirty page in it, it's going to return very quickly, but a 1GB file will take a while. That could be problematic if you have a thousand small files and a couple of big ones, as you would want to reserve more time for the big ones. I'm
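
One way to act on that observation, sketched with invented names: weight each file's share of the sync budget by how much dirty data it holds, so a 1 GB file gets proportionally more time than a one-page file.

    #include <stddef.h>

    static double
    sync_budget_for_file(double total_budget_secs,
                         size_t file_dirty_bytes,
                         size_t all_dirty_bytes)
    {
        if (all_dirty_bytes == 0)
            return 0.0;
        return total_budget_secs *
               (double) file_dirty_bytes / (double) all_dirty_bytes;
    }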

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 2:53 PM, Greg Smith g...@2ndquadrant.com wrote: Heikki Linnakangas wrote: If you fsync() a file with one dirty page in it, it's going to return very quickly, but a 1GB file will take a while. That could be problematic if you have a thousand small files and a couple of

Re: [HACKERS] Spread checkpoint sync

2010-12-05 Thread Greg Smith
Rob Wultsch wrote: Forgive me, but is all of this a step on the slippery slope to direct I/O? And is this a bad thing? I don't really think so. There's an important difference in my head between direct I/O, where the kernel is told write this immediately!, and what I'm trying to achieve. I

Re: [HACKERS] Spread checkpoint sync

2010-12-04 Thread Greg Smith
Greg Stark wrote: Using sync_file_range you can specify the set of blocks to sync and then block on them only after some time has passed. But there's no documentation on how this relates to the I/O scheduler so it's not clear it would have any effect on the problem. I believe this is the
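
The two-step usage under discussion looks roughly like this on Linux (sync_file_range is Linux-only; error handling trimmed):

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Kick off writeback for a range without waiting for it. */
    static void
    start_writeback(int fd, off_t offset, off_t nbytes)
    {
        (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }

    /* Later: block until that writeback has actually completed. */
    static void
    finish_writeback(int fd, off_t offset, off_t nbytes)
    {
        (void) sync_file_range(fd, offset, nbytes,
                               SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER);
    }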

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Greg Stark
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote: I ask because I don't have a mental model of how the pause can help. Given that this dirty data has been hanging around for many minutes already, what is a 3 second pause going to heal? The difference is that once an

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Josh Berkus
Using sync_file_range you can specify the set of blocks to sync and then block on them only after some time has passed. But there's no documentation on how this relates to the I/O scheduler so it's not clear it would have any effect on the problem. We might still have to delay the beginning

Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 2:24 PM, Greg Stark gsst...@mit.edu wrote: On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote: I ask because I don't have a mental model of how the pause can help. Given that this dirty data has been hanging around for many minutes already, what is a 3

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Heikki Linnakangas
On 01.12.2010 06:25, Greg Smith wrote: Jeff Janes wrote: I ask because I don't have a mental model of how the pause can help. Given that this dirty data has been hanging around for many minutes already, what is a 3 second pause going to heal? The difference is that once an fsync call is made,

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Greg Smith
Heikki Linnakangas wrote: Do you have any idea how to autotune the delay between fsyncs? I'm thinking to start by counting the number of relations that need them at the beginning of the checkpoint. Then use the same basic math that drives the spread writes, where you assess whether you're
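
The counting approach reduces to simple arithmetic, sketched here with invented names: divide the time remaining in the checkpoint by the number of files still needing fsync, and stop sleeping once the schedule has slipped.

    static double
    next_fsync_delay(double seconds_left, int syncs_left)
    {
        if (syncs_left <= 0 || seconds_left <= 0.0)
            return 0.0;             /* behind schedule: no pause at all */
        return seconds_left / syncs_left;
    }

    /* e.g. 120 s of budget and 40 files left => sleep 3 s between fsyncs */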

Re: [HACKERS] Spread checkpoint sync

2010-12-01 Thread Heikki Linnakangas
On 01.12.2010 23:30, Greg Smith wrote: Heikki Linnakangas wrote: Do you have any idea how to autotune the delay between fsyncs? I'm thinking to start by counting the number of relations that need them at the beginning of the checkpoint. Then use the same basic math that drives the spread

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Greg Smith
Ron Mayer wrote: Might smoother checkpoints be better solved by talking to the OS vendors' virtual-memory-tuning-knob-authors to work with them on exposing the ideal knobs; rather than saying that our only tool is a hammer (fsync) so the problem must be handled as a nail. Maybe, but it's

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Josh Berkus
Maybe, but it's hard to argue that the current implementation--just doing all of the sync calls as fast as possible, one after the other--is going to produce worst-case behavior in a lot of situations. Given that it's not a huge amount of code to do better, I'd rather do some work in that

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Jeff Janes
On Sun, Nov 14, 2010 at 3:48 PM, Greg Smith g...@2ndquadrant.com wrote: ... One change that turned out be necessary rather than optional--to get good performance from the system under tuning--was to make regular background writer activity, including fsync absorb checks, happen during these

Re: [HACKERS] Spread checkpoint sync

2010-11-30 Thread Greg Smith
Jeff Janes wrote: Have you tested out this absorb during syncing phase code without the sleep between the syncs? I.e. so that it still a tight loop, but the loop alternates between sync and absorb, with no intentional pause? Yes; that's how it was developed. It helped to have just the

Re: [HACKERS] Spread checkpoint sync

2010-11-26 Thread Ron Mayer
Josh Berkus wrote: On 11/20/10 6:11 PM, Jeff Janes wrote: True, but I think that changing these from their defaults is not considered to be a dark art reserved for kernel hackers, i.e. they are something that sysadmins are expected to tweak to suit their workload, just like the shmmax and

Re: [HACKERS] Spread checkpoint sync

2010-11-23 Thread Cédric Villemain
2010/11/21 Andres Freund and...@anarazel.de: On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote: For a similar problem we had (kernel buffering too much) we had success using the fadvise and madvise WONTNEED syscalls to force the data to exit the cache much sooner than it would

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith
Jeff Janes wrote: And for very large memory systems, even 1% may be too much to cache (dirty*_ratio can only be set in integer percent points), so recent kernels introduced dirty*_bytes parameters. I like these better because they do what they say. With the dirty*_ratio, I could never figure
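
Those tunables live under /proc/sys/vm and can be inspected programmatically; a small sketch (dirty_bytes and dirty_background_bytes only exist on kernels 2.6.29 and later, so -1 below means the file is absent):

    #include <stdio.h>

    static long
    read_vm_tunable(const char *name)
    {
        char  path[128];
        long  val = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
        f = fopen(path, "r");
        if (f != NULL)
        {
            if (fscanf(f, "%ld", &val) != 1)
                val = -1;
            fclose(f);
        }
        return val;
    }

    int
    main(void)
    {
        printf("dirty_ratio = %ld\n", read_vm_tunable("dirty_ratio"));
        printf("dirty_bytes = %ld\n", read_vm_tunable("dirty_bytes"));
        printf("dirty_background_bytes = %ld\n",
               read_vm_tunable("dirty_background_bytes"));
        return 0;
    }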

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Greg Smith
Robert Haas wrote: Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea. For example, given files F1 ... Fn with dirty pages needing checkpoint writes, we could do the following: first, do any pending fsyncs for files not among F1

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Martijn van Oosterhout
On Sun, Nov 21, 2010 at 04:54:00PM -0500, Greg Smith wrote: Ultimately what I want to do here is some sort of smarter write-behind sync operation, perhaps with a LRU on relations with pending fsync requests. The idea would be to sync relations that haven't been touched in a while in

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Andres Freund
On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote: For a similar problem we had (kernel buffering too much) we had success using the fadvise and madvise WONTNEED syscalls to force the data to exit the cache much sooner than it would otherwise. This was on Linux and it had the
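
The advice call in question, for reference (POSIX_FADV_DONTNEED is the portable spelling; note that posix_fadvise returns the error number directly instead of setting errno):

    #include <fcntl.h>
    #include <stdio.h>

    /* Hint that a file's cached pages can be evicted soon. */
    static void
    drop_cached_pages(int fd)
    {
        int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

        if (rc != 0)
            fprintf(stderr, "posix_fadvise: error %d\n", rc);
    }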

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Josh Berkus
On 11/20/10 6:11 PM, Jeff Janes wrote: True, but I think that changing these from their defaults is not considered to be a dark art reserved for kernel hackers, i.e. they are something that sysadmins are expected to tweak to suit their workload, just like the shmmax and such. I disagree.

Re: [HACKERS] Spread checkpoint sync

2010-11-21 Thread Robert Haas
On Sun, Nov 21, 2010 at 4:54 PM, Greg Smith g...@2ndquadrant.com wrote: Let me throw some numbers out [...] Interesting. Ultimately what I want to do here is some sort of smarter write-behind sync operation, perhaps with a LRU on relations with pending fsync requests.  The idea would be to

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Jeff Janes
On Mon, Nov 15, 2010 at 6:15 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith g...@2ndquadrant.com wrote: The second issue is that the delay between sync calls is currently hard-coded, at 3 seconds.  I believe the right path here is to consider the

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Robert Haas
On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:  The thing to realize that complicates the design is that the actual sync execution may take a considerable period of time.  It's much more likely for that to happen than in the case of an individual write, as the current

Re: [HACKERS] Spread checkpoint sync

2010-11-20 Thread Jeff Janes
On Sat, Nov 20, 2010 at 5:17 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 20, 2010 at 6:21 PM, Jeff Janes jeff.ja...@gmail.com wrote: Doing all the writes and then all the fsyncs meets this requirement trivially, but I'm not so sure that's a good idea.  For example, given files F1

Re: [HACKERS] Spread checkpoint sync

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith g...@2ndquadrant.com wrote: The second issue is that the delay between sync calls is currently hard-coded, at 3 seconds.  I believe the right path here is to consider the current checkpoint_completion_target to still be valid, then work back from

[HACKERS] Spread checkpoint sync

2010-11-14 Thread Greg Smith
Final patch in this series for today spreads out the individual checkpoint fsync calls over time, and was written by myself and Simon Riggs. Patch is based against a system that's already had the two patches I sent over earlier today applied, rather than HEAD, as both are useful for measuring