Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Alvaro Herrera
Gregory Stark wrote: > I can imagine a scenario where you have a system that's very busy for 60s and > then idle for 60s repeatedly. And for some reason you configure a > checkpoint_timeout on the order of 20m or so (assuming you're travelling > precisely 60mph). Is that Scottish m? -- Alvaro

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Gregory Stark
"Greg Smith" <[EMAIL PROTECTED]> writes: > If you write them twice, so what? You didn't even get to that point as an > option until all the important stuff was taken care of and the system was > near idle. Well even if it's near idle you were still occupying the i/o system for a few milliseconds.

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith
On Tue, 26 Jun 2007, Tom Lane wrote: I'm not impressed with the idea of writing buffers because we might need them someday; that just costs extra I/O due to re-dirtying in too many scenarios. This is kind of an interesting statement to me because it really highlights the difference in how I

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith
On Tue, 26 Jun 2007, Tom Lane wrote: I have no doubt that there are scenarios such as you are thinking about, but it definitely seems like a corner case that doesn't justify keeping the all-buffers scan. That scan is costing us extra I/O in ordinary non-corner cases, so it's not free to keep it

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > To recap, the sequence is: > 1. COPY FROM > 2. checkpoint > 3. VACUUM > Now you have buffer cache full of dirty buffers with usage_count=1, Well, it won't be very full, because VACUUM works in a limited number of buffers (and did even before the B

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Greg Smith
On Mon, 25 Jun 2007, Tom Lane wrote: right now, BgBufferSync starts over from the current clock-sweep point on each call --- that is, each bgwriter cycle. So it can't really be made to write very many buffers without excessive CPU work. Maybe we should redefine it to have some static state c
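Tom's suggestion that BgBufferSync keep some static state between calls, instead of restarting from the clock-sweep point every bgwriter cycle, can be shown with a toy model. This is a minimal sketch that assumes a plain array of dirty flags stands in for shared buffers. `lru_cleaner`, `next_to_clean` and `lru_clean_step` are hypothetical names, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one dirty flag per shared buffer. */
typedef struct {
    bool  *dirty;          /* dirty[i] is true if buffer i needs writing */
    size_t nbuffers;       /* size of the toy buffer pool */
    size_t next_to_clean;  /* persists across calls instead of restarting at the clock hand */
} lru_cleaner;

/* Examine up to `budget` buffers starting where the previous call stopped,
 * "write" (clear) any dirty ones, and return how many were written. */
static size_t lru_clean_step(lru_cleaner *c, size_t budget)
{
    size_t written = 0;

    assert(c->nbuffers > 0);
    for (size_t i = 0; i < budget && i < c->nbuffers; i++) {
        size_t buf = c->next_to_clean;

        if (c->dirty[buf]) {
            c->dirty[buf] = false;   /* stand-in for flushing the page */
            written++;
        }
        c->next_to_clean = (buf + 1) % c->nbuffers;  /* wrap around the pool */
    }
    return written;
}
```

Because the cursor survives between calls, successive rounds cover new buffers with a bounded amount of work per round. Whether the eventual rewrite took this shape is not visible from the excerpt.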

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: Tom Lane wrote: Who's "we"? AFAICS, CVS HEAD will treat a large copy the same as any other large heapscan. Umm, I'm talking about populating a table with COPY *FROM*. That's not a heap scan at all. No wonder we're failing to c

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Who's "we"? AFAICS, CVS HEAD will treat a large copy the same as any >> other large heapscan. > Umm, I'm talking about populating a table with COPY *FROM*. That's not a > heap scan at all. No wonder we're failing to communicate

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: Tom Lane wrote: (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) Actually we dropped it from COPY, because it didn'

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> (Note that COPY per se will not trigger this behavior anyway, since it >> will act in a limited number of buffers because of the recent buffer >> access strategy patch.) > Actually we dropped it from COPY, because it didn't seem t

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: (Note that COPY per se will not trigger this behavior anyway, since it will act in a limited number of buffers because of the recent buffer access strategy patch.) Actually we dropped it from COPY, because it didn't seem to improve performance in the tests we ran. -- Heikki

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> This argument supposes that the bgwriter will do nothing while the COPY >> is proceeding. > It will clean buffers ahead of the COPY, but it won't write the buffers > COPY leaves behind since they have usage_count=1. Yeah, and th

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: One pathological case is a COPY of a table slightly smaller than shared_buffers. That will fill the buffer cache. If you then have a checkpoint, and after that a SELECT COUNT(*), or a VACUUM, the buffer cache will be full of pages

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> ... that's what the LRU scan is for. > Yeah, except the LRU scan is not doing a very good job at that. It will > ignore buffers with usage_count > 0, and it only scans > bgwriter_lru_percent buffers ahead of the clock hand. Whi

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Anyway, if there are no XLOG records since the last checkpoint, there's probably nothing in shared buffers that needs flushing. There might be some dirty hint-bits, but the only reason to push those out is to make some free buffers available, and doing that is not checkpoint's jo
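Tom's rule of thumb is that if no XLOG records have been written since the last checkpoint, a new checkpoint has essentially nothing to flush. Any leftover dirty pages would be, for example, hint-bit changes that need no WAL. The skip test can be sketched under two assumptions: WAL positions are plain 64-bit byte offsets, and "no activity" means the insert position has not moved past the end of the previous checkpoint record. `toy_lsn` and `checkpoint_needed` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: WAL positions modeled as plain 64-bit byte offsets. */
typedef uint64_t toy_lsn;

/* A timed checkpoint has work to do only if something was WAL-logged after
 * the previous checkpoint record, unless the caller forces one anyway. */
static bool checkpoint_needed(toy_lsn insert_lsn, toy_lsn prev_ckpt_end_lsn,
                              bool forced)
{
    assert(insert_lsn >= prev_ckpt_end_lsn);
    return forced || insert_lsn != prev_ckpt_end_lsn;
}
```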

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Hmm. But if we're going to do that, we might as well have a checkpoint >> for our troubles, no? The reason for the current design is the >> assumption that a bgwriter_all scan is less burdensome than a >> checkpoint, but that is

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Greg Smith <[EMAIL PROTECTED]> writes: The way transitions between completely idle and all-out bursts happen were one problematic area I struggled with. Since the LRU point doesn't move during the idle parts, and the lingering buffers have a usage_count>0, the LRU scan won't t

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-26 Thread Heikki Linnakangas
Tom Lane wrote: Hmm. But if we're going to do that, we might as well have a checkpoint for our troubles, no? The reason for the current design is the assumption that a bgwriter_all scan is less burdensome than a checkpoint, but that is no longer true given this rewrite. Per comments in Create

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes: > The way transitions between completely idle and all-out bursts happen were > one problematic area I struggled with. Since the LRU point doesn't move > during the idle parts, and the lingering buffers have a usage_count>0, the > LRU scan won't touch them;

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Greg Smith
On Mon, 25 Jun 2007, Heikki Linnakangas wrote: Please describe the class of transactions and the service guarantees so that we can reproduce that, and figure out what's the best solution. I'm confident you're already moving in that direction by noticing how the 90th percentile numbers were ki

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Greg Smith
On Mon, 25 Jun 2007, Heikki Linnakangas wrote: It only scans bgwriter_lru_percent buffers ahead of the clock hand. If the hand isn't moving, it keeps scanning the same buffers over and over again. You can crank it all the way up to 100%, though, in which case it would work, but that starts to
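The LRU scan described here skips buffers with usage_count > 0 and only looks bgwriter_lru_percent of the pool ahead of the clock hand. When the hand is stationary, it therefore re-examines the same window every round. The simplified model below assumes a buffer header reduced to usage_count and a dirty flag. `toy_buffer` and `lru_scan` are hypothetical, and only usage_count and bgwriter_lru_percent come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy buffer header: only the fields the scan looks at. */
typedef struct {
    int usage_count;   /* bumped on access, decayed by the clock sweep */
    int dirty;         /* nonzero if the page must be written */
} toy_buffer;

/* Scan lru_percent of the pool starting at clock_hand and write only dirty
 * buffers whose usage_count is zero. Returns the number written. */
static size_t lru_scan(toy_buffer *pool, size_t nbuffers, size_t clock_hand,
                       double lru_percent)
{
    size_t to_scan = (size_t)(nbuffers * lru_percent / 100.0);
    size_t written = 0;

    assert(nbuffers > 0 && clock_hand < nbuffers);
    for (size_t i = 0; i < to_scan; i++) {
        toy_buffer *b = &pool[(clock_hand + i) % nbuffers];

        if (b->dirty && b->usage_count == 0) {
            b->dirty = 0;   /* stand-in for writing the page */
            written++;
        }
    }
    return written;
}
```

With the hand fixed at 0, half the pool with usage_count = 1, and lru_percent = 50, every call examines the same five recently used buffers and writes nothing. Only raising the percentage to 100 lets the scan reach the reusable dirty buffers, which matches the cost Greg points out.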

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Greg Smith
On Mon, 25 Jun 2007, Heikki Linnakangas wrote: Greg, is this the kind of workload you're having, or is there some other scenario you're worried about? The way transitions between completely idle and all-out bursts happen were one problematic area I struggled with. Since the LRU point doesn't

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Heikki Linnakangas <[EMAIL PROTECTED]> writes: >>> If you have a system with a very bursty transaction rate, it's possible >>> that when it's time for a checkpoint, there hasn't been any WAL logged >>> activity since last checkpo

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: On further thought, there is one workload where removing the non-LRU part would be counterproductive: If you have a system with a very bursty transaction rate, it's possible that when it's time for a checkpoint, there hasn't been

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > On further thought, there is one workload where removing the non-LRU > part would be counterproductive: > If you have a system with a very bursty transaction rate, it's possible > that when it's time for a checkpoint, there hasn't been any WAL log

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Heikki Linnakangas
Tom Lane wrote: I agree with removing the non-LRU part of the bgwriter's write logic though; that should simplify matters a bit and cut down the overall I/O volume. On further thought, there is one workload where removing the non-LRU part would be counterproductive: If you have a system with

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Simon Riggs
On Mon, 2007-06-25 at 12:56 +0200, Magnus Hagander wrote: > Didn't we already add other features that make recovery much *faster* than > before? In that case, are they fast enough to neutralise this increased > time (a guesstimate, of course) > > Or did I mess that up with stuff we added for 8.2

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Heikki Linnakangas
Magnus Hagander wrote: On Mon, Jun 25, 2007 at 10:15:07AM +0100, Simon Riggs wrote: As you say, we can put comments in the release notes to advise people of 50% increase in recovery time if the parameters stay the same. That would be balanced by the comment that checkpoints are now considerably

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2007 at 10:15:07AM +0100, Simon Riggs wrote: > On Mon, 2007-06-25 at 01:33 -0400, Greg Smith wrote: > > On Sun, 24 Jun 2007, Simon Riggs wrote: > > > > > Greg can't choose to use checkpoint_segments as the limit and then > > > complain about unbounded recovery time, because that w

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Simon Riggs
On Mon, 2007-06-25 at 01:33 -0400, Greg Smith wrote: > On Sun, 24 Jun 2007, Simon Riggs wrote: > > > Greg can't choose to use checkpoint_segments as the limit and then > > complain about unbounded recovery time, because that was clearly a > > conscious choice. > > I'm complaining I apologise

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-25 Thread Heikki Linnakangas
Greg Smith wrote: LDC certainly makes things better in almost every case. My "allegiance" comes from having seen a class of transactions where LDC made things worse on a fast/overloaded system, in that it made some types of service guarantees harder to meet, and I just don't know who else migh

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Greg Smith
On Mon, 25 Jun 2007, Tom Lane wrote: I'm not sure why you hold such strong allegiance to the status quo. We know that the status quo isn't working very well. Don't get me wrong here; I am a big fan of this patch, think it's an important step forward, and it's exactly the fact that I'm so she

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Greg Smith
On Sun, 24 Jun 2007, Simon Riggs wrote: Greg can't choose to use checkpoint_segments as the limit and then complain about unbounded recovery time, because that was clearly a conscious choice. I'm complaining only because everyone seems content to wander in a direction where the multiplier on

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes: > I am not a fan of introducing a replacement feature based on what I > consider too limited testing, and I don't feel this one has been beat on > long yet enough to start pruning features that would allow better backward > compatibility/transitioning. I t

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Greg Smith
On Sun, 24 Jun 2007, Simon Riggs wrote: I can't see why anyone would want to turn off smoothing: If they are doing many writes, then they will be affected by the sharp dive at checkpoint, which happens *every* checkpoint. There are service-level agreement situations where a short and sharp di

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Simon Riggs
On Fri, 2007-06-22 at 16:57 -0400, Greg Smith wrote: > If you're not, I think you should be. Keeping that replay interval > time down was one of the reasons why the people I was working with > were displeased with the implications of the very spread out style of > some LDC tunings. They were alre

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Simon Riggs
On Fri, 2007-06-22 at 16:21 -0400, Tom Lane wrote: > Heikki Linnakangas <[EMAIL PROTECTED]> writes: > > 3. Recovery will take longer, because the distance last committed redo > > ptr will lag behind more. > > True, you'd have to replay 1.5 checkpoint intervals on average instead > of 0.5 (more o
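Tom's "1.5 checkpoint intervals on average instead of 0.5" can be derived from where the redo pointer sits. A rough sketch, assuming checkpoints start every T seconds, the write phase takes a fraction f of that interval (roughly what checkpoint_smoothing controls), and a crash is equally likely at any moment. The redo pointer is fixed when a checkpoint starts, so after the last completed checkpoint the WAL to replay is uniformly distributed between fT and T + fT:

```latex
\mathbb{E}[\text{replay}] \approx \left(f + \tfrac{1}{2}\right) T,
\qquad f \approx 0 \;\Rightarrow\; 0.5\,T,
\qquad f = 1 \;\Rightarrow\; 1.5\,T
```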

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Heikki Linnakangas
Simon Riggs wrote: On Fri, 2007-06-22 at 22:19 +0100, Heikki Linnakangas wrote: However, I think shortening the checkpoint interval is a perfectly valid solution to that. Agreed. That's what checkpoint_timeout is for. Greg can't choose to use checkpoint_segments as the limit and then complai

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-24 Thread Simon Riggs
On Fri, 2007-06-22 at 22:19 +0100, Heikki Linnakangas wrote: > However, I think shortening the checkpoint interval is a perfectly valid > solution to that. Agreed. That's what checkpoint_timeout is for. Greg can't choose to use checkpoint_segments as the limit and then complain about unbounded

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-23 Thread Greg Smith
This message is going to come off as kind of angry, and I hope you don't take that personally. I'm very frustrated with this whole area right now but am unable to do anything to improve that situation. On Fri, 22 Jun 2007, Tom Lane wrote: If you've got specific evidence why any of these thing

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Heikki Linnakangas
Greg Smith wrote: On Fri, 22 Jun 2007, Tom Lane wrote: Greg had worried about being able to turn this behavior off, so we'd still need at least a bool, and we might as well expose the fraction instead. I agree with removing the non-LRU part of the bgwriter's write logic though If you accep

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Heikki Linnakangas
Greg Smith wrote: True, you'd have to replay 1.5 checkpoint intervals on average instead of 0.5 (more or less, assuming checkpoints had been short). I don't think we're in the business of optimizing crash recovery time though. If you're not, I think you should be. Keeping that replay interva

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Greg Smith
On Fri, 22 Jun 2007, Tom Lane wrote: Greg had worried about being able to turn this behavior off, so we'd still need at least a bool, and we might as well expose the fraction instead. I agree with removing the non-LRU part of the bgwriter's write logic though If you accept that being able t

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes: > As the person who was complaining about corner cases I'm not in a position > to talk more explicitly about, I can at least summarize my opinion of how > I feel everyone should be thinking about this patch and you can take what > you want from that. Sorry

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Ok, if we approach this from the idea that there will be *no* GUC > variables at all to control this, and we remove the bgwriter_all_* > settings as well, does anyone see a reason why that would be bad? Here's > the ones mentioned this far: > 1.

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Greg Smith
On Fri, 22 Jun 2007, Tom Lane wrote: Yeah, I'm not sure that we've thought through the interactions with the existing bgwriter behavior. The entire background writer mess needs a rewrite, and the best way to handle that is going to shift considerably with LDC applied. As the person who was

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Heikki Linnakangas
Tom Lane wrote: Maybe I misread the patch, but I thought that if someone requested an immediate checkpoint, the checkpoint-in-progress would effectively flip to immediate mode. So that could be handled by offering an immediate vs extended checkpoint option in pg_start_backup. I'm not sure it's

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I still think you've not demonstrated a need to expose this parameter. > Greg Smith wanted to explicitly control the I/O rate, and let the > checkpoint duration vary. I personally think that fixing the checkpoint > duration is b

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: Tom Lane wrote: (BTW, the patch seems a bit schizoid about whether checkpoint_rate is int or float.) Yeah, I've gone back and forth on the data type. I wanted it to be a float, but guc code doesn't let you specify a float in KB,

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> (BTW, the patch seems >> a bit schizoid about whether checkpoint_rate is int or float.) > Yeah, I've gone back and forth on the data type. I wanted it to be a > float, but guc code doesn't let you specify a float in KB, so I swit

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Bruce Momjian
Tom Lane wrote: > And checkpoint_rate really needs to be named checkpoint_min_rate, if > it's going to be a minimum. However, I question whether we need it at > all, because as the code stands, with the default BgWriterDelay you > would have to increase checkpoint_rate to 4x its proposed default b

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-22 Thread Heikki Linnakangas
Tom Lane wrote: 1. checkpoint_rate is used thusly: writes_per_nap = Min(1, checkpoint_rate / BgWriterDelay); where writes_per_nap is the max number of dirty blocks to write before taking a bgwriter nap. Now surely this is completely backward: if BgWriterDelay is increased, the number o
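Tom's objection is that a longer BgWriterDelay should mean more writes per nap, not fewer, if the overall rate is to stay constant. The sketch below shows one way to compute writes per nap in that direction. It rests on several assumptions: checkpoint_rate is in kB/s, BgWriterDelay is in milliseconds, blocks are the default 8 kB, and the intent was a floor of one write per nap (a Max, where the quoted code has a Min). `writes_per_nap` here is a hypothetical helper, not the patch's code.

```c
#include <assert.h>

#define TOY_BLCKSZ_KB 8   /* assumes the default 8 kB block size */

/* Hypothetical: blocks to write per bgwriter nap, given a target rate in
 * kB/s and a nap length in ms. A longer nap yields more blocks per nap,
 * so the overall write rate stays roughly the same. */
static int writes_per_nap(int checkpoint_rate_kb_s, int bgwriter_delay_ms)
{
    long kb_per_nap;
    long blocks;

    assert(checkpoint_rate_kb_s >= 0 && bgwriter_delay_ms > 0);
    kb_per_nap = (long) checkpoint_rate_kb_s * bgwriter_delay_ms / 1000;
    blocks = kb_per_nap / TOY_BLCKSZ_KB;
    return blocks < 1 ? 1 : (int) blocks;   /* floor of one write per nap */
}
```

Doubling the delay from 200 ms to 400 ms roughly doubles the blocks per nap (12 to 25 at 512 kB/s). Integer truncation undershoots the target slightly. A real implementation might carry the remainder forward to the next nap.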

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> So the question is, why in the heck would anyone want the behavior that >> "checkpoints take exactly X time"?? > Because it's easier to tune. You don't need to know how much checkpoint > I/O you can tolerate. The system will use

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: The main tuning knob is checkpoint_smoothing, which is defined as a fraction of the checkpoint interval (both checkpoint_timeout and checkpoint_segments are taken into account). Normally, the write phase of a checkpoint takes exact
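Heikki defines checkpoint_smoothing as a fraction of the checkpoint interval, with both checkpoint_timeout and checkpoint_segments taken into account. The check below is one plausible way such a schedule could decide whether the bgwriter may nap. It assumes that progress through the interval is the larger of time-based and WAL-based progress, and that all writes should be done once that progress reaches the smoothing fraction. `checkpoint_ahead_of_schedule` is a hypothetical name, not the patch's function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: is the write phase ahead of schedule, so the bgwriter can nap? */
static bool checkpoint_ahead_of_schedule(double written_fraction,
                                         double elapsed_secs, double checkpoint_timeout_secs,
                                         double segments_used, double checkpoint_segments,
                                         double smoothing)
{
    double time_progress, wal_progress, elapsed_progress;

    assert(checkpoint_timeout_secs > 0 && checkpoint_segments > 0);
    assert(smoothing > 0.0 && smoothing <= 1.0);

    /* Progress through the interval, measured by whichever limit is closer. */
    time_progress = elapsed_secs / checkpoint_timeout_secs;
    wal_progress = segments_used / checkpoint_segments;
    elapsed_progress = time_progress > wal_progress ? time_progress : wal_progress;

    /* Writes should finish when elapsed_progress reaches smoothing, so scale
     * the fraction of buffers written into the same units before comparing. */
    return written_fraction * smoothing > elapsed_progress;
}
```

For example, with smoothing = 0.5, 60 s into a 300 s timeout, and 1 of 3 segments used, WAL progress (1/3) dominates. Having written 50% of the buffers is then behind schedule, and having written 80% is ahead.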

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > I don't think you understand how the settings work. Did you read the > documentation? If you did, it's apparently not adequate. I did read the documentation, and I'm not complaining that I don't understand it. I'm complaining that I don't like the

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: Tom Lane wrote: I tend to agree with whoever said upthread that the combination of GUC variables proposed here is confusing and ugly. It'd make more sense to have min and max checkpoint rates in KB/s, with the max checkpoint rate o

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I tend to agree with whoever said upthread that the combination of GUC >> variables proposed here is confusing and ugly. It'd make more sense to >> have min and max checkpoint rates in KB/s, with the max checkpoint rate >> only ho

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: In fact, I think there's a small race condition in CVS HEAD: Yeah, probably --- the original no-locking design didn't have any side flags. The reason you need the lock is for a backend to be sure that a newly-started checkpoint is

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: I added a spinlock to protect the signaling fields between bgwriter and backends. The current non-locking approach gets really difficult as the patch adds two new flags, and both are more important than the existing ckpt_time_warn

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-21 Thread Heikki Linnakangas
ITAGAKI Takahiro wrote: The only thing I don't understand is the naming of 'checkpoint_smoothing'. Can users imagine the unit of 'smoothing' is a fraction? You explain the parameter with the word 'fraction'. Why don't you simply name it 'checkpoint_fraction'? | Specifies the target length of ch

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-20 Thread ITAGAKI Takahiro
Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > Here's an updated WIP patch for load distributed checkpoints. > Since last patch, I did some clean up and refactoring, and added a bunch > of comments, and user documentation. The only thing I don't understand is the naming of 'checkpoint_smoothing

Re: [PATCHES] Load Distributed Checkpoints, take 3

2007-06-20 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > I added a spinlock to protect the signaling fields between bgwriter and > backends. The current non-locking approach gets really difficult as the > patch adds two new flags, and both are more important than the existing > ckpt_time_warn flag. Tha
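Heikki replaced the lock-free signaling between bgwriter and backends with a spinlock because the patch adds two more flags. The pattern can be sketched as follows, using C11 `atomic_flag` as a stand-in for PostgreSQL's own spinlock primitives. The struct, the REQ_* flags and all function names are hypothetical. The `started` counter illustrates, in simplified form, the kind of check Tom mentions: comparing counts lets a backend tell whether a checkpoint began after its request.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical request flags; names are illustrative only. */
#define REQ_CHECKPOINT 0x01
#define REQ_IMMEDIATE  0x02

typedef struct {
    atomic_flag lock;      /* test-and-set spinlock */
    int         requests;  /* OR of REQ_* flags, protected by lock */
    int         started;   /* checkpoints started so far, protected by lock */
} ckpt_shmem;

static void ckpt_shmem_init(ckpt_shmem *s)
{
    atomic_flag_clear(&s->lock);
    s->requests = 0;
    s->started = 0;
}

static void spin_acquire(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait; a production spinlock would back off */
}

static void spin_release(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Backend side: add request flags and return how many checkpoints had
 * started at that moment, so the caller can later wait for a newer one. */
static int request_checkpoint(ckpt_shmem *s, int flags)
{
    int seen;

    spin_acquire(&s->lock);
    s->requests |= flags;
    seen = s->started;
    spin_release(&s->lock);
    return seen;
}

/* Bgwriter side: take all pending flags and bump the counter in one critical
 * section, so a flag set by a backend cannot slip in between read and clear. */
static int start_checkpoint(ckpt_shmem *s)
{
    int flags;

    spin_acquire(&s->lock);
    flags = s->requests;
    s->requests = 0;
    s->started++;
    spin_release(&s->lock);
    return flags;
}
```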