Re: [PATCHES] Load Distributed Checkpoints, final patch

2008-03-11 Thread Bruce Momjian

Added to TODO:

* Test to see if calling PreallocXlogFiles() from the background writer
  will help with WAL segment creation latency

  http://archives.postgresql.org/pgsql-patches/2007-06/msg00340.php


---

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:
 
 Applied with some minor revisions to make some of the internal APIs a
 bit cleaner; mostly, it seemed like a good idea to replace all those
 bool parameters with a flag-bits approach, so that you could have
 something like CHECKPOINT_FORCE | CHECKPOINT_WAIT instead of
 false, true, true, false ...
 
 For the moment I removed all the debugging elog's in the patch.
 We still have Greg Smith's checkpoint logging patch to look at
 (which I suppose needs adjustment now), and that seems like the
 appropriate venue to consider what to put in.
 
 Also, the question of redesigning the bgwriter's LRU scan is
 still open.  I believe that's on Greg's plate, too.
 
 One other closely connected item that might be worth looking at is the
 code for creating new future xlog segments (PreallocXlogFiles).  Greg
 was griping upthread about xlog segment creation being a real
 performance drag.  I realized that as we currently have it set up, the
 checkpoint code is next to useless for high-WAL-volume installations,
 because it only considers making *one* future XLOG segment.  Once you've
 built up enough XLOG segments, the system isn't too bad about recycling
 them, but there will be a nasty startup transient where foreground
 processes have to stop and make the things.  I wonder whether it would
 help if we (a) have the bgwriter call PreallocXlogFiles during its
 normal loop, and (b) back the slop in PreallocXlogFiles way off, so that
 it will make a future segment as soon as we start using the last
 existing segment, instead of only when we're nearly done.  This would at
 least make it more likely that the bgwriter does the work instead of a
 foreground process.  I'm hesitant to go much further than that, because
 I don't want to bloat the minimum disk footprint for low-volume
 installations, but the minimum footprint is really 2 xlog files anyway...
 
   regards, tom lane
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:
 
 Applied with some minor revisions to make some of the internal APIs a
 bit cleaner; mostly, it seemed like a good idea to replace all those
 bool parameters with a flag-bits approach, so that you could have
 something like CHECKPOINT_FORCE | CHECKPOINT_WAIT instead of
 false, true, true, false ...
 
 For the moment I removed all the debugging elog's in the patch.
 We still have Greg Smith's checkpoint logging patch to look at
 (which I suppose needs adjustment now), and that seems like the
 appropriate venue to consider what to put in.
 
 Also, the question of redesigning the bgwriter's LRU scan is
 still open.  I believe that's on Greg's plate, too.
 
 One other closely connected item that might be worth looking at is the
 code for creating new future xlog segments (PreallocXlogFiles).  Greg
 was griping upthread about xlog segment creation being a real
 performance drag.  I realized that as we currently have it set up, the
 checkpoint code is next to useless for high-WAL-volume installations,
 because it only considers making *one* future XLOG segment.  Once you've
 built up enough XLOG segments, the system isn't too bad about recycling
 them, but there will be a nasty startup transient where foreground
 processes have to stop and make the things.  I wonder whether it would
 help if we (a) have the bgwriter call PreallocXlogFiles during its
 normal loop, and (b) back the slop in PreallocXlogFiles way off, so that
 it will make a future segment as soon as we start using the last
 existing segment, instead of only when we're nearly done.  This would at
 least make it more likely that the bgwriter does the work instead of a
 foreground process.  I'm hesitant to go much further than that, because
 I don't want to bloat the minimum disk footprint for low-volume
 installations, but the minimum footprint is really 2 xlog files anyway...
 
   regards, tom lane
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-03 Thread Heikki Linnakangas

Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

Heikki Linnakangas wrote:
For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9 
minutes there, and the graphs look very smooth. That suggests that 
spreading the writes over a longer time wouldn't make a difference, but 
smoothing the rush at the beginning of checkpoint might. I'm going to 
try the algorithm I posted, that uses the WAL consumption rate from 
previous checkpoint interval in the calculations.



One thing that concerns me is that checkpoint smoothing happening just
after the checkpoint is causing I/O at the same time that
full_page_writes is causing additional I/O.


I'm tempted to just apply some sort of nonlinear correction to the
WAL-based progress measurement.  Squaring it would be cheap but is
probably too extreme.  Carrying over info from the previous cycle
doesn't seem like it would help much; rather, the point is exactly
that we *don't* want a constant write speed during the checkpoint.


While thinking about this, I made an observation on full_page_writes. 
Currently, we perform a full page write whenever LSN > RedoRecPtr. If 
we're clever, we can skip or defer some of the full page writes:


The rule is that when we replay, we always need to replay a full page 
image before we apply any regular WAL records on the page. When we begin 
a checkpoint, there are two possible outcomes: either we crash before the 
new checkpoint is finished, and we replay starting from the previous redo 
ptr, or we finish the checkpoint successfully, and we replay starting 
from the new redo ptr (or we don't crash and don't need to recover at all).


To be able to recover from the previous redo ptr, we don't need to write 
a full page image if we have already written one since the previous redo 
ptr.


To be able to recover from the new redo ptr, we don't need to write a 
full page image, if we haven't flushed the page yet. It will be written 
and fsync'd by the time the checkpoint finishes.


IOW, we can skip the full page image for a page if we have already taken 
a full page image of it since the previous checkpoint, and we haven't yet 
flushed it during the current checkpoint.
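
A pseudocode rendering of that rule (CanSkipFullPageImage, 
FullPageImageTakenSince, PrevRedoRecPtr, and PageFlushedThisCheckpoint 
are hypothetical names for the bookkeeping described here, not existing 
routines):

    /* Skip the full page image iff both possible recovery starting
     * points remain covered. */
    static bool
    CanSkipFullPageImage(Page page)
    {
        /* The previous redo ptr is covered if a full page image of this
         * page has already been logged since that redo ptr. */
        bool old_redo_covered = FullPageImageTakenSince(page, PrevRedoRecPtr);

        /* The new redo ptr is covered as long as the page hasn't been
         * flushed yet during the current checkpoint: it will still be
         * written and fsync'd before the checkpoint finishes. */
        bool new_redo_covered = !PageFlushedThisCheckpoint(page);

        return old_redo_covered && new_redo_covered;
    }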


This might reduce the overall WAL I/O a little bit, but more 
importantly, it spreads the impact of taking full page images over the 
checkpoint duration. That's a good thing on its own, but it also makes 
it unnecessary to compensate for the full_page_writes rush in the 
checkpoint smoothing.


I'm still trying to get my head around the bookkeeping required to get 
that right; I think it's possible using the new BM_CHECKPOINT_NEEDED 
flag and a new flag in the page header to mark pages that we've skipped 
taking the full page image when it was last modified.


For 8.3, we should probably just do some simple compensation in the 
checkpoint throttling code, if we want to do anything at all. But this 
is something to think about in the future.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-03 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 For 8.3, we should probably just do some simple compensation in the checkpoint
 throttling code, if we want to do anything at all. But this is something to
 think about in the future.

Just as a stress test it might be interesting to run a quick tpcc test with
very short checkpoint intervals. Something like 30s. Just to make sure that
the logic is all correct and unexpected things don't start happening.
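
Something along these lines, for instance (values illustrative; 30s is 
the minimum checkpoint_timeout the server accepts):

    checkpoint_timeout = 30s             # force very frequent checkpoints
    checkpoint_segments = 3              # keep the WAL window small
    checkpoint_completion_target = 0.9   # maximum smoothing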

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-03 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 While thinking about this, I made an observation on full_page_writes. 
 Currently, we perform a full page write whenever LSN > RedoRecPtr. If 
 we're clever, we can skip or defer some of the full page writes:

I'm not convinced this is safe; in particular, ISTM that a PITR slave
following the WAL log is likely to be at risk if it tries to restart
from a checkpoint after which you've omitted some full-page images.  There's
no guarantee it will have flushed pages at the same spots the master did.

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-02 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
I'm scheduling more DBT-2 tests at a high # of warehouses per Greg 
Smith's suggestion just to see what happens, but I doubt that will 
change my mind on the above decisions.


When do you expect to have those results?


In a few days. I'm doing long tests because the variability in the 1h 
tests was very high.


I ran two tests with 200 warehouses to see how LDC behaves on a badly 
overloaded system; see tests imola-319 and imola-320. It seems to work 
quite well. In fact the checkpoint spike is, relatively speaking, less 
severe than with a smaller # of warehouses even in the baseline test run, 
and LDC smooths it very nicely.


After those two tests, I noticed that I had full_page_writes=off in all 
tests performed earlier :(. That throws off the confidence in those 
results, so I ran more tests with full_page_writes on and off to compare 
the effect. I also wanted to compare the effectiveness of the patch when 
checkpoints are triggered by either checkpoint_timeout or 
checkpoint_segments.


imola-326 - imola-330 are all configured so that checkpoints happen 
roughly on a 50 minute interval. On imola-326, checkpoints are triggered 
by checkpoint_segments, and on imola-327 they're triggered by 
checkpoint_timeout. On imola-326, the write phase lasts ~7 minutes, and 
on imola-327, it lasts ~10 minutes. Because of full_page_writes, a lot 
more WAL is consumed right after starting the checkpoint, so we end up 
being more aggressive than necessary at the beginning.


For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9 
minutes there, and the graphs look very smooth. That suggests that 
spreading the writes over a longer time wouldn't make a difference, but 
smoothing the rush at the beginning of checkpoint might. I'm going to 
try the algorithm I posted, that uses the WAL consumption rate from 
previous checkpoint interval in the calculations.


Imola-329 is the same as imola-328, but with updated CVS source tree 
instead of older tree + patch. The purpose of this test was basically to 
just verify that what was committed works the same as the patch.


Imola-330 is comparable with imola-327, checkpoints are triggered by 
timeout and full_page_writes=on. But 330 was patched to call 
PreallocXlogFiles in bgwriter, per Tom's idea. According to logs, most 
WAL segments are created by bgwriter in that test, and response times 
look slightly better with the patch, though I'm not sure the difference 
is statistically significant.


As before, the results are available at 
http://community.enterprisedb.com/ldc/


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-02 Thread Bruce Momjian
Heikki Linnakangas wrote:
 For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9 
 minutes there, and the graphs look very smooth. That suggests that 
 spreading the writes over a longer time wouldn't make a difference, but 
 smoothing the rush at the beginning of checkpoint might. I'm going to 
 try the algorithm I posted, that uses the WAL consumption rate from 
 previous checkpoint interval in the calculations.

One thing that concerns me is that checkpoint smoothing happening just
after the checkpoint is causing I/O at the same time that
full_page_writes is causing additional I/O.  Ideally we would do the
smoothing toward the end of the checkpoint cycle, but I realize that has
problems of its own.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-07-02 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Heikki Linnakangas wrote:
 For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9 
 minutes there, and the graphs look very smooth. That suggests that 
 spreading the writes over a longer time wouldn't make a difference, but 
 smoothing the rush at the beginning of checkpoint might. I'm going to 
 try the algorithm I posted, that uses the WAL consumption rate from 
 previous checkpoint interval in the calculations.

 One thing that concerns me is that checkpoint smoothing happening just
 after the checkpoint is causing I/O at the same time that
 full_page_writes is causing additional I/O.

I'm tempted to just apply some sort of nonlinear correction to the
WAL-based progress measurement.  Squaring it would be cheap but is
probably too extreme.  Carrying over info from the previous cycle
doesn't seem like it would help much; rather, the point is exactly
that we *don't* want a constant write speed during the checkpoint.
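
As a sketch, such a correction is just an exponent applied to the 
progress fraction; squaring is the cheap-but-extreme case, and an 
exponent between 1 and 2 (the 1.5 below is arbitrary, not from this 
mail) would be a gentler version:

    #include <math.h>

    /* wal_progress is the fraction (0..1) of checkpoint_segments
     * consumed so far.  Raising it to a power > 1 discounts the early
     * full_page_writes rush, so the checkpoint writes less aggressively
     * at the start: 30% WAL consumed maps to ~16% estimated progress
     * at exponent 1.5, and to 9% when squaring. */
    static double
    corrected_wal_progress(double wal_progress)
    {
        return pow(wal_progress, 1.5);
    }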

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-28 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:


Applied with some minor revisions to make some of the internal APIs a
bit cleaner; mostly, it seemed like a good idea to replace all those
bool parameters with a flag-bits approach, so that you could have
something like CHECKPOINT_FORCE | CHECKPOINT_WAIT instead of
false, true, true, false ...


Thanks.


For the moment I removed all the debugging elog's in the patch.
We still have Greg Smith's checkpoint logging patch to look at
(which I suppose needs adjustment now), and that seems like the
appropriate venue to consider what to put in.


Ok, I'll look at that next.


One other closely connected item that might be worth looking at is the
code for creating new future xlog segments (PreallocXlogFiles).  Greg
was griping upthread about xlog segment creation being a real
performance drag.  I realized that as we currently have it set up, the
checkpoint code is next to useless for high-WAL-volume installations,
because it only considers making *one* future XLOG segment.  Once you've
built up enough XLOG segments, the system isn't too bad about recycling
them, but there will be a nasty startup transient where foreground
processes have to stop and make the things.  I wonder whether it would
help if we (a) have the bgwriter call PreallocXlogFiles during its
normal loop, and (b) back the slop in PreallocXlogFiles way off, so that
it will make a future segment as soon as we start using the last
existing segment, instead of only when we're nearly done.  This would at
least make it more likely that the bgwriter does the work instead of a
foreground process.  I'm hesitant to go much further than that, because
I don't want to bloat the minimum disk footprint for low-volume
installations, but the minimum footprint is really 2 xlog files anyway...


That seems like a good idea. It might also become a problem if you have 
WAL archiving set up and the archiving falls behind so that existing log 
files are not recycled fast enough.


The comment in PreallocXlogFiles is out of date:


/*
 * Preallocate log files beyond the specified log endpoint, according to
 * the XLOGfile user parameter.
 */


As you pointed out, it only preallocates one log file. And there is no 
XLOGfile mentioned anywhere else in the source tree.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-28 Thread Greg Smith

On Wed, 27 Jun 2007, Tom Lane wrote:


Also, the question of redesigning the bgwriter's LRU scan is
still open.  I believe that's on Greg's plate, too.


Greg's plate was temporarily fried after his house was hit by lightning 
yesterday.  I just got everything back on-line again, so no coding 
progress, but I think I finished assimilating your epiphany during that 
time.  Now I realize that what you're suggesting is that under healthy 
low-load conditions, the LRU scan really should be able to keep up right 
behind the clock sweep point.  How far behind it is serves as a 
measurement of its failure to match the rate at which re-usable buffers 
are being dirtied, and the only question is how fast and how far it 
should try to drive its cleaning point forward when that happens.


Once you've built up enough XLOG segments, the system isn't too bad 
about recycling them, but there will be a nasty startup transient where 
foreground processes have to stop and make the things.


Exactly.  I found it problematic in four situations:

1) Slow checkpoint doesn't finish in time and new segments are being 
created while the checkpoint is also busy (this is the really bad one)


2) Archive logger stops doing anything (usually because the archive disk is 
filled) and nothing gets recycled until that's fixed.


3) checkpoint_segments is changed, so performance is really sluggish 
for a bit until all the segments are built back up again


4) You ran an early manual checkpoint, which doesn't seem to recycle as 
many segments usefully


Any change that would be more proactive about creating segments in these 
situations than the current code would be a benefit, even though these are 
not common paths people encounter.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 The comment in PreallocXlogFiles is out of date:

Yeah, I changed it yesterday ...

 As you pointed out, it only preallocates one log file. And there is no 
 XLOGfile mentioned anywhere else in the source tree.

If memory serves, there once was a variable there, but we simplified it
out of existence for reasons no longer apparent.  Possibly it'd be worth
trolling the CVS log and archives to find out why we did that.

Anyway, what I'm thinking at the moment is that it's not so much that
PreallocXlogFiles needs to do more work as that it needs to be called
more often.  Right now we only do it once per checkpoint.

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-27 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:

Applied with some minor revisions to make some of the internal APIs a
bit cleaner; mostly, it seemed like a good idea to replace all those
bool parameters with a flag-bits approach, so that you could have
something like CHECKPOINT_FORCE | CHECKPOINT_WAIT instead of
false, true, true, false ...
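
A minimal sketch of that calling convention (CHECKPOINT_FORCE and 
CHECKPOINT_WAIT are named above; the other constants and the function 
body are illustrative, not the committed API):

    #include <stdbool.h>

    #define CHECKPOINT_IS_SHUTDOWN  0x0001   /* hypothetical */
    #define CHECKPOINT_IMMEDIATE    0x0002   /* hypothetical */
    #define CHECKPOINT_FORCE        0x0004
    #define CHECKPOINT_WAIT         0x0008

    static void
    RequestCheckpoint(int flags)
    {
        bool force = (flags & CHECKPOINT_FORCE) != 0;
        bool wait  = (flags & CHECKPOINT_WAIT) != 0;

        /* ... start or queue the checkpoint; if 'wait', block until
         * it completes ... */
        (void) force;
        (void) wait;
    }

    int
    main(void)
    {
        /* Readable at the call site, unlike (false, true, true, false): */
        RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);
        return 0;
    }

Each flag reads as a name rather than a positional bool, and adding a 
new option doesn't break every existing caller.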

For the moment I removed all the debugging elog's in the patch.
We still have Greg Smith's checkpoint logging patch to look at
(which I suppose needs adjustment now), and that seems like the
appropriate venue to consider what to put in.

Also, the question of redesigning the bgwriter's LRU scan is
still open.  I believe that's on Greg's plate, too.

One other closely connected item that might be worth looking at is the
code for creating new future xlog segments (PreallocXlogFiles).  Greg
was griping upthread about xlog segment creation being a real
performance drag.  I realized that as we currently have it set up, the
checkpoint code is next to useless for high-WAL-volume installations,
because it only considers making *one* future XLOG segment.  Once you've
built up enough XLOG segments, the system isn't too bad about recycling
them, but there will be a nasty startup transient where foreground
processes have to stop and make the things.  I wonder whether it would
help if we (a) have the bgwriter call PreallocXlogFiles during its
normal loop, and (b) back the slop in PreallocXlogFiles way off, so that
it will make a future segment as soon as we start using the last
existing segment, instead of only when we're nearly done.  This would at
least make it more likely that the bgwriter does the work instead of a
foreground process.  I'm hesitant to go much further than that, because
I don't want to bloat the minimum disk footprint for low-volume
installations, but the minimum footprint is really 2 xlog files anyway...
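
In outline, idea (a) might look like the following; treat it as 
pseudocode against the bgwriter loop of that era rather than a drop-in 
patch (the function names are stand-ins for the real bgwriter/xlog 
routines):

    for (;;)
    {
        BgBufferSync();                   /* normal background writing */

        /* Idea (a): also pre-create future xlog segments here, so a
         * foreground backend rarely has to stop and zero-fill a 16 MB
         * segment file itself. */
        PreallocXlogFiles(GetInsertRecPtr());

        BgWriterNap();                    /* sleep for bgwriter_delay */
    }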

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Michael Glaesemann


On Jun 26, 2007, at 13:49, Heikki Linnakangas wrote:

Maximum is 0.9, to leave some headroom for fsync and any other  
things that need to happen during a checkpoint.


I think it might be more user-friendly to make the maximum 1 (meaning  
as much smoothing as you could possibly get) and internally reserve a  
certain amount off for whatever headroom might be required. It's more  
common for users to see a value range from 0 to 1 rather than 0 to 0.9.


Michael Glaesemann
grzm seespotcode net





Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Barring any objections from committer, I'm finished with this patch.

Sounds great, I'll start looking this over.

 I'm scheduling more DBT-2 tests at a high # of warehouses per Greg 
 Smith's suggestion just to see what happens, but I doubt that will 
 change my mind on the above decisions.

When do you expect to have those results?

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Barring any objections from committer, I'm finished with this patch.


Sounds great, I'll start looking this over.

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg 
Smith's suggestion just to see what happens, but I doubt that will 
change my mind on the above decisions.


When do you expect to have those results?


In a few days. I'm doing long tests because the variability in the 1h 
tests was very high.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Michael Glaesemann wrote:


On Jun 26, 2007, at 13:49, Heikki Linnakangas wrote:

Maximum is 0.9, to leave some headroom for fsync and any other things 
that need to happen during a checkpoint.


I think it might be more user-friendly to make the maximum 1 (meaning as 
much smoothing as you could possibly get) and internally reserve a 
certain amount off for whatever headroom might be required. It's more 
common for users to see a value range from 0 to 1 rather than 0 to 0.9.


It would then be counter-intuitive if you set it to 1.0, and see that 
your checkpoints consistently take 90% of the checkpoint interval.


We could just allow any value up to 1.0, and note in the docs that you 
should leave some headroom, unless you don't mind starting the next 
checkpoint a bit late. That actually sounds pretty good.
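
For reference, the throttling comparison being discussed boils down to 
something like this simplified sketch (not the actual committed code):

    /* The checkpoint naps whenever the fraction of dirty buffers
     * written so far is ahead of the elapsed fraction of the checkpoint
     * interval scaled by checkpoint_completion_target.  With a target
     * of 1.0 the writes stretch across the whole interval, leaving no
     * headroom for the final fsync phase -- hence the risk of starting
     * the next checkpoint a bit late. */
    static bool
    on_schedule(double written_frac,   /* 0..1: buffers written so far */
                double elapsed_frac,   /* 0..1: of checkpoint interval */
                double target)         /* checkpoint_completion_target > 0 */
    {
        return written_frac >= elapsed_frac / target;
    }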


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Gregory Stark

Heikki Linnakangas [EMAIL PROTECTED] writes:

 We could just allow any value up to 1.0, and note in the docs that you should
 leave some headroom, unless you don't mind starting the next checkpoint a bit
 late. That actually sounds pretty good.

What exactly happens if a checkpoint takes so long that the next checkpoint
starts?  Aside from it not actually helping, is there much reason to avoid this
situation?  Have we ever actually tested it?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


We could just allow any value up to 1.0, and note in the docs that you should
leave some headroom, unless you don't mind starting the next checkpoint a bit
late. That actually sounds pretty good.


What exactly happens if a checkpoint takes so long that the next checkpoint
starts?  Aside from it not actually helping, is there much reason to avoid this
situation? 


Not really. We might run out of preallocated WAL segments, and will have 
to create more. Recovery could be longer than expected since the real 
checkpoint interval ends up being longer, but you can't make very 
accurate recovery time estimations anyway.



Have we ever actually tested it?


I haven't.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Gregory Stark wrote:


What exactly happens if a checkpoint takes so long that the next checkpoint
starts?  Aside from it not actually helping, is there much reason to avoid this
situation?  Have we ever actually tested it?


More segments get created, and because of how they are cleared at the 
beginning this causes its own mini-I/O storm through the same buffered 
write channel the checkpoint writes are going into (which may or may not 
be the same way normal WAL writes go, depending on whether you're using 
O_[D]SYNC WAL writes).  I've seen some weird and intermittent breakdowns 
from the contention that occurs when this happens, and it's certainly 
something to be avoided.


To test it you could just use a big buffer cache, write like mad to it, 
and make checkpoint_segments smaller than it should be for that workload. 
It's easy enough to kill yourself exactly this way right now though, and 
the fact that LDC gives you a parameter to aim this particular foot-gun 
more precisely isn't a big deal IMHO as long as the documentation is 
clear.
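
A hypothetical setup along those lines (values illustrative):

    shared_buffers = 1GB               # big buffer cache to dirty
    checkpoint_segments = 3            # deliberately undersized for the load
    checkpoint_completion_target = 0.9

    # then drive it hard, e.g. with: pgbench -c 50 -t 100000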


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 We could just allow any value up to 1.0, and note in the docs that you 
 should leave some headroom, unless you don't mind starting the next 
 checkpoint a bit late. That actually sounds pretty good.

Yeah, that sounds fine.  There isn't actually any harm in starting a
checkpoint later than otherwise expected, is there?  The worst
consequence I can think of is a backend having to take time to
manufacture a new xlog segment, because we didn't finish a checkpoint
in time to recycle old ones.  This might be better handled by allowing
a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the
number of future segments to account for xlog archiving delays, when
that's enabled?

regards, tom lane



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
We could just allow any value up to 1.0, and note in the docs that you 
should leave some headroom, unless you don't mind starting the next 
checkpoint a bit late. That actually sounds pretty good.


Yeah, that sounds fine.  There isn't actually any harm in starting a
checkpoint later than otherwise expected, is there?  The worst
consequence I can think of is a backend having to take time to
manufacture a new xlog segment, because we didn't finish a checkpoint
in time to recycle old ones.  This might be better handled by allowing
a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the
number of future segments to account for xlog archiving delays, when
that's enabled?


XLogFileSlop is currently 2 * checkpoint_segments + 1 since last 
checkpoint, which is just enough to accommodate a checkpoint that lasts 
the full checkpoint interval. If we want to keep as much slop there as 
before, then yes that should be increased to (2 + 
checkpoint_completion_target) * checkpoint_segments + 1, or just 3 * 
checkpoint_segments if we want to keep it simple.
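
As a worked example, assuming the standard 16 MB segment size: with 
checkpoint_segments = 10 and checkpoint_completion_target = 0.9, the 
proposed slop is (2 + 0.9) * 10 + 1 = 30 segments, essentially the same 
as the simpler 3 * 10 = 30; either way the recycled WAL is capped at 
roughly 480 MB.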


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's 
suggestion just to see what happens, but I doubt that will change my mind on 
the above decisions.


I don't either; at worst I'd expect a small documentation update, perhaps 
with some warnings based on what's discovered there.  The form you've 
added checkpoint_completion_target in is sufficient to address all the 
serious concerns I had; I can turn it off, I can smooth just a bit without 
increasing recovery time too much, or I can go all-out smooth.


Certainly no one should consider waiting for the tests I asked you about a 
hurdle to getting this patch committed; slowing that down was never my 
intention in bringing that up.  I'm just curious to see if anything 
scurries out of some of the darker corners in this area when they're 
illuminated.  I'd actually like to see this get committed relatively soon 
because there are two interleaved merges stuck behind this one (the more 
verbose logging patch and the LRU modifications).


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
