Re: [PATCHES] Load distributed checkpoint V3

Heikki Linnakangas Thu, 05 Apr 2007 08:51:38 -0700

Greg Smith wrote:

On Thu, 5 Apr 2007, Heikki Linnakangas wrote:
Bgwriter has two goals:
1. keep enough buffers clean that normal backends never need to do a write
2. smooth checkpoints by writing buffers ahead of time
Load distributed checkpoints will do 2. in a much better way than the bgwriter_all_* guc options. I think we should remove that aspect of bgwriter in favor of this patch.
...
Let me suggest a different way of looking at this problem. At any moment, some percentage of your buffer pool is dirty. Whether it's 0% or 100% dramatically changes what the background writer should be doing. Whether most of the data is usage_count>0 or not also makes a difference. None of the current code has any idea what type of buffer pool it is working with, and therefore it doesn't have enough information to make a well-informed prediction about what is going to happen in the near future.

The purpose of the bgwriter_all_* settings is to shorten the duration of the eventual checkpoint. The reason to shorten the checkpoint duration is to limit the damage it causes to other I/O activity. My thinking is that, assuming the LDC patch is effective (agreed, needs more testing) at smoothing the checkpoint, the duration doesn't matter anymore. Do you want to argue there are other reasons to shorten the checkpoint duration?

I'll tell you what I did to the all-scan. I ran a few hundred hours' worth of background writer tests to collect data on what it does wrong, then wrote a prototype automatic background writer that resets the all-scan parameters based on what I found. It keeps a running estimate of how dirty the pool at large is, using a weighted average of the most recent scan with the past history. From there, I have a simple model that predicts how much of the buffer pool we can scan in any interval, and it is intended to enforce a maximum bound on the amount of physical I/O you're willing to stream out. The beta code is sitting at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c if you want to see what I've done so far. The parts that are done work fine: as long as you give it a reasonable % to scan by default, it will correct all_max_pages and the interval in real time to meet the requested scan rate, given how much is currently dirty. The I/O rate is computed but doesn't limit properly yet.
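The estimator described above can be illustrated with a small, hypothetical C sketch. It is not the code in Greg's bufmgr.c, and every name in it (`DirtyEstimate`, `update_estimate`, `pages_to_write`, `max_writes`) is an assumption made for illustration. It blends each scan's observed dirty fraction into an exponentially weighted running average. From that average it derives how many writes a scan of the requested percentage should need, which plays the role of all_max_pages. The final cap stands in for the intended I/O bound.

```c
#include <assert.h>

/* Hypothetical running estimate; these names do not come from bufmgr.c. */
typedef struct DirtyEstimate {
    double dirty_fraction; /* smoothed estimate of the dirty share, 0.0 .. 1.0 */
    double weight;         /* weight given to the newest scan, 0 < weight <= 1 */
} DirtyEstimate;

/* Blend the most recent scan's observation into the past history. */
static void update_estimate(DirtyEstimate *est, int scanned, int found_dirty)
{
    assert(scanned > 0 && found_dirty >= 0 && found_dirty <= scanned);
    double observed = (double) found_dirty / scanned;
    est->dirty_fraction = est->weight * observed
                        + (1.0 - est->weight) * est->dirty_fraction;
}

/*
 * Writes to allow this round (an all_max_pages-like value): scanning
 * scan_percent of the pool should hit about dirty_fraction dirty pages.
 * max_writes is a hypothetical bound on physical I/O per round.
 */
static int pages_to_write(const DirtyEstimate *est, int nbuffers,
                          double scan_percent, int max_writes)
{
    int to_scan = (int) (nbuffers * scan_percent / 100.0);
    int expected = (int) (to_scan * est->dirty_fraction + 0.5); /* round */
    return expected < max_writes ? expected : max_writes;
}
```

A larger `weight` makes the estimate react faster to a sudden burst of dirtying, at the cost of more jitter from one scan to the next.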

Nice. Enforcing a max bound on the I/O seems reasonable, if we accept that shortening the checkpoint is a goal.

Why haven't I brought this all up yet? Two reasons. The first is that it doesn't work on my system: checkpoints and overall throughput get worse when you try to shorten them by running the background writer at optimal aggressiveness. Under really heavy load, the writes slow down as all the disk caches fill. The background writer then fights with reads of the data that isn't in the mostly dirty cache (introducing massive seek delays), it stops cleaning effectively, and it's better for it not to even try. My next generation of code was going to start with the LRU flush and then only move on to the all-scan if there's time left over.
The second is that I only started to get useful results here in the last few weeks, and I assumed it was too big a topic to start suggesting major redesigns to the background writer mechanism at this point (from me at least!). I was waiting for 8.3 to freeze before even trying. If you want to push through a redesign there, maybe you can get away with it at this late moment. But I ask that you please don't remove anything from the current design until you have significant test results to back up that change.
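The "LRU flush first, all-scan only with what's left" ordering can be sketched as a per-round budget split. This is a hypothetical illustration: the budget is counted in writes rather than elapsed time, and `RoundPlan`/`plan_round` are invented names, not part of PostgreSQL.

```c
#include <assert.h>

/* Hypothetical split of one bgwriter round's write budget. */
typedef struct RoundPlan {
    int lru_writes; /* writes spent keeping upcoming victims clean */
    int all_writes; /* writes left over for the all-scan */
} RoundPlan;

static RoundPlan plan_round(int budget, int lru_needed, int all_wanted)
{
    assert(budget >= 0 && lru_needed >= 0 && all_wanted >= 0);
    RoundPlan p;
    /* The LRU flush gets first claim on the budget. */
    p.lru_writes = lru_needed < budget ? lru_needed : budget;
    /* The all-scan only gets whatever remains. */
    int left = budget - p.lru_writes;
    p.all_writes = all_wanted < left ? all_wanted : left;
    return p;
}
```

Under heavy load the LRU demand can consume the whole budget, so the all-scan simply doesn't run that round. That matches the observation that it is better for it not to even try.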


Point taken. I need to start testing the LDC patch.

Since we're discussing this, let me tell you what I've been thinking about the LRU cleaning behavior of bgwriter. ISTM that it's more straightforward to tune automatically. Bgwriter basically needs to ensure that the next X buffers with usage_count=0 in the clock sweep are clean. X is the predicted number of buffers backends will evict until the next bgwriter round.
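The idea of keeping the next X usage_count=0 buffers clean can be sketched over a toy buffer array. This is a hypothetical model, not PostgreSQL's buffer manager: `Buf`, `clean_ahead` and the array layout are assumptions. Clearing `dirty` stands in for actually writing the page. It also simplifies one thing: it looks ahead from the clock hand without decrementing usage counts, so buffers that the real sweep would later age down to 0 are not counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer header: only the fields this sketch needs. */
typedef struct Buf {
    int usage_count;
    bool dirty;
} Buf;

/*
 * Starting at the clock hand, walk ahead until x buffers with
 * usage_count == 0 have been found (or the whole pool was visited),
 * cleaning each dirty one. Returns the number of buffers written.
 */
static int clean_ahead(Buf *bufs, int nbuffers, int hand, int x)
{
    assert(nbuffers > 0 && hand >= 0 && hand < nbuffers && x >= 0);
    int found = 0, written = 0;
    for (int i = 0; i < nbuffers && found < x; i++) {
        Buf *b = &bufs[(hand + i) % nbuffers];
        if (b->usage_count != 0)
            continue;          /* not a victim candidate; count left untouched */
        if (b->dirty) {
            b->dirty = false;  /* stands in for writing the page out */
            written++;
        }
        found++;
    }
    return written;
}
```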

The number of buffers evicted by normal backends in a bgwriter_delay period is simple to keep track of: just increase a counter in StrategyGetBuffer and reset it when bgwriter wakes up. We can use that as an estimate of X with some safety margin.
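A minimal sketch of that counter follows, assuming a percentage safety margin. The counter and both function names are hypothetical, and concurrency is ignored. A real counter bumped from StrategyGetBuffer by many backends would need to be protected by a lock or made atomic.

```c
#include <assert.h>

/* Hypothetical eviction counter; not part of PostgreSQL. */
static int evictions_since_wakeup = 0;

/* Would be called from StrategyGetBuffer each time a backend takes a victim. */
static void count_backend_eviction(void)
{
    evictions_since_wakeup++;
}

/* Called when bgwriter wakes up: read and reset, then pad by margin_pct. */
static int estimate_x(int margin_pct)
{
    assert(margin_pct >= 0);
    int seen = evictions_since_wakeup;
    evictions_since_wakeup = 0;
    /* Integer ceiling, so any nonzero demand gets at least one extra buffer. */
    return seen + (seen * margin_pct + 99) / 100;
}
```

For example, 40 evictions with a 25% margin give X = 50. A round with no evictions gives X = 0.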


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
