On Thu, 5 Apr 2007, Heikki Linnakangas wrote:

> Bgwriter has two goals:
> 1. keep enough buffers clean that normal backends never need to do a write
> 2. smooth checkpoints by writing buffers ahead of time
> Load distributed checkpoints will do 2. in a much better way than the
> bgwriter_all_* guc options. I think we should remove that aspect of
> bgwriter in favor of this patch.

My first question about the LDC patch was whether I could turn it off and return to the existing mechanism. I would like to see a large pile of data proving this new approach is better before the old one goes away. I think everyone needs to do some more research and measurement here before assuming the problem can be knocked out so easily.

The reason I've been busy working on patches to gather statistics in this area of the code is that I've already tried most of the simple approaches to making the background writer work better and made little progress. I'd like to see everyone else who tries do the same, at least to the point of collecting the right data first.

Let me suggest a different way of looking at this problem. At any moment, some percentage of your buffer pool is dirty. Whether it's 0% or 100% dramatically changes what the background writer should be doing. Whether most of the buffers have usage_count>0 or not also makes a difference. None of the current code has any idea what kind of buffer pool it's working with, and therefore it doesn't have enough information to make a well-informed prediction about what is going to happen in the near future.
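To make that concrete, here is a minimal sketch (Python, invented names, nothing like this exists in the current code) of the kind of pool summary I'm talking about, where each buffer is reduced to a (dirty, usage_count) pair:

```python
from collections import Counter

def summarize_pool(buffers):
    """Summarize a snapshot of the buffer pool.

    Each buffer is an (is_dirty, usage_count) pair.  Returns the dirty
    fraction and a histogram of usage counts -- the two pieces of
    information the current background writer code never looks at."""
    dirty = sum(1 for is_dirty, _ in buffers if is_dirty)
    usage_hist = Counter(usage for _, usage in buffers)
    return dirty / len(buffers), usage_hist
```

A mostly-clean pool full of usage_count=0 buffers calls for a very different strategy than one that is 90% dirty and heavily referenced; with a summary like this in hand you can at least pick between them deliberately.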

I'll tell you what I did to the all-scan. I ran a few hundred hours worth of background writer tests to collect data on what it does wrong, then wrote a prototype automatic background writer that resets the all-scan parameters based on what I found. It keeps a running estimate of how dirty the pool at large is, using a weighted average of the most recent scan with the past history. From there, I have a simple model that predicts how much of the buffer pool we can scan in any interval, and that aims to enforce an upper bound on the amount of physical I/O you're willing to stream out. The beta code is sitting at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c if you want to see what I've done so far. The parts that are done work fine--as long as you give it a reasonable % to scan by default, it will adjust all_max_pages and the interval in real time to meet the scan rate you request, given how much of the pool is currently dirty; the I/O rate is computed but doesn't limit properly yet.
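As an illustration only (hypothetical names, made-up parameter values; the real logic lives in the bufmgr.c linked above), the running estimate and the scan bound derived from it might look something like:

```python
def update_dirty_estimate(prev_estimate, observed_dirty_frac, weight=0.25):
    """Exponentially weighted moving average: blend the dirty fraction
    observed by the most recent scan with the accumulated history."""
    return (1 - weight) * prev_estimate + weight * observed_dirty_frac

def scan_budget(dirty_estimate, pool_size, max_write_mb_per_s,
                interval_s, page_kb=8):
    """Cap the number of buffers scanned per interval so the expected
    writes (buffers scanned * estimated dirty fraction) stay under the
    physical I/O bound you're willing to stream out."""
    max_pages_written = max_write_mb_per_s * 1024 / page_kb * interval_s
    if dirty_estimate == 0:
        return pool_size  # nothing dirty: scanning costs almost nothing
    return min(pool_size, int(max_pages_written / dirty_estimate))
```

The point of the division at the end is that the dirtier the pool is, the fewer buffers you can afford to visit per interval before the resulting writes exceed your I/O budget.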

Why haven't I brought this all up yet? Two reasons. The first is that it doesn't work on my system; checkpoints and overall throughput get worse when you try to shorten them by running the background writer at optimal aggressiveness. Under really heavy load, the writes slow down as all the disk caches fill, the background writer fights with reads on the data that isn't in the mostly-dirty cache (introducing massive seek delays), it stops cleaning effectively, and it's better for it to not even try. My next generation of code was going to start with the LRU flush and then only move on to the all-scan if there's time left over.
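A sketch of that ordering (again hypothetical, not the actual patch): do the LRU flush unconditionally, and give the all-scan only whatever is left of the interval.

```python
import time

def bgwriter_round(lru_flush, all_scan, interval_s=0.2):
    """One background writer round: clean the LRU end first, and only
    spend whatever time remains in the interval on the all-scan."""
    deadline = time.monotonic() + interval_s
    lru_flush()
    remaining = deadline - time.monotonic()
    if remaining > 0:
        all_scan(remaining)
```

Under heavy load the LRU flush eats the whole interval and the all-scan simply never runs, which is exactly the "better for it to not even try" behavior described above.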

The second is that I just started to get useful results here in the last few weeks, and I assumed it was too big a topic to start suggesting major redesigns to the background writer mechanism at this point (from me at least!). I was waiting for 8.3 to freeze before even trying. If you want to push through a redesign there, maybe you can get away with it at this late moment. But I ask that you please don't remove anything from the current design until you have significant test results to back up that change.

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
