Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Added to TODO:

* Consider whether increasing BM_MAX_USAGE_COUNT improves performance
  http://archives.postgresql.org/pgsql-hackers/2007-06/msg01007.php

---

Gregory Stark wrote:
> If we find it's overkill then what we should consider doing is raising
> BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the LRU
> chain that we decide we try to keep clean.
> [...]

--
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                              http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
This has been saved for the 8.4 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Gregory Stark wrote:
> If we find it's overkill then what we should consider doing is raising
> BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the LRU
> chain that we decide we try to keep clean.
> [...]

--
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                              http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Fri, 29 Jun 2007, Jim Nasby wrote:

> On Jun 26, 2007, at 11:57 PM, Greg Smith wrote:
>> I have a complete set of working code that tracks buffer usage
>> statistics...
>
> Even if it's not used by bgwriter for self-tuning, having that
> information available would be very useful for anyone trying to
> hand-tune the system.

The stats information that's in pg_stat_bgwriter, combined with an
occasional snapshot of the current pg_stat_buffercache (now with usage
counts!), is just as useful.  Right before freeze, I made sure everything
I was using for hand-tuning in this area made it into one of those.

Really all I do is collect that data as I happen to be scanning the
buffer cache anyway.  The way I'm keeping track of things internally is
more intrusive to collect than something I'd like to be turned on by
default just for information, and exposing what it knows to user-space
isn't done yet.  I was hoping to figure out a way to use it to help
justify its overhead before bothering to optimize and report on it.  The
only reason I mentioned the code at all is that I didn't want anybody
else to waste time writing that particular routine when I already have
something that works for this purpose sitting around.

> Is this still a serious issue with LDC?  I share Greg Stark's concern
> that we're going to end up wasting a lot of writes.

Part of the reason I'm bugged about this area is that the scenario I'm
bringing up--lots of dirty and high-usage buffers in a pattern the BGW
isn't good at writing, causing buffer pool allocations to be slow--has
the potential to get even worse with LDC.  Right now, if you're in this
particular failure mode, you can be "saved" by the next checkpoint,
because it is going to flush all the dirty buffers out as fast as
possible, and then you get to start over with a fairly clean slate.
Once that stops happening, I've observed that the potential to run into
this sort of breakdown increases.

I don't think the goal is to write buffers significantly faster than
they have to be written in order to support new allocations; the idea is
just to stop ever scanning the same section more than once when it's not
possible to find new things to do there.  Right now there are
substantial wasted CPU/locking resources if you try to tune the LRU
writer up for a heavy load (by doing things like increasing the
percentage), as it just keeps scanning the same high-usage-count buffers
over and over.

With the LRU writer now running during LDC, my gut feeling is that its
efficiency is even more important than it used to be.  If it's wasteful
of resources, that now even impacts checkpoints, where before the two
never happened at the same time.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
"Jim Nasby" <[EMAIL PROTECTED]> writes: > Is this still a serious issue with LDC? I share Greg Stark's concern that > we're > going to end up wasting a lot of writes. I think that's Greg Smith's concern. I do think it's something that needs to be measured and watched for. It'll take some serious thought just to figure out what we need to measure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Jun 28, 2007, at 7:55 AM, Greg Smith wrote:

> On Thu, 28 Jun 2007, ITAGAKI Takahiro wrote:
>> Do you need to increase shared_buffers in such case?
>
> If you have something going wild creating dirty buffers with a high
> usage count faster than they are being written to disk, increasing the
> size of the shared_buffers cache can just make the problem worse--now
> you have an even bigger pile of dirty mess to shovel at checkpoint
> time.  The existing background writers are particularly unsuited to
> helping out in this situation; I think the new planned implementation
> will be much better.

Is this still a serious issue with LDC?  I share Greg Stark's concern
that we're going to end up wasting a lot of writes.

Perhaps part of the problem is that we're using a single count to track
buffer usage; perhaps we need separate counts for reads vs. writes?

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Jun 26, 2007, at 11:57 PM, Greg Smith wrote:

> On Wed, 27 Jun 2007, ITAGAKI Takahiro wrote:
>> It might be good to use statistics information about buffer usage to
>> modify X at runtime.
>
> I have a complete set of working code that tracks buffer usage
> statistics as the background writer scans, so that it has an idea what
> % of the buffer cache is dirty, how many pages have each of the various
> usage counts, that sort of thing.  The problem was that the existing
> BGW mechanisms were so clumsy and inefficient that giving them more
> information didn't make them usefully smarter.  I'll revive that code
> again if it looks like it may help here.

Even if it's not used by bgwriter for self-tuning, having that
information available would be very useful for anyone trying to
hand-tune the system.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Thu, 28 Jun 2007, ITAGAKI Takahiro wrote:

> Do you need to increase shared_buffers in such case?

If you have something going wild creating dirty buffers with a high
usage count faster than they are being written to disk, increasing the
size of the shared_buffers cache can just make the problem worse--now
you have an even bigger pile of dirty mess to shovel at checkpoint time.
The existing background writers are particularly unsuited to helping out
in this situation; I think the new planned implementation will be much
better.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Greg Smith <[EMAIL PROTECTED]> wrote:

> If your entire buffer cache is mostly filled with dirty buffers with
> high usage counts, you are in for a long wait when you need new buffers
> allocated, and your next checkpoint is going to be traumatic.

Do you need to increase shared_buffers in such case?  I think the
condition (most buffers having high usage counts) is very undesirable
for us and close to out-of-memory.  We should deal with such cases, of
course, but isn't making more room in shared_buffers the more effective
solution?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Wed, 27 Jun 2007, Gregory Stark wrote:

>> I was seeing >90% dirty+usage_count>0 in the really ugly spots.
>
> You keep describing this as ugly but it sounds like a really good
> situation to me.  The higher that percentage the better your cache hit
> ratio is.

If your entire buffer cache is mostly filled with dirty buffers with
high usage counts, you are in for a long wait when you need new buffers
allocated, and your next checkpoint is going to be traumatic.  That's
all I'm suggesting is a problem with that situation.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Gregory Stark <[EMAIL PROTECTED]> writes:

> If we find it's overkill then what we should consider doing is raising
> BM_MAX_USAGE_COUNT.  That's effectively tuning the percentage of the
> LRU chain that we decide we try to keep clean.

Yeah, I don't believe anyone has tried to do performance testing for
different values of BM_MAX_USAGE_COUNT.  It would be interesting to try
that after all the dust settles.

			regards, tom lane
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
"Tom Lane" <[EMAIL PROTECTED]> writes:

> I don't really see why it's "overkill".

Well, I think it may be overkill in that we'll be writing out buffers
that still have a decent chance of being hit again.  Effectively, what
we'll be doing in the approximated LRU queue is writing out any buffer
that reaches the 80% point down the list--even if it later gets hit and
pulled up to the head again.

I suppose that's not wrong, though; the whole idea of the clock sweep is
that that's precisely the level of precision to which it makes sense to
approximate the LRU.  I.e., any point in the top 20% is equivalent to
any other, and when we use a buffer we want to promote it to somewhere
"near" the head, but any point in the top 20% is good enough.  Then any
point in the last 20% should be effectively "good enough" to be
considered a target buffer to clean as well.

If we find it's overkill then what we should consider doing is raising
BM_MAX_USAGE_COUNT.  That's effectively tuning the percentage of the LRU
chain that we decide we try to keep clean.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Greg Smith <[EMAIL PROTECTED]> writes:

> What may need to happen here is to add Tom's approach, but perhaps
> restrain it using the current auto-tuning LRU patch's method of
> estimating how many clean buffers are needed in the near future.
> Particularly on large buffer caches, the idea of getting so far ahead
> of the sweep that you're looping all the way around and following right
> behind the clock sweep point may be overkill, but I think it will help
> enormously on smaller caches that are often very dirty.

I don't really see why it's "overkill".  My assumption is that it won't
really be hard to lap the clock sweep during startup --- most likely, on
its first iteration the bgwriter will see all of the cache as not a
candidate for writing (invalid, or at worst just touched) and will be
caught up before any real load materializes.  So the question is not
whether it can get into that state, it's whether it can stay there under
load.

			regards, tom lane
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
"Greg Smith" <[EMAIL PROTECTED]> writes:

> On Tue, 26 Jun 2007, Heikki Linnakangas wrote:
>
>> How much of the buffer cache do you think we should try to keep clean?
>> And how large a percentage of the buffer cache do you think have
>> usage_count=0 at any given point in time?
>
> What I discovered is that most of the really bad checkpoint pause cases
> I ran into involved most of the buffer cache being dirty while also
> having a non-zero usage count, which left the background writer
> hard-pressed to work usefully (the LRU writer couldn't do anything, and
> the all-scan was writing wastefully).  I was seeing >90%
> dirty+usage_count>0 in the really ugly spots.

You keep describing this as ugly, but it sounds like a really good
situation to me.  The higher that percentage, the better your cache hit
ratio is.  If 80% of the buffer cache had usage_count=0, that would
indicate about an average cache hit ratio.  And even with a cache hit
ratio of zero you would still find something like 50% of the buffers
with usage_count>0.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

> I haven't worked on [Greg's] patch.  I started looking at this, using
> Itagaki's patch as the basis.

The main focus of how I reworked things was to integrate the whole thing
into the pg_stat_bgwriter mechanism.  I thought that made the
performance testing a lot easier to quantify; the original patch pushed
out debug info into the logs, which wasn't as helpful to me.  I didn't
do much with the actual approach; my version was still following
Itagaki's basic insight into the problem.  I did change the smoothing
method some, but as you say that's up for grabs anyway.

> Since you have the test environment ready, can you try alternative
> patches as well as they're proposed?

The real upper limit on how much testing I can do is my home server's
capabilities, which for example aren't robust enough disk-wise to run
things like DBT2 on the scale I know you normally work on.  I've got a
disk for the database, one for the WAL, 256MB of cache on the
controller, and a single dual-core processor; can't fit too many
warehouses here.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Wed, 27 Jun 2007, ITAGAKI Takahiro wrote:

> It might be good to use statistics information about buffer usage to
> modify X at runtime.

I have a complete set of working code that tracks buffer usage
statistics as the background writer scans, so that it has an idea what %
of the buffer cache is dirty, how many pages have each of the various
usage counts, that sort of thing.  The problem was that the existing BGW
mechanisms were so clumsy and inefficient that giving them more
information didn't make them usefully smarter.  I'll revive that code
again if it looks like it may help here.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

> How much of the buffer cache do you think we should try to keep clean?
> And how large a percentage of the buffer cache do you think have
> usage_count=0 at any given point in time?

What I discovered is that most of the really bad checkpoint pause cases
I ran into involved most of the buffer cache being dirty while also
having a non-zero usage count, which left the background writer
hard-pressed to work usefully (the LRU writer couldn't do anything, and
the all-scan was writing wastefully).  I was seeing >90%
dirty+usage_count>0 in the really ugly spots.

What I like about Tom's idea is that it will keep the LRU writer in the
best possible zone for that case (writing out madly right behind the LRU
sweeper as counts reach zero) while still being fine on the more normal
ones like you describe.  In particular, it should cut down considerably
on how much the client backends write buffers themselves in an
overloaded case.

> That will vary widely depending on your workload, of course, but
> keeping 1/4 of the buffer cache clean seems like overkill to me.

What may need to happen here is to add Tom's approach, but perhaps
restrain it using the current auto-tuning LRU patch's method of
estimating how many clean buffers are needed in the near future.
Particularly on large buffer caches, the idea of getting so far ahead of
the sweep that you're looping all the way around and following right
behind the clock sweep point may be overkill, but I think it will help
enormously on smaller caches that are often very dirty.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Heikki Linnakangas <[EMAIL PROTECTED]> wrote:

> Tom Lane wrote:
>> In fact, the notion of the bgwriter's cleaning scan being "in front
>> of" the clock sweep is entirely backward.  It should try to be behind
>> the sweep, ie, so far ahead that it's lapped the clock sweep and is
>> trailing along right behind it, cleaning buffers immediately after
>> their usage_count falls to zero.  All the rest of the buffer arena is
>> either clean or has positive usage_count.
>
> That will vary widely depending on your workload, of course, but
> keeping 1/4 of the buffer cache clean seems like overkill to me.  If
> any of those buffers are re-dirtied after we write them, the write was
> a waste of time.

Agreed intuitively, but I don't know how often backends change
usage_count from 0 to 1.  If the rate is high, the backward bgwriter
would not work well.  It seems to happen frequently when we use large
shared buffers.

I read Tom as changing the bgwriter LRU policy from "clean dirty pages
that will be recycled soon" to "clean dirty pages just when they turn
out to be less frequently used" -- right?

I have another thought -- advancing the bgwriter's sweep start point a
little ahead of the clock hand:

  [buf]  0 (lru)        X (bgw-start)                 N
         |--------------|---------------------------->|

I think X=0 is the current behavior, and X=N is the backward bgwriter.
Are there any other appropriate values for X?  It might be good to use
statistics information about buffer usage to modify X at runtime.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Greg Smith wrote:

> I broke Itagaki-san's patch into two pieces when I was doing the review
> cleanup on it, specifically to make it easier to tinker with this part
> without losing some of its other neat features.  Heikki, did you do
> anything with that LRU adjustment patch since I sent it out:
> http://archives.postgresql.org/pgsql-patches/2007-05/msg00142.php

I like the idea of breaking down the patch into two parts, though I
didn't like the bitmasked return code stuff in that first patch.  I
haven't worked on that patch.  I started looking at this, using
Itagaki's patch as the basis.  In fact, as Tom posted his radical idea,
I was writing down my thoughts on the bgwriter patch.

I think that, regardless of the details of how the bgwriter should work,
the design is going to have three parts:

Part 1: Keeping track of how many buffers have been requested by
backends since the last bgwriter round.

Part 2: An algorithm to turn that number into the desired # of clean
buffers we should have in front of the clock hand.  That could include
storing some historic data to use in the calculation.

Part 3: A way to check that we have that many clean buffers in front of
the clock hand.  We might not do that exactly; an approximation would be
enough.

Itagaki's patch (attached) implements part 1 in the obvious way.  A
trivial implementation for part 2 is (desired # of clean buffers) =
(buffers requested since last round).  For part 3, we start from the
current clock hand and scan until we've seen/cleaned enough unpinned
buffers with usage_count = 0, or until we reach bgwriter_lru_percent.

I think we're good with part 1, but I'm sure everyone has their
favourite idea for parts 2 and 3.  Let's hear them now.

> Unless someone else has a burning desire to implement Tom's idea faster
> than me, I should be able to build this new implementation myself in
> the next couple of days.  I still have the test environment leftover
> from the last time I worked on this code, and I think everybody else
> who could handle this job has more important higher-level things they
> could be working on instead.

Oh, that would be great!  Since you have the test environment ready, can
you try alternative patches as well as they're proposed?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Tom Lane wrote:

> I just had an epiphany, I think.  As I wrote in the LDC discussion,
> http://archives.postgresql.org/pgsql-patches/2007-06/msg00294.php
> if the bgwriter's LRU-cleaning scan has advanced ahead of freelist.c's
> clock sweep pointer, then any buffers between them are either clean, or
> are pinned and/or have usage_count > 0 (in which case the bgwriter
> wouldn't bother to clean them, and freelist.c wouldn't consider them
> candidates for re-use).  And *this invariant is not destroyed by the
> activities of other backends*.  A backend cannot dirty a page without
> raising its usage_count from zero, and there are no race cases because
> the transition states will be pinned.
>
> This means that there is absolutely no point in having the bgwriter
> re-start its LRU scan from the clock sweep position each time, as it
> currently does.  Any pages it revisits are not going to need cleaning.
> We might as well have it progress forward from where it stopped before.

All true this far.  Note that Itagaki-san's patch changes that, though.
With the patch, the LRU scan doesn't look for bgwriter_lru_maxpages
dirty buffers to write.  Instead, it checks that there are N (where N
varies based on history) clean buffers with usage_count=0 in front of
the clock sweep.  If there aren't, it writes dirty buffers until there
are again.

> In fact, the notion of the bgwriter's cleaning scan being "in front of"
> the clock sweep is entirely backward.  It should try to be behind the
> sweep, ie, so far ahead that it's lapped the clock sweep and is
> trailing along right behind it, cleaning buffers immediately after
> their usage_count falls to zero.  All the rest of the buffer arena is
> either clean or has positive usage_count.

Really?  How much of the buffer cache do you think we should try to keep
clean?  And how large a percentage of the buffer cache do you think have
usage_count=0 at any given point in time?

I'm not sure myself, but as a data point the usage counts on a quick
DBT-2 test on my laptop look like this:

 usagecount | count
------------+-------
          0 |  1107
          1 |  1459
          2 |   459
          3 |   235
          4 |   352
          5 |   481
            |     3

NBuffers = 4096.  That will vary widely depending on your workload, of
course, but keeping 1/4 of the buffer cache clean seems like overkill to
me.  If any of those buffers are re-dirtied after we write them, the
write was a waste of time.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
Greg Smith <[EMAIL PROTECTED]> writes:

> Unless someone else has a burning desire to implement Tom's idea faster
> than me, I should be able to build this new implementation myself in
> the next couple of days.

Sure, go for it.  I'm going to work next on committing the LDC patch,
but I'll try to avoid modifying any of the code involved in the LRU
scan, so as to minimize merge problems for you.  Now that we have a new
plan for this, I think we can just omit any of the parts of the LDC
patch that might have touched that code.

I realized on re-reading that I'd misstated the conditions slightly: any
time the cleaning scan falls behind the clock sweep at all (not
necessarily a whole lap) it should forcibly advance its pointer to the
current sweep position.  This would mainly be relevant right at bgwriter
startup, when it's starting from the sweep position and trying to get
ahead; it might easily not be able to, until there's a lull in the
demand for new buffers.  (So until that happens, the changed code would
work just the same as now: write the first lru_maxpages dirty buffers in
front of the sweep point.)  The main point of this change is that when
there is a lull, the bgwriter will exploit it to get ahead, rather than
sitting on its thumbs as it does today ...

			regards, tom lane
Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
On Tue, 26 Jun 2007, Tom Lane wrote:

> It should try to be behind the sweep, ie, so far ahead that it's lapped
> the clock sweep and is trailing along right behind it, cleaning buffers
> immediately after their usage_count falls to zero.  All the rest of the
> buffer arena is either clean or has positive usage_count.

I've said before here that something has to fundamentally change with
the LRU writer for it to ever be really useful, because most of the time
it's executing over pages with a positive usage_count, as you say here.
One idea I threw out before was to have it preemptively lower the usage
counts as it scans ahead of the sweep point and then add the pages to
the free list, which you rightly had some issues with.  This suggestion
of a change so you'd expect it to follow right behind the sweep point
sounds like a better plan that should result in even fewer client
back-end writes, and I really like a plan that finally casts the LRU
writer control parameter in a MB/s context.  (Some pointers to your
comments when we've gone over this neighborhood before:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00642.php
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00799.php )

I broke Itagaki-san's patch into two pieces when I was doing the review
cleanup on it, specifically to make it easier to tinker with this part
without losing some of its other neat features.  Heikki, did you do
anything with that LRU adjustment patch since I sent it out:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00142.php
I already fixed the race condition bug you found in my version of the
code.

Unless someone else has a burning desire to implement Tom's idea faster
than me, I should be able to build this new implementation myself in the
next couple of days.  I still have the test environment leftover from
the last time I worked on this code, and I think everybody else who
could handle this job has more important higher-level things they could
be working on instead.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
[HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong
I just had an epiphany, I think.  As I wrote in the LDC discussion,
http://archives.postgresql.org/pgsql-patches/2007-06/msg00294.php
if the bgwriter's LRU-cleaning scan has advanced ahead of freelist.c's
clock sweep pointer, then any buffers between them are either clean, or
are pinned and/or have usage_count > 0 (in which case the bgwriter
wouldn't bother to clean them, and freelist.c wouldn't consider them
candidates for re-use).  And *this invariant is not destroyed by the
activities of other backends*.  A backend cannot dirty a page without
raising its usage_count from zero, and there are no race cases because
the transition states will be pinned.

This means that there is absolutely no point in having the bgwriter
re-start its LRU scan from the clock sweep position each time, as it
currently does.  Any pages it revisits are not going to need cleaning.
We might as well have it progress forward from where it stopped before.

In fact, the notion of the bgwriter's cleaning scan being "in front of"
the clock sweep is entirely backward.  It should try to be behind the
sweep, ie, so far ahead that it's lapped the clock sweep and is trailing
along right behind it, cleaning buffers immediately after their
usage_count falls to zero.  All the rest of the buffer arena is either
clean or has positive usage_count.

This means that we don't need the bgwriter_lru_percent parameter at all;
all we need is the lru_maxpages limit on how much I/O to initiate per
wakeup.  On each wakeup, the bgwriter always cleans until either it's
dumped lru_maxpages buffers, or it's caught up with the clock sweep.

There is a risk that if the clock sweep manages to lap the bgwriter, the
bgwriter would stop upon "catching up", when in reality there are dirty
pages everywhere.  This is easily prevented, though, if we add to the
shared BufferStrategyControl struct a counter that is incremented each
time the clock sweep wraps around to buffer zero.  (Essentially this
counter stores the high-order bits of the sweep counter.)  The bgwriter
can then recognize having been lapped by comparing that counter to its
own similar counter.  If it does get lapped, it should advance its work
pointer to the current sweep pointer and try to get ahead again.
(There's no point in continuing to clean pages behind the sweep when
those just ahead of it are dirty.)

This idea changes the terms of discussion for Itagaki-san's
automatic-adjustment-of-lru_maxpages patch.  I'm not sure we'd still
need it at all, as lru_maxpages would now be just an upper bound on the
desired I/O rate, rather than the target itself.  If we do still need
such a patch, it probably needs to look a lot different than it does
now.

Comments?

			regards, tom lane