Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2008-03-11 Thread Bruce Momjian

Added to TODO:


* Consider whether increasing BM_MAX_USAGE_COUNT improves performance

  http://archives.postgresql.org/pgsql-hackers/2007-06/msg01007.php



---

Gregory Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  I don't really see why it's overkill.  
 
 Well I think it may be overkill in that we'll be writing out buffers that
 still have a decent chance of being hit again. Effectively what we'll be doing
 in the approximated LRU queue is writing out any buffer that reaches the 80%
 point down the list. Even if it later gets hit and pulled up to the head
 again.
 
 I suppose that's not wrong though, the whole idea of the clock sweep is that
 that's precisely the level of precision to which it makes sense to approximate
 the LRU. Ie, that any point in the top 20% is equivalent to any other and when
 we use a buffer we want to promote it to somewhere near the head but any
 point in the top 20% is good enough. Then any point in the last 20% should be
 effectively good enough to be considered a target buffer to clean as well.
 
 If we find it's overkill then what we should consider doing is raising
 BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the lru chain
 that we decide we try to keep clean.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Gregory Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  I don't really see why it's overkill.  
 
 Well I think it may be overkill in that we'll be writing out buffers that
 still have a decent chance of being hit again. Effectively what we'll be doing
 in the approximated LRU queue is writing out any buffer that reaches the 80%
 point down the list. Even if it later gets hit and pulled up to the head
 again.
 
 I suppose that's not wrong though, the whole idea of the clock sweep is that
 that's precisely the level of precision to which it makes sense to approximate
 the LRU. Ie, that any point in the top 20% is equivalent to any other and when
 we use a buffer we want to promote it to somewhere near the head but any
 point in the top 20% is good enough. Then any point in the last 20% should be
 effectively good enough to be considered a target buffer to clean as well.
 
 If we find it's overkill then what we should consider doing is raising
 BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the lru chain
 that we decide we try to keep clean.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-29 Thread Jim Nasby

On Jun 26, 2007, at 11:57 PM, Greg Smith wrote:

On Wed, 27 Jun 2007, ITAGAKI Takahiro wrote:

It might be good to use statistics information about buffer usage
to modify X at runtime.


I have a complete set of working code that tracks buffer usage  
statistics as the background writer scans, so that it has an idea  
what % of the buffer cache is dirty, how many pages have each of  
the various usage counts, that sort of thing.  The problem was that  
the existing BGW mechanisms were so clumsy and inefficient that  
giving them more information didn't make them usefully smarter.   
I'll revive that code again if it looks like it may help here.


Even if it's not used by bgwriter for self-tuning, having that information
available would be very useful for anyone trying to hand-tune the system.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-29 Thread Jim Nasby

On Jun 28, 2007, at 7:55 AM, Greg Smith wrote:

On Thu, 28 Jun 2007, ITAGAKI Takahiro wrote:

Do you need to increase shared_buffers in such a case?


If you have something going wild creating dirty buffers with a high  
usage count faster than they are being written to disk, increasing  
the size of the shared_buffers cache can just make the problem  
worse--now you have an ever bigger pile of dirty mess to shovel at  
checkpoint time.  The existing background writers are particularly  
unsuited to helping out in this situation; I think the new planned
implementation will be much better.


Is this still a serious issue with LDC? I share Greg Stark's concern  
that we're going to end up wasting a lot of writes.


Perhaps part of the problem is that we're using a single count to  
track buffer usage; perhaps we need separate counts for reads vs writes?

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-29 Thread Gregory Stark

Jim Nasby [EMAIL PROTECTED] writes:

 Is this still a serious issue with LDC? I share Greg Stark's concern that
 we're going to end up wasting a lot of writes.

I think that's Greg Smith's concern. I do think it's something that needs to
be measured and watched for. It'll take some serious thought just to figure
out what we need to measure.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-29 Thread Greg Smith

On Fri, 29 Jun 2007, Jim Nasby wrote:


On Jun 26, 2007, at 11:57 PM, Greg Smith wrote:
I have a complete set of working code that tracks buffer usage 
statistics...


Even if it's not used by bgwriter for self-tuning, having that information 
available would be very useful for anyone trying to hand-tune the system.


The stats information that's in pg_stat_bgwriter combined with an 
occasional snapshot of the current pg_stat_buffercache (now with usage 
counts!) is just as useful.  Right before freeze, I made sure everything I 
was using for hand-tuning in this area made it into one of those.  Really 
all I do is collect that data as I happen to be scanning the buffer cache 
anyway.


The way I'm keeping track of things internally is more intrusive to 
collect than something I'd like to be turned on by default just for 
information, and exposing what it knows to user-space isn't done yet.  I 
was hoping to figure out a way to use it to help justify its overhead 
before bothering to optimize and report on it.  The only reason I 
mentioned the code at all is because I didn't want anybody else to waste 
time writing that particular routine when I already have something that 
works for this purpose sitting around.



Is this still a serious issue with LDC?


Part of the reason I'm bugged about this area is that the scenario I'm 
bringing up--lots of dirty and high usage buffers in a pattern the BGW 
isn't good at writing causing buffer pool allocations to be slow--has the 
potential to get even worse with LDC.  Right now, if you're in this 
particular failure mode, you can be saved by the next checkpoint because 
it is going to flush all the dirty buffers out as fast as possible and 
then you get to start over with a fairly clean slate.  Once that stops 
happening, I've observed the potential to run into this sort of breakdown 
increase.


I share Greg Stark's concern that we're going to end up wasting a lot of 
writes.


I don't think the goal is to write buffers significantly faster than they 
have to in order to support new allocations; the idea is just to stop from 
ever scanning the same section more than once when it's not possible for 
it to find new things to do there.  Right now there are substantial wasted 
CPU/locking resources if you try to tune the LRU writer up for a heavy 
load (by doing things like increasing the percentage), as it just 
keeps scanning the same high-usage count buffers over and over.  With the 
LRU now running during LDC, my gut feeling is its efficiency is even more 
important now than it used to be.  If it's wasteful of resources, that's 
now even going to impact checkpoints, where before the two never happened 
at the same time.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-28 Thread Greg Smith

On Thu, 28 Jun 2007, ITAGAKI Takahiro wrote:


Do you need to increase shared_buffers in such case?


If you have something going wild creating dirty buffers with a high usage 
count faster than they are being written to disk, increasing the size of 
the shared_buffers cache can just make the problem worse--now you have an 
ever bigger pile of dirty mess to shovel at checkpoint time.  The existing 
background writers are particularly unsuited to helping out in this 
situation; I think the new planned implementation will be much better.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-27 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

 How much of the buffer cache do you think we should try to keep clean? And
 how large a percentage of the buffer cache do you think have usage_count=0 at
 any given point in time?

 What I discovered is that most of the really bad checkpoint pause cases I ran
 into involved most of the buffer cache being dirty while also having a
 non-zero usage count, which left the background writer hard-pressed to work
 usefully (the LRU writer couldn't do anything, and the all-scan was writing
 wastefully).  I was seeing 90% dirty+usage_count>0 in the really ugly spots.

You keep describing this as ugly but it sounds like a really good situation to
me. The higher that percentage the better your cache hit ratio is. If you had
80% of the buffer cache be usage_count > 0 that would be about an average
cache hit ratio. And if you had a cache hit ratio of zero then you would find
as little as 50% of the buffers with usage_count > 0.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-27 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I don't really see why it's overkill.  

Well I think it may be overkill in that we'll be writing out buffers that
still have a decent chance of being hit again. Effectively what we'll be doing
in the approximated LRU queue is writing out any buffer that reaches the 80%
point down the list. Even if it later gets hit and pulled up to the head
again.

I suppose that's not wrong though, the whole idea of the clock sweep is that
that's precisely the level of precision to which it makes sense to approximate
the LRU. Ie, that any point in the top 20% is equivalent to any other and when
we use a buffer we want to promote it to somewhere near the head but any
point in the top 20% is good enough. Then any point in the last 20% should be
effectively good enough to be considered a target buffer to clean as well.

If we find it's overkill then what we should consider doing is raising
BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the lru chain
that we decide we try to keep clean.
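Stark's claim here -- that BM_MAX_USAGE_COUNT effectively tunes what fraction of the approximated LRU the cleaner targets -- can be eyeballed with a toy simulation. This is a hedged Python sketch, not PostgreSQL code: the access pattern, parameters, and eviction details are all invented for illustration.

```python
import random

def fraction_at_zero(max_count, nbuffers=512, hits_per_alloc=2,
                     rounds=20000, seed=1):
    """Toy clock sweep: each round does a few buffer hits (bumping
    usage_count, capped at max_count, playing the role of
    BM_MAX_USAGE_COUNT) and one allocation (sweeping and decrementing
    counts until a zero-count victim is found).  Returns the
    steady-state fraction of buffers at usage_count == 0, i.e. the
    slice an LRU cleaner would treat as write targets."""
    rng = random.Random(seed)
    counts = [0] * nbuffers
    hand = 0
    for _ in range(rounds):
        for _ in range(hits_per_alloc):
            i = rng.randrange(nbuffers)
            if counts[i] < max_count:
                counts[i] += 1          # buffer hit: bump, capped
        while counts[hand] > 0:         # sweep: decrement and move on
            counts[hand] -= 1
            hand = (hand + 1) % nbuffers
        counts[hand] = 1                # "evict" and install a new page
        hand = (hand + 1) % nbuffers
    return sum(c == 0 for c in counts) / nbuffers
```

With a higher cap, buffers hold non-zero counts longer, so fewer sit at usage_count 0 at any instant and the cleaner's target slice shrinks -- which is the tuning knob being described.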

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-27 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 If we find it's overkill then what we should consider doing is raising
 BM_MAX_USAGE_COUNT. That's effectively tuning the percentage of the lru chain
 that we decide we try to keep clean.

Yeah, I don't believe anyone has tried to do performance testing for
different values of BM_MAX_USAGE_COUNT.  It would be interesting to
try that after all the dust settles.

regards, tom lane



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-27 Thread Greg Smith

On Wed, 27 Jun 2007, Gregory Stark wrote:


I was seeing 90% dirty+usage_count>0 in the really ugly spots.


You keep describing this as ugly but it sounds like a really good situation to
me. The higher that percentage the better your cache hit ratio is.


If your entire buffer cache is mostly filled with dirty buffers with high 
usage counts, you are in for a long wait when you need new buffers 
allocated and your next checkpoint is going to be traumatic.  That's all
I'm suggesting: that situation is a problem.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-27 Thread ITAGAKI Takahiro
Greg Smith [EMAIL PROTECTED] wrote:

 If your entire buffer cache is mostly filled with dirty buffers with high 
 usage counts, you are in for a long wait when you need new buffers 
 allocated and your next checkpoint is going to be traumatic.

Do you need to increase shared_buffers in such a case?

I think the condition (most buffers having high usage counts) is
very undesirable for us and close to out-of-memory. We should deal with
such cases, of course, but is making more room in shared_buffers a more
effective solution?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





[HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Tom Lane
I just had an epiphany, I think.

As I wrote in the LDC discussion,
http://archives.postgresql.org/pgsql-patches/2007-06/msg00294.php
if the bgwriter's LRU-cleaning scan has advanced ahead of freelist.c's
clock sweep pointer, then any buffers between them are either clean,
or are pinned and/or have usage_count > 0 (in which case the bgwriter
wouldn't bother to clean them, and freelist.c wouldn't consider them
candidates for re-use).  And *this invariant is not destroyed by the
activities of other backends*.  A backend cannot dirty a page without
raising its usage_count from zero, and there are no race cases because
the transition states will be pinned.

This means that there is absolutely no point in having the bgwriter
re-start its LRU scan from the clock sweep position each time, as
it currently does.  Any pages it revisits are not going to need
cleaning.  We might as well have it progress forward from where it
stopped before.

In fact, the notion of the bgwriter's cleaning scan being in front of
the clock sweep is entirely backward.  It should try to be behind the
sweep, ie, so far ahead that it's lapped the clock sweep and is trailing
along right behind it, cleaning buffers immediately after their
usage_count falls to zero.  All the rest of the buffer arena is either
clean or has positive usage_count.

This means that we don't need the bgwriter_lru_percent parameter at all;
all we need is the lru_maxpages limit on how much I/O to initiate per
wakeup.  On each wakeup, the bgwriter always cleans until either it's
dumped lru_maxpages buffers, or it's caught up with the clock sweep.

There is a risk that if the clock sweep manages to lap the bgwriter,
the bgwriter would stop upon catching up, when in reality there are
dirty pages everywhere.  This is easily prevented though, if we add
to the shared BufferStrategyControl struct a counter that is incremented
each time the clock sweep wraps around to buffer zero.  (Essentially
this counter stores the high-order bits of the sweep counter.)  The
bgwriter can then recognize having been lapped by comparing that counter
to its own similar counter.  If it does get lapped, it should advance
its work pointer to the current sweep pointer and try to get ahead
again.  (There's no point in continuing to clean pages behind the sweep
when those just ahead of it are dirty.)
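The lap-detection scheme above can be sketched with a toy model. This is illustrative Python, not the actual freelist.c code; NBUFFERS and all field names are invented for the sketch, and the real state would live in the shared BufferStrategyControl struct.

```python
NBUFFERS = 16  # toy buffer pool size

class SweepState:
    """Mock of the shared clock-sweep state: the hand position plus a
    counter bumped each time the hand wraps to buffer zero (the
    "high-order bits" of the sweep counter described above)."""
    def __init__(self):
        self.pos = 0
        self.complete_passes = 0

    def advance(self):
        """Move the clock hand one buffer forward, counting wraps."""
        self.pos += 1
        if self.pos == NBUFFERS:
            self.pos = 0
            self.complete_passes += 1

def linear(passes, pos):
    """Linearize (wrap count, position) into one monotonic coordinate,
    so sweep and cleaning-scan positions are directly comparable."""
    return passes * NBUFFERS + pos

def cleaning_scan_lapped(sweep, scan_passes, scan_pos):
    """True when the clock sweep has caught up with (or passed) the
    bgwriter's cleaning scan; the bgwriter should then advance its work
    pointer to the current sweep position and try to get ahead again."""
    return linear(sweep.complete_passes, sweep.pos) >= linear(scan_passes, scan_pos)
```

Comparing linearized coordinates also covers the refinement of jumping ahead whenever the scan falls behind at all, not only after a whole lap.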

This idea changes the terms of discussion for Itagaki-san's
automatic-adjustment-of-lru_maxpages patch.  I'm not sure we'd still
need it at all, as lru_maxpages would now be just an upper bound on the
desired I/O rate, rather than the target itself.  If we do still need
such a patch, it probably needs to look a lot different than it does
now.

Comments?

regards, tom lane



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Tom Lane wrote:

It should try to be behind the sweep, ie, so far ahead that it's lapped 
the clock sweep and is trailing along right behind it, cleaning buffers 
immediately after their usage_count falls to zero.  All the rest of the 
buffer arena is either clean or has positive usage_count.


I've said before here that something has to fundamentally change with the 
LRU writer for it to ever be really useful, because most of the time it's 
executing over pages with a positive usage_count as you say here.  One 
idea I threw out before was to have it preemptively lower the usage counts 
as it scans ahead of the sweep point and then add the pages to the free 
list, which you rightly had some issues with.  This suggestion of a change 
so you'd expect it to follow right behind the sweep point sounds like a 
better plan that should result in even fewer client back-end writes, and I 
really like a plan that finally casts the LRU writer control parameter in 
a MB/s context.


(Some pointers to your comments when we've gone over this neighborhood 
before: http://archives.postgresql.org/pgsql-hackers/2007-03/msg00642.php 
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00799.php )


I broke Itagaki-san's patch into two pieces when I was doing the review 
cleanup on it specifically to make it easier to tinker with this part 
without losing some of its other neat features.  Heikki, did you do 
anything with that LRU adjustment patch since I sent it out: 
http://archives.postgresql.org/pgsql-patches/2007-05/msg00142.php


I already fixed the race condition bug you found in my version of the 
code.


Unless someone else has a burning desire to implement Tom's idea faster 
than me, I should be able to build this new implementation myself in the next 
couple of days.  I still have the test environment leftover from the last 
time I worked on this code, and I think everybody else who could handle 
this job has more important higher-level things they could be working on 
instead.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 Unless someone else has a burning desire to implement Tom's idea faster 
 than me, I should be able to build this new implementation myself in the next 
 couple of days.

Sure, go for it.  I'm going to work next on committing the LDC patch,
but I'll try to avoid modifying any of the code involved in the LRU
scan, so as to minimize merge problems for you.  Now that we have a new
plan for this, I think we can just omit any of the parts of the LDC
patch that might have touched that code.

I realized on re-reading that I'd misstated the conditions slightly:
any time the cleaning scan falls behind the clock sweep at all (not
necessarily a whole lap) it should forcibly advance its pointer to the
current sweep position.  This would mainly be relevant right at bgwriter
startup, when it's starting from the sweep position and trying to get
ahead; it might easily not be able to, until there's a lull in the
demand for new buffers.  (So until that happens, the changed code would
work just the same as now: write the first lru_maxpages dirty buffers
in front of the sweep point.)  The main point of this change is that when
there is a lull, the bgwriter will exploit it to get ahead, rather than
sitting on its thumbs as it does today ...

regards, tom lane



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Heikki Linnakangas

Tom Lane wrote:

I just had an epiphany, I think.

As I wrote in the LDC discussion,
http://archives.postgresql.org/pgsql-patches/2007-06/msg00294.php
if the bgwriter's LRU-cleaning scan has advanced ahead of freelist.c's
clock sweep pointer, then any buffers between them are either clean,
or are pinned and/or have usage_count > 0 (in which case the bgwriter
wouldn't bother to clean them, and freelist.c wouldn't consider them
candidates for re-use).  And *this invariant is not destroyed by the
activities of other backends*.  A backend cannot dirty a page without
raising its usage_count from zero, and there are no race cases because
the transition states will be pinned.

This means that there is absolutely no point in having the bgwriter
re-start its LRU scan from the clock sweep position each time, as
it currently does.  Any pages it revisits are not going to need
cleaning.  We might as well have it progress forward from where it
stopped before.


All true this far.

Note that Itagaki-san's patch changes that though. With the patch, the 
LRU scan doesn't look for bgwriter_lru_maxpages dirty buffers to write. 
Instead, it checks that there are N (where N varies based on history) 
clean buffers with usage_count=0 in front of the clock sweep. If there 
aren't, it writes dirty buffers until there are again.
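The patched scan described here -- keep N clean, usage_count=0 buffers in front of the clock hand, writing dirty ones as needed -- might be sketched like this. Purely illustrative Python, not the patch itself; the buffer representation and names are invented.

```python
def ensure_clean_ahead(buffers, hand, n_target):
    """Walk forward from the clock hand for at most one full lap,
    counting unpinned usage_count == 0 buffers that are already clean
    and "writing out" (cleaning) dirty ones, until n_target such clean
    buffers exist ahead of the hand.  Returns the number of writes
    issued.  Each buffer is a dict: {"dirty", "usage_count", "pinned"}."""
    writes = 0
    clean_seen = 0
    pos = hand
    for _ in range(len(buffers)):       # never scan more than one lap
        if clean_seen >= n_target:
            break
        buf = buffers[pos]
        if not buf["pinned"] and buf["usage_count"] == 0:
            if buf["dirty"]:
                buf["dirty"] = False    # stand-in for issuing the write
                writes += 1
            clean_seen += 1
        pos = (pos + 1) % len(buffers)
    return writes
```

Buffers with a positive usage count are skipped rather than written, which is exactly why this scan stays cheap when the cache is mostly hot.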



In fact, the notion of the bgwriter's cleaning scan being in front of
the clock sweep is entirely backward.  It should try to be behind the
sweep, ie, so far ahead that it's lapped the clock sweep and is trailing
along right behind it, cleaning buffers immediately after their
usage_count falls to zero.  All the rest of the buffer arena is either
clean or has positive usage_count.


Really? How much of the buffer cache do you think we should try to keep 
clean? And how large a percentage of the buffer cache do you think have 
usage_count=0 at any given point in time? I'm not sure myself, but as a 
data point the usage counts on a quick DBT-2 test on my laptop look like 
this:


 usagecount | count
------------+-------
          0 |  1107
          1 |  1459
          2 |   459
          3 |   235
          4 |   352
          5 |   481
            |     3

NBuffers = 4096.

That will vary widely depending on your workload, of course, but keeping 
1/4 of the buffer cache clean seems like overkill to me. If any of those 
buffers are re-dirtied after we write them, the write was a waste of time.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Heikki Linnakangas

Greg Smith wrote:
I broke Itagaki-san's patch into two pieces when I was doing the review 
cleanup on it specifically to make it easier to tinker with this part 
without losing some of its other neat features.  Heikki, did you do 
anything with that LRU adjustment patch since I sent it out: 
http://archives.postgresql.org/pgsql-patches/2007-05/msg00142.php


I like the idea of breaking down the patch into two parts, though I 
didn't like the bitmasked return code stuff in that first patch.


I haven't worked on that patch. I started looking at this, using 
Itagaki's patch as the basis. In fact, as Tom posted his radical idea, I 
was writing down my thoughts on the bgwriter patch:


I think regardless of the details of how bgwriter should work, the
design is going have three parts:

Part 1: Keeping track of how many buffers have been requested by
backends since last bgwriter round.

Part 2: An algorithm to turn that number into desired # of clean buffers
we should have in front of the clock hand. That could include storing
some historic data to use in the calculation.

Part 3: A way to check that we have that many clean buffers in front of
the clock hand. We might not do that exactly, an approximation would be
enough.

Itagaki's patch attached implements part 1 in the obvious way. A trivial 
implementation for part 2 is (desired # of clean buffers) = (buffers
requested since last round). For part 3, we start from current clock 
hand and scan until we've seen/cleaned enough unpinned buffers with 
usage_count = 0, or until we reach bgwriter_lru_percent.


I think we're good with part 1, but I'm sure everyone has their 
favourite idea for 2 and 3. Let's hear them now.
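A minimal sketch of parts 1 and 2 as laid out above, with the trivial estimator plus one possible smoothing (covering the recent peak over a small window). The smoothing choice is an illustration, not what Itagaki's patch actually does.

```python
from collections import deque

class CleanBufferTarget:
    """Part 1: accumulate buffer-allocation requests since the last
    bgwriter round.  Part 2: turn that history into a desired number of
    clean buffers ahead of the clock hand."""
    def __init__(self, history_len=8):
        self.history = deque(maxlen=history_len)  # recent per-round demand
        self.requests_this_round = 0

    def note_allocation(self):
        """Called (conceptually) each time a backend allocates a buffer."""
        self.requests_this_round += 1

    def desired_clean(self):
        """Called once per bgwriter round: fold in this round's demand
        and return the target.  The trivial version in the text would
        just return the last round's count; covering the recent peak is
        one conservative alternative."""
        self.history.append(self.requests_this_round)
        self.requests_this_round = 0
        return max(self.history)
```

Part 3 would then scan ahead of the clock hand until that many clean, usage_count=0 buffers are found, writing dirty ones along the way.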


Unless someone else has a burning desire to implement Tom's idea faster 
than me, I should be able to build this new implementation myself in the next 
couple of days.  I still have the test environment leftover from the 
last time I worked on this code, and I think everybody else who could 
handle this job has more important higher-level things they could be 
working on instead.


Oh, that would be great! Since you have the test environment ready, can 
you try alternative patches as well, as they're proposed?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread ITAGAKI Takahiro
Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Tom Lane wrote:
  In fact, the notion of the bgwriter's cleaning scan being in front of
  the clock sweep is entirely backward.  It should try to be behind the
  sweep, ie, so far ahead that it's lapped the clock sweep and is trailing
  along right behind it, cleaning buffers immediately after their
  usage_count falls to zero.  All the rest of the buffer arena is either
  clean or has positive usage_count.

 That will vary widely depending on your workload, of course, but keeping 
 1/4 of the buffer cache clean seems like overkill to me. If any of those 
 buffers are re-dirtied after we write them, the write was a waste of time.

Agreed intuitively, but I don't know how often backends change usage_count
from 0 to 1. If the rate is high, the backward-bgwriter would not work. It
seems to happen frequently when we use large shared buffers.

I read that Tom is changing the bgwriter LRU policy from "clean dirty pages
recycled soon" to "clean dirty pages just when they turn out to be less
frequently used", right? I have another thought -- advancing bgwriter's
sweep start point a little ahead.

[buf]  0         lru          X (bgw-start)           N
       |----------|-----------|-----------------------|

I think X=0 is the current behavior and X=N is the backward-bgwriter.
Are there any other appropriate values for X? It might be good to use
statistics information about buffer usage to modify X at runtime.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Greg Smith

On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

How much of the buffer cache do you think we should try to keep 
clean? And how large a percentage of the buffer cache do you think have 
usage_count=0 at any given point in time?


What I discovered is that most of the really bad checkpoint pause cases I 
ran into involved most of the buffer cache being dirty while also having a 
non-zero usage count, which left the background writer hard-pressed to 
work usefully (the LRU writer couldn't do anything, and the all-scan was 
writing wastefully).  I was seeing 90% dirty+usage_count>0 in the really 
ugly spots.


What I like about Tom's idea is that it will keep the LRU writer in the 
best possible zone for that case (writing out madly right behind the LRU 
sweeper as counts get to zero) while still being fine on the more normal 
ones like you describe.  In particular, it should cut down on how much 
client backends write buffers in an overloaded case considerably.


That will vary widely depending on your workload, of course, but keeping 1/4 
of the buffer cache clean seems like overkill to me.


What may need to happen here is to add Tom's approach, but perhaps 
restrain it using the current auto-tuning LRU patch's method of estimating 
how many clean buffers are needed in the near future.  Particularly on 
large buffer caches, the idea of getting so far ahead of the sweep that 
you're looping all the way around and following right behind the clock 
sweep point may be overkill, but I think it will help enormously on 
smaller caches that are often very dirty.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Greg Smith

On Wed, 27 Jun 2007, ITAGAKI Takahiro wrote:

It might be good to use statistics information about buffer usage to 
modify X at runtime.


I have a complete set of working code that tracks buffer usage statistics 
as the background writer scans, so that it has an idea what % of the 
buffer cache is dirty, how many pages have each of the various usage 
counts, that sort of thing.  The problem was that the existing BGW 
mechanisms were so clumsy and inefficient that giving them more 
information didn't make them usefully smarter.  I'll revive that code 
again if it looks like it may help here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Bgwriter LRU cleaning: we've been going at this all wrong

2007-06-26 Thread Greg Smith


On Tue, 26 Jun 2007, Heikki Linnakangas wrote:

I haven't worked on [Greg's] patch. I started looking at this, using 
Itagaki's patch as the basis.


The main focus of how I reworked things was to integrate the whole thing 
into the pg_stat_bgwriter mechanism.  I thought that made the performance 
testing a lot easier to quantify; the original patch pushed out debug info 
into the logs which wasn't as helpful to me.  I didn't do much with the 
actual approach, my version was still following Itagaki's basic insight 
into the problem.  I did change the smoothing method some, but as you say 
that's up for grabs anyway.


Since you have the test environment ready, can you try alternative 
patches as well as they're proposed?


The real upper limit on how much testing I can do is my home server's 
capabilities, which for example aren't robust enough disk-wise to run 
things like DBT2 on the scale I know you normally work on.  I gots a disk 
for the database, one for the WAL, 256MB of cache on the controller, and a 
single dual-core processor; can't fit too many warehouses here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
