On Tue, 22 Apr 2008, PFC wrote:

Example : let's imagine a "cache priority" setting.

We can presume the DBA will set that incorrectly, because the tools needed to set it correctly aren't easy to use.

An alternative would be for the background writer to keep some stats and do the thing for us :
- begin bgwriter scan
- setup hashtable of [relid => page count]
- at each page that is scanned, increment "page count" for this relation...

I've already got a plan sketched out that does this; I didn't manage to get it finished in time for 8.3. What I wanted it for was not this purpose, but instrumentation of what's in the cache that admins can look at. Right now you can get that out of pg_buffercache, but that's kind of intrusive because of the locks it takes. In many cases I'd be perfectly happy with an approximation of what's inside the buffer cache, accumulated while the page header is locked anyway as the BGW passes over it. And as you note, having this data available can be handy for internal self-tuning as well once it's out there.

Jim threw out that you can just look at the page hit percentages instead. That's not completely true. If you've had some nasty query blow out your buffer cache, or if the server has been up a long time and the total stats don't really reflect recent reality, what's actually in the buffer cache and what the stats say has historically been cached can diverge.

This would not examine whatever is in the OS' cache, though.

I don't know that it's too unrealistic to model the OS as just being an extrapolated bigger version of the buffer cache. I can think of a couple of ways those can diverge:

1) Popular pages that get high usage counts can end up with a higher representation in shared_buffers than in the OS cache

2) If you've been doing something like a bulk update, you can have lots of recently written pages in the OS cache that aren't fully accounted for in shared_buffers, because they never get a high enough usage count to stay there (only used once) but can fill the OS cache as they're spooled up to write.

I'm not sure either of these cases is so strong it invalidates your basic idea, though. There's a pending 8.4 TODO to investigate whether increasing the maximum usage count a buffer can get would be an improvement. If that number got bumped up, I could see (2) becoming more of a problem.

I'd be somewhat concerned about turning this mechanism on by default, though, at least at first. A hybrid approach that gives the DBA some control might work well. Maybe have an "adjust estimates for cache contents" knob that you can toggle on a per-session or per-table basis?

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
