On Tue, 22 Apr 2008, PFC wrote:

Example : let's imagine a "cache priority" setting.

We can presume the DBA will set that incorrectly, because the tools needed to set it correctly aren't easy to use.

An alternative would be for the background writer to keep some stats and do the thing for us :
- begin bgwriter scan
- setup hashtable of [relid => page count]
- at each page that is scanned, increment "page count" for this relation...

I've already got a plan sketched out that does this; I didn't manage to get it finished in time for 8.3. What I wanted it for was not this purpose, but instrumentation of what's in the cache that admins can look at. Right now you can get that out of pg_buffercache, but that's kind of intrusive because of the locks it takes. In many cases I'd be perfectly happy with an approximation of what's inside the buffer cache, accumulated while the page header is locked anyway as the BGW passes over it. And as you note, having this data available can be handy for internal self-tuning as well once it's out there.

Jim threw out that you can just look at the page hit percentages instead. That's not completely true. If you've had some nasty query blow out your buffer cache, or if the server has been up a long time and the total stats don't really reflect recent reality, what's actually in the buffer cache and what the stats say has historically been cached can diverge.

This would not examine whatever is in the OS' cache, though.

I don't know that it's too unrealistic to model the OS as just being an extrapolated bigger version of the buffer cache. I can think of a couple of ways those can diverge:

1) Popular pages that get high usage counts can end up with a higher representation in shared_buffers than in the OS cache

2) If you've been doing something like a bulk update, you can have lots of recently written pages in the OS cache that aren't fully accounted for in shared_buffers, because they never get a high enough usage count to stay there (only used once) but can fill the OS cache as they're spooled up to write.

I'm not sure either of these cases is so strong it invalidates your basic idea, though. There's a pending 8.4 TODO to investigate whether increasing the maximum usage count a buffer can get would be an improvement. If that number got bumped up, I could see (2) becoming more of a problem.

I'd be somewhat concerned about turning this mechanism on by default, though, at least at first. A hybrid approach that gives the DBA some control might work well. Maybe have an "adjust estimates for cache contents" knob that you can toggle on a per-session or per-table basis?

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
