2011/3/24 Jim Nasby <j...@nasby.net>:
> On Mar 22, 2011, at 11:46 AM, Cédric Villemain wrote:
>> 2011/3/22 Greg Stark <gsst...@mit.edu>:
>>> On Mon, Mar 21, 2011 at 6:08 PM, Jim Nasby <j...@nasby.net> wrote:
>>>> Has anyone looked at the overhead of measuring how long IO requests to the 
>>>> kernel take? If we did that not only could we get an idea of what our IO 
>>>> workload looked like, we could also figure out whether a block came out of 
>>>> cache or not. That information could potentially be useful to the planner, 
>>>> but even if the database couldn't use that knowledge itself it would be a 
>>>> damn useful statistic to have... IMHO, far more useful than our current 
>>>> hit rate statistics.
>>>>
>>>
>>> I've done this -- actually better, I used mincore to actually check
>>> whether the block was in cache before issuing the read -- but it turns
>>> out you can't get what you're looking for this way.
>>
>> The linux fincore() syscall never get in the kernel, maybe something
>> to revive...
>
> Is there an equivalent in other OSes? Could we use time measurement as an 
> alternative if not?

fincore() syscall is a shortcut for mmap+mincore calls, suggested by
people working on libprefetch.
see http://lwn.net/Articles/371538/

The alternative via time measurement is interesting, should be easy to
ouput both measures in pg_statio_* and see what happens...

>
>>>
>>> It turns out when you do this you see one block being read from disk
>>> followed by n blocks that all appear to be cache hits. Because they've
>>> been prefetched by the kernel.
>>
>> I did the same, I now believe that it is not very important to have
>> the very exact numbers.
>> Prefetech blocks *are* in memory when we request them, the first read
>> access read more than one block because the cost is the same.
>
> Yeah... there's places in the planner where we make guesses as to the 
> likelyhood of something being in-cache. If we could actually track complete 
> hit rate over time (PG buffers + FS cache), then we wouldn't have to guess at 
> things anymore.
>
> And having this info in pg_stats would be extremely valuable.

yes, also Robert wrote some interesting items to keep in mind when
thinking about that, in another thread, recently.
A fs-cache snapshot or just a 'percent_in_cache' per relation/file (?)
 is easy to do/add to some auto-analyze daemon.

*but* making a good use of it in the planner is not as trivial as it
looks. (i.e. without breaking what is working well)

Once I get time to add hooks in costsize.c, a simple extension can do
the trick. (just need some shared_buffers to keep FS-pg_stats and
hooks to use it in some places).

>
>>> What you end up with is actually something like the number of iops
>>> which is also an interesting measure but not really what you were
>>> looking for.
>>>
>>> My getrusage patch, which I should still dig out though it's rather
>>> too late to be committing now unless someone tells me otherwise, would
>>> tell you how much i/o a plan node actually did. But you won't know
>>> which blocks did the i/o since I was only tracking totals for the plan
>>> node. That's probably what you're looking for here.
>>
>> Please show us the patch :)
> --
> Jim C. Nasby, Database Architect                   j...@nasby.net
> 512.569.9461 (cell)                         http://jim.nasby.net
>
>
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to